The Undecidability of the Halting Problem
One of the foundational results of Computer Science is that there is no program which can reliably tell if another program will halt on a given input.
This page presents a formal proof of this fact.
The unsolvability of the halting problem means that we can't (for most programming languages) write programs that will soundly and completely automatically check a program for some property. E.g.:
Will this program ever output 1?
Will this program ever output a number between 0 and 100?
Will this program ever issue an instruction to launch the nuclear missles?
If we could solve any of these, we can also solve the halting problem by wrapping the program under test with a test to see if it does do the thing we are looking for and looping indefinitely if it does. Then if our halting problem solution says “halts” then we know that the program doesn't have the behaviour we are looking for.
The proof of undecidability is an example of a diagonalisation proof, where we prove that a solution cannot exist by assuming that a solution exists and using it on itself to construct a contradiction.
Vocabulary
Ways of Building Programs and Data
The following function symbols give us ways of building programs and data values.
true()
,false()
,loop()
are all basic programs that ignore their input and (a) returntrue
, (b) returnfalse
, (c) loop forever producing no answer.pair(x,y)
represents a pair of data itemsx
andy
. We will model programs taking multiple inputs by giving them inputs as pairs.duplicate(p)
is a program that duplicates its inputx
to a pairpair(x,x)
, and then executes as the programp
with that pair as input.if(p1, p2, p3)
is a program that runsp1
on the input, ifp1
returnstrue
then it runsp2
and ifp1
returnsfalse
then it runsp3
.
We will only need to assume this minimal set of function symbols for building programs to prove that the halting problem is undeciable. We do not say in our axiomatisation below that these are the only ways of building programs, only that the underlying model of computation must have at least these ways of constructing programs.
The Execution Predicate
We use the execution predicate we was when specifying properties of programs:
exec(program, input, output)
-- meaning that when we runprogram
oninput
the result isoutput
.
We do not distinguish between things that are program-like and things that are data-like. In particular, a program can take itself as an input. This flexibility of self reference will be crucial for stating the halting problem and proving that it is undecidable.
Axioms
Next we have axioms that tell us how each of the different kinds of program executes. We'll not actually need all of these axioms to complete the proof of the undecidability of the halting problem, but they serve to show how one can use logic to specify how programs execute.
These axioms are meant to establish some basic properties of computation that any realistic kind of programs ought to satisfy: we can output fixed values, duplicate data, not give answers, and make decisions.
The true
and false
programs
The program true()
outputs true()
for any input:
∀x. exec(true(), x, true())
and if it outputs anything, then that thing is equal to true()
:
∀x. ∀y. exec(true(), x, y) → y = true()
Similarly, false()
outputs false()
for any input:
∀x. exec(false(), x, false())
and if it outputs anything, then that thing is equal to false()
:
∀x. ∀y. exec(false(), x, y) → y = false()
The loop
program
The program loop
never outputs anything:
∀x. ∀y. ¬exec(loop(), x, y)
The duplicate
program
The program duplicate(p)
acts like I said above:
∀p. ∀x. ∀y. exec(p, pair(x, x), y) → exec(duplicate(p), x, y)
and this is the only way it acts:
∀p. ∀x. ∀y. exec(duplicate(p), x, y) → exec(p, pair(x, x), y)
The if
program
The program if(p1,p2,p3)
acts like I said above. We split into two cases, one for when the condition p1
returns true
:
∀p1. ∀p2. ∀p3. ∀x. ∀y. exec(p1, x, true()) → exec(p2, x, y) → exec(if(p1, p2, p3), x, y)
and one when the condition returns false
:
∀p1. ∀p2. ∀p3. ∀x. ∀y. exec(p1, x, false()) → exec(p3, x, y) → exec(if(p1, p2, p3), x, y)
We also need to specify that this is the only way that if
executes:
∀p1. ∀p2. ∀p3. ∀x. ∀y. exec(if(p1, p2, p3), x, y) → ((exec(p1, x, true()) ∧ exec(p2, x, y)) ∨ (exec(p1, x, false()) ∧ exec(p3, x, y)))
Explanation
Why do we need “both directions” for the axioms for duplicate
and if
? This is because we are going to have to reason backwards about program execution to answer questions like “if the output was y
, then what happened during the program?”.
What does it mean for a program to halt?
As we saw when Specifying Properties of Programs, we can use the exec
predicate to define what it means for a program to halt. A program prog
halts on an input x
if there exists an answer y
that executing prog
with input x
gives the output y
:
∃y. exec(prog, x, y)
What does it mean for a program to solve the halting problem?
To specify when we have a program that solves the halting problem, we use a predicate solution(p)
. Any solution must satisfy the following four properties, so we say that solution(p)
implies each one:
A solution must always say “true” or “false”:
∀p. solution(p) → (∀q. ∀x. exec(p, pair(q, x), true()) ∨ exec(p, pair(q, x), false()))
A solution must never say “true” and “false”:
∀p. solution(p) → (∀q. ∀x. exec(p, pair(q, x), true()) → exec(p, pair(q, x), false()) → F)
Any program that satisfies these two properties is a decision procedure: a program that decides whether or not the input is in some set. The next two properties define what set this is.
If a solution says “true” for a given
p
andx
, then executingp
onx
halts:∀p. solution(p) → (∀q. ∀x. exec(p, pair(q, x), true()) → (∃y. exec(q, x, y)))
If a solution says “false” for a given
p
andx
, then executingp
onx
does not halt (i.e. loops):∀p. solution(p) → (∀q. ∀x. exec(p, pair(q, x), false()) → ¬(∃y. exec(q, x, y)))
Some notes on this specification:
Any solution to the halting problem always halts.
If we didn't have this property, then there is an easy solution to the halting problem: run the program and if it halts then output
true
, otherwise never give an answer.We do not only say that if the solution says
true
then the input halts, we also state the opposite property forfalse
. This rules out “solutions” to the halting problem that always sayfalse
, or underestimate the cases when the input halts. In practice, because the halting problem is undecidable, this is in fact what we have to do. We'll discuss mitigations of undecidability at the end of this page.
What if we had a solution to the Halting Problem?
If we had a solution to the halting problem, then we could use it to make larger programs.
One program we can make is the following. Let's say we have a solution p
to the halting problem, then we could make the following program (in pseudocode):
define spoiler(x):
if(p(x,x)):
loop-forever
else:
return true
So this program takes an input x
and asks p
whether or not x
will halt when given itself as an input. If p
says “true”, then it loops forever; if p
says “false”, then it returns “true”. We'll prove these facts formally from the axioms below.
We call this program spoiler
because we can use it to prove that there cannot be any solution to the halting problem.
We can write this program in the format described above like so:
if(duplicate(p), loop(), true())
Solution says “loops”, then spoiler halts
Conversely, we can prove from the axioms that, if the solution p
says “false” then the spoiler program does halt:
The proof goes like this:
We assume that
p
saysfalse()
when presented with programx
and inputx
. Ifp
is a solution to the halting problem, then this means thatx
does not halt on the inputx
.Using the axioms for
if
,duplicate
andtrue
, we can then replay our informal reasoning above that the spoiler program does halt, with outputtrue()
.
Solution says “halts”, then spoiler loops
We can prove formally from the axioms that, if the solution p
says “true” then the spoiler program does not halt:
The proof goes like this:
We assume that
p
saystrue()
when presented with the programx
and inputx
. Ifp
is a solution to the halting problem, then this means thatx
does halt on the inputx
.Using the reverse axioms for
if
,duplicate
, andloop
, we can again replicate our informal reasoning to show that the spoiler program does not halt.
Undecidability of the Halting Problem
Given these two facts we have proved about the spoiler program, we can now prove that there can be no solution to the halting problem. The proof goes like this:
To prove that there cannot be a solution, we assume that there is a solution
p
and proveF
(“false” as a logical proposition) to show that there is a contradiction.Since
p
is a solution it must either say “true” or “false” when given the spoiler program as both the program and its input. This means the proof splits into two cases:If
p
says “true”, then we know that:The spoiler program must halt when given itself as an input, because
p
is a solution to the halting problem; andThe spoiler program must not halt when given itself as an input, because
p
said “true” so we can use the first result above.
We cannot have that a program both halts and does not halt, so we can prove
F
.If
p
says “false”, then we know that:The spoiler program must not halt when given itself as an input, because
p
is a solution to the halting problem; andThe spoiler program must halt when given itself as an input, because
p
said “false” so we can use the second result above.
We cannot have that a program both halts and does not halt, so we can prove
F
.
We have proved
F
in both branches of the proof, so we have a contradiction. The only assumption we made (apart from our basic assumptions about programs) is that there is a solution to the halting problem. So it must be impossible for such a solution to exist.
We can carry this proof out formally from our axioms of computation and the two results we proved above:
What does this proof show?
The undecidability of the halting problem is in some sense disappointing, because it shows that we cannot build a program that will do perfect checking of other programs for us. On the other hand, it also shows that Computer Science can never be “solved” in some sense. There will always be more programs to think about.
It is worth looking at the various assumptions underlying this proof, to see exactly what it is saying:
We assumed a model of computation that supports if-then-else, duplication, and looping, because these are the things we needed to build the spoiler.
Prohibiting if-then-else seems like it would be very restrictive, so this is rarely done.
The real danger here is duplication, because it allows us to construct large inputs from small inputs, meaning that there is no bound on the size of computations. If we restrict the size of computations to some fixed size, then the halting problem becomes solvable by enumerating all possible states of the computation. This does require that we have a machine much larger than the ones we want to simulate, however.
Another solution is to prohibit unrestricted looping altogether, or at least to control it in some way. If we restrict our programs to always only loop over the input, or over data structures generated from the input, then we can guarantee termination. Unfortunately, this also means that we miss some functions (because otherwise we would have a solution to the halting problem!).
We assumed that a solution is sound and complete. Soundness means that if it says
true
then the program halts. Completeness means that if it saysfalse
, then the program does not halt. If we drop one of these, then we can make useful approximate solutions. For example, a solution that saystrue
in most useful cases is an area of intensive research. This is similar to the idea above of restricting programs to a certain form, but approaching it from the other direction.
On the next page, we'll look at a similar negative result that it purely about logic: Gödel's Incompleteness Theorem.