Specification and Verification

One of the motivating reasons to use logic in Computer Science is to specify what software systems are supposed to do, and to verify that they actually do so. But first, we have to learn where specification and verification fit into the overall picture of software development and how formal logic may help.

Validation and Verification

The problem of building a software system that is fit for purpose involves two key questions:

Validation: Are we building the right thing?
Verification: Are we building the thing right?

Validation is the question of whether or not the system that is being built is fit for the purpose that it is intended for. This question is answered by working with the “stakeholders” of the system, who include the actual users, as well as whoever is paying for the system, the people whose data is being stored (the “data subjects”), the people responsible for deploying and maintaining the system, relevant regulatory authorities (e.g, the ICO), and others. One might include “physical reality” as a stakeholder too.

Verification is the question of whether or not the system built actually performs as required. This relates the code and hardware to the requirements identified by stakeholders.

The interface between these two questions is the specification of the system. The specification is the description of what the system is meant to do. Validation checks the specification against the real world, and verification checks the implementation against the specification.

In actual software systems, it is very rare for a complete specification to be written down. Some parts may be written down (e.g., “must run on iOS and Android”, “must interface with the PostgreSQL server”, “must store data X, Y, Z”), some might be legal requirements (e.g., compliance with GDPR), some might be “common sense” (e.g., people's surnames don't contain spaces and other false things), and some might be difficult to pin down (e.g., the game must be fun).

The process of coming up with a specification is not a one-off event. Software systems must adapt to changing requirements, and the presence of a software system itself changes the reality around it. So the processes of validation and verification are typically on going.

Software systems are not monolithic entities. A single system (e.g., the Pegasus system used at Strathclyde for student and staff records) will itself be composed of multiple sub-systems (e.g., payroll, admissions, curriculum history, exam boards, course catalogues, ...), and each of those will be built from components (e.g., databases, web frontend frameworks, message queues, reporting tools, ...), built on lower-level components (e.g., programming language implementations, operating systems, networks, ...) right down to the hardware. Each of these sub-systems and components has a specification and the questions of validation and verification need answering.

Formal Specification and Verification

A specification is in some sense a contract between the implementor of a system and its stakeholders. (In some cases this is literally true.) It is therefore of interest to make sure that the specification is consistent and unambiguous, and to provide a method for verifying

In some cases it is possible to write down parts of a system's specification as statements in formal logic. We explore this over the next few topics by using Hoare Logic.

To use formal logic, however, we need a formal model of the kinds of programs that we want to specify and verify. Let us look at a very simple formal model of programs that will be enough to illustrate the important points of formal verification.

A Simple Model of Programs

We define our (simplified) world of programs and their execution via one predicate exec:

exec(prog, initialState, finalState)

which means “running the program prog with the initial state initialState can finish with the final state finalState”.

There may be no final state for a given initial state, meaning that for a fixed prog and initialState, there may be no finalState for which exec(prog, initialState, finalState) is true. In terms of execution of real programs, we would observe this by a program “hanging” and never returning an output.
This definition also allows multiple possible answers for the same initialState, where we could have exec(program, initialState, finalState1) and exec(program, initialState, finalState2) both being true with finalState1 != finalState2. This can also be used to talk about programs where some part is left unspecified, such as an exact ordering of data in a container (see, for example, how the Go Programming Language enforces that programs should not rely on the order of data stored in hashmaps).
This definition does not distinguish between things that are program-like and things that are data-like for input and output. In particular, a program can take itself as an input. This flexibility of self reference will be crucial for stating the halting problem and proving that it is undecidable.
This definition is highly simplified in many ways. It says nothing about the time, space, or other resources needed to carry out the computation of finalState from initialState. Nor does it allow for interactive computation where a program takes input and sends output during execution rather than at the start and end. Nevertheless, it does allow us to talk about what computers can and cannot compute.

Properties of Programs

Equipped with the exec predicate, we can use it to state various properties of a program prog.

We can say “the program prog halts for the input initialState”:
∃finalState. exec(prog, initialState, finalState)
The word “halts” comes from thinking of a computer as a machine that steps through the computation. A basic question is whether or not the machine runs forever, or halts with an answer. With our exec predicate, we are ignoring the details of individual steps, but we can still ask the question of whether or not a program produces an output.
As we will see when we look at the halting problem, it is not possible to write a program that reliably determines whether or not a program halts on a particular input.
“The program prog halts for all inputs”:
∀initialState. ∃finalState. exec(prog, initialState, finalState)
This is a stronger statement than the previous one. Instead of asking whether or not a program halts for a specific initialState, it asks whether or not it halts for all inputs.
The program prog does not halt on the input initialState:
¬(∃finalState. exec(prog, initialState, finalState))
The negation of the first property states that a prog produces no answer on the input initialState.
The exec predicate allows for multiple potential outputs for a single initial state, which is referred to as being “non-deterministic”. If we want to specify that a program is “deterministic”, then we need to say that for any two final states from the same initial state, those final states are equal:
∀s. ∀s1. ∀s2. exec(prog, s, s1) → exec(prog, s, s2) → s1 = s2

Specifying Programs

The statements about programs we looked at above are quite generic, and don't say much about what programs actually do. Usually we are interested in statements like “if the inital state satisfies P, then the final state satisfies Q”.

For example:

If the input is an array of numbers, the output is an array of the same numbers, but in sorted order.
If the input is a Java program, the output is Java bytecode that correctly implements the same behaviour as the original program.
If the input is a map and a start and end point, the output is the route from the start to the end point that is “the best”.
If the input is a description of the obstacles currently visible on the road, the output is instructions to the car's steering, brakes and acceleration that avoids them in the safest way possible.

It is possible to think of “specifications” that are quite difficult to write down. Nevertheless, it is possible for some small critical parts of programs to give precise specifications, such as “this program sorts arrays”, or this “program never throws a NullPointerException”.

Partial Correctness

We can say that the program prog satisfies a specification if whenever P is true for the input, then Q is true for any output of the program:

∀input. ∀output. P(input) → exec(prog, input, output) → Q(output)

The predicate P is called the precondition, and the predicate Q is called the postcondition.

This kind of specification is called partial correctness: it says that if the precondition holds and the program halts, then the postcondition holds for the output. This kind of specification is often written in the form:

	{ P } prog { Q }

called a Hoare Triple after C. A. R. Hoare, who invented the Hoare Logic named after him. It was introduced in Hoare's paper An axiomatic basis for computer programming.

We will look at a proof system for Hoare Triples in the page on Hoare Logic.

Total Correctness

A stronger condition than partial correctness is total correctness, which says that if the precondition P holds, then the program always halts, and every output the program can generate satisfies the postcondition Q:

∀s1. P(s1) → ((∃s2. exec(prog, s1, s2)) ∧ (∀s2. exec(prog, s1, s2) → Q(s2)))

Total Hoare triples are written like this:

	 [ P ] prog [ Q ]

As you might expect, total correctness implies partial correctness:

(config (assumptions (prog var) (prog-total-spec "all s1. P(s1) -> ((ex s2. exec(prog, s1, s2)) /\ (all s2. exec(prog, s1, s2) -> Q(s2)))")) (goal "all s1. all s2. P(s1) -> exec(prog,s1,s2) -> Q(s2)"))

The full specification of total correctness is quite a mouthful, and it seems a bit roundabout that we have to prove that the program halts and separately that all of the answers meet the postcondition. There is a weaker specification, that says that there exists at least one final state that meets the post condition, but leaves open the possibility that it might halt in another state that does not meet it.

∀s1. P(s1) → (∃s2. exec(prog, s1, s2) ∧ Q(s2))

The following theorem states that the stronger total correctness implies the weaker one:

(config (assumptions (prog var) (prog-total-spec "all s1. P(s1) -> ((ex s2. exec(prog, s1, s2)) /\ (all s2. exec(prog, s1, s2) -> Q(s2)))")) (goal "all s1. P(s1) -> (ex s2. exec(prog, s1, s2) /\ Q(s2))"))

This theorem states that the weaker one implies the stronger one, if we also assume that the program is deterministic (i.e., has at most one answer):

(config (assumptions (prog var) (prog-total-spec "all s1. P(s1) -> (ex s2. exec(prog, s1, s2) /\ Q(s2))") (prog-deterministic "all s. all s1. all s2. exec(prog,s,s1) -> exec(prog,s,s2) -> s1 = s2")) (goal "all s1. P(s1) -> ((ex s2. exec(prog, s1, s2)) /\ (all s2. exec(prog, s1, s2) -> Q(s2)))"))