In these notes we look at the operational semantics of the language IMP. Our goal is to develop a semantics that describes how programs of the language are executed. In IMP, as in any other imperative language, assignment is the most basic form of program: the other linguistic constructs (conditionals and loops) are simply ways of organizing assignments into series that are executed in a particular order. For example, think of how (according to our intuitions) the following program is evaluated:
x := 0 ;
f := 1 ;
while x <= 2
do
x := x + 1 ;
f := f * x
On each pass through the loop, the value of x is
increased by 1, and the body of the loop is gone through three times
(while x has values 0, 1,
then 2).
Thus, we could "unfold" the program above to the following series
of assignments:
x := 0 ;
f := 1 ;
x := x + 1 ;
f := f * x ;
x := x + 1 ;
f := f * x ;
x := x + 1 ;
f := f * x
(one of the benefits of a formal semantics is that we could give
a rigorous argument that these two programs are indeed equivalent).
Assignments, therefore, lie at the heart of imperative programming
languages.
They also lie at the heart of the semantics of imperative languages.
In particular, they suggest that any semantics should be based on
the notion of state (or storage), by which we
mean the particular values that are associated with a program's
variables.
When a program is being executed, we can think of the computer
that is running the program as being in a certain state.
This state is
determined by the values stored by the program's variables, and
these values are updated by assignments to the variables.
For example, after running the program above, the computer will
be in a state where the variable x has the value
3, and the variable f has the value
6.
We can think of a state as being a table that tells us the value
associated with any given variable (since variables in IMP aren't
declared, and have no scope, the state should tell us the value
associated with any variable.
This would give us an infinitely long table, and it is more
convenient to think of a state as a function that takes a variable
as argument and returns the value stored in that variable.
More formally, a state is a function
Loc -> N
where Loc is the set of variable names,
and N is the set of numbers (in IMP, all variables
store numbers).
For example, after running a program
x := 8 ;
y := x + 1 ;
z := y + 2
we obtain a state s, with
x) = 8, y) = 9, and z) = 11. x, y or z,
but presumably those values aren't changed by the program.)
The operational semantics we will give for IMP will describe how assignments (and other programs) update states. Before we do this, note that an assignment like
y := x + 1
depends on the state of the computer before the assignment
is executed: the value assigned to y depends
on the value of x in this "initial" state
(e.g., if the value of x in the initial state
is 8, then y is assigned the value 9,
and if x has the value 23, then
y is assigned the value 24).
We begin by describing how expressions such as x+1
are evaluated in a given state.
The value of an IMP expression such as
(x + 1) * (2*y + 1)
clearly depends upon the values of x and y.
Since we want the semantics of (the symbol) "+" to be
addition and the semantics of "*" to be multiplication,
we should be able to get the value of such an expression by "plugging in"
the values of x and y.
The values of these variables depends upon the state of the computer
when the expression is evaluated:
for example, if the state s gives the value of x
as 5 and the value of y as 3
(i.e., s(x) = 5 and
s(y) = 3), then the value of the
expression above is
(s(x) + 1) * (2 * s(y) + 1)
=
(5 + 1) * (2*3 + 1)
=
6 * 7
=
42
We capture these intuitions by defining a relation between expressions,
states and numbers that says what the value of an expression is in a given
state.
We use the notation (e,s) --> n to indicate that the IMP expression
e has the value n when evaluated in the state s.
We define this relation by considering each possible form of the expression e (i.e., each of the options in the BNF definition of arithmetic expressions in IMP).
(n,s) --> n
for all numbers n and states s. x;
in this case, the value does depend on the state, and in fact,
the value should be the value of the variable in that state:
(x,s) --> s(x)
for all locations x and states s. +,
i.e., the expression is of the form e1 + e2.
In this case, the result should be the sum of the values of
e1 and e2, i.e., the result should be n1 + n2,
where n1 is the value of e1 and n2 the
value of e2 in the state s. We write this as the
following implication:
If (e1,s) --> n1 and (e2,s) --> n2, then (e1+e2,s) --> n1+n2.
for all expressions e1 and e2,
numbers n1 and n2, and states s. *,
i.e., the expression is of the form e1 * e2.
In this case, the result should be the product of the values of
e1 and e2, i.e., the result should be n1 * n2,
where n1 is the value of e1 and n2 the
value of e2 in the state s. We write this as the
following implication:
If (e1,s) --> n1 and (e2,s) --> n2, then (e1*e2,s) --> n1*n2.
for all expressions e1 and e2,
numbers n1 and n2, and states s. -,
i.e., the expression is of the form e1 - e2.
This is treated in the same way as for + and
*, and is left as an exercise.
For example, consider the expression
(x + 1) * (2*y + 1).
Let s be a state with s(x) = 5
and s(y) = 3.
We show that the value of the above expression in the state s
is 42.
x,s) --> 5. 1,s) --> 1. x+1,s) --> 6. 2,s) --> 2. y,s) --> 3. 2*y,s) --> 6. 2*y + 1,s) --> 7. (x+1)*(2*y + 1),s) --> 42.
The clauses (1)-(5) above define the relation (e,s) --> n.
Clauses (3)-(5) define the semantics of compound expressions
(expressions built from other expressions) using the semantics
(i.e., the values) of their subexpressions.
This is evident from the example of (x+1)*(2*y+1) above.
When a semantics defines the meaning of expressions (whether these
expressions be arithmetic expressions, Boolean expressions or
programs) in terms of the meanings of their subexpressions, we
say the semantics is compositional.
Note that each of the five clauses (1)-(5) in the definition of the
evaluation relation corresponds to one of the "options" (or "cases")
in the BNF definition of <Aexp>.
When a definition gives a clause for each case, in a compositional
way, we say the definition is inductive.
Thus, for example, clauses (1)-(5) above give an inductive definition
of the evaluation relation (e,s) --> n.
Boolean expressions are given an operational semantics in much the
same way as arithmetic expressions.
We define a relation (b,s) --> v that describes how Boolean
expressions are evaluated; here, b is a Boolean expression
in <Bexp>, s is a state, and v
is a truth-value (i.e., either true or false).
The definition of this evaluation relation is inductive:
true
clearly, the value of this expression should just be true, and
doesn't depend on the state:
(true,s) --> true
for all states s. false
clearly, the value of this expression should just be false, and
doesn't depend on the state:
(false,s) --> false
for all states s. =";
i.e., the expression is of the form e1 = e2.
The idea is that this Boolean expression represents an equality test,
and should be true if, and only if, the expressions e1 and
e2 are "equal" (the quotes here are because we're not testing
whether the expressions are equal - they're equal only when they're
identical as strings of characters - rather, we're testing whether
their semantics - their values - are equal).
Therefore, the result should be true if
e1 and e2 have the same value in a given state,
and false otherwise:
If (e1,s) --> n1 and (e2,s) --> n2, then (e1=e2,s) --> v,
where v is true if n1 = n2, and false otherwise.
for all expressions e1 and e2,
numbers n1 and n2, and states s. <="
is treated similarly. and"
(logical conjunction);
i.e., the expression is of the form b1 and b2.
The idea is that this Boolean expression should be true just when
both b1 and b2 are true and false otherwise:
If (b1,s) --> v1 and (b2,s) --> v2,
then (b1 and b2,s) --> v,
where v is true if v1 and v2 are both true, and false otherwise.
for all Boolean expressions b1 and b2,
truth-values v1, v2 and v, and states s.
The remaining cases, corresponding to the cases in the BNF definition
of <Bexp>, are left as an exercise.
Again, we define the semantics of programs by inductively
defining a relation (c,s) --> s', where c is
a program (a term in <Com>)
and s and s' are states.
What we mean when we write (c,s) --> s' is this:
if we start in a state s and run the program c
then we end up in the state s'.
skip.
The idea is that skip is a program that does nothing,
and therefore doesn't change the state at all:
(skip,s) --> s
for all states s. x := e in a
state s, then we obtain a new state that differs from s
only in the value of x, which is now set to be the
value of e. Let's suppose that the value of e
in s is the number n; we write
s[n/x] for this new state
(this can be pronounced "s where n is now stored in
x. This new state is, of course, a state, and therefore
a function from variable names to numbers; it is formally described by
the following two equations:
s[n/x](x) = n
s[n/x](y) = s(y), for all y different from x.
The first of these equations says that the value now stored in
x is n (the value of e);
the second equation says that only the value of
x has been changed: all other variables have their
original values. It is this updated state, then, that is the
result of evaluating the assignment in state s:
If (e,s) --> n, then (x := e,s) --> s[n/x]
for all states s, variables x and expressions
e. c1 ; c2, we want to say that c1 is
evaluated first, and then c2. If we start in a state
s, evaluating c1 will lead to a new state,
say s', and it is in this state that we start to evaluate
c2:
If (c1,s) --> s' and (c2,s') --> s'',
then (c1;c2,s) --> s''
for all states s, s' and s'', and
programs c1 and c2.
if b then c1 else c2 .
The evaluation of this program proceeds by first evaluating the
Boolean expression b; if this evaluates to true,
then c1 is evaluated:
If (b,s) --> true and (c1,s) -- s',
then (if b then c1 else c2,s) --> s'.
Similarly, if b evaluates to false,
then c2 is evaluated:
If (b,s) --> false and (c2,s) -- s',
then (if b then c1 else c2,s) --> s'.
while b do c .
Again, evaluation proceeds by first evaluating b.
If the result is false, then the body of the loop isn't
evaluated; the program terminates immediately without changing the
state:
If (b,s) --> false
then (while b do c,s) --> s.
On the other hand, if b evaluates to true,
then the body c is evaluated, and then the process
is repeated: b is evaluated again, and the loop
is either exited or iterated depending on whether the result is
true or false. Effectively, if b
evaluates to true, then the program
c ; while b do c
is evaluated. Therefore:
If (b,s) --> true and (c,s) --> s' and (while b do c,s') --> s''
then (while b do c,s) --> s''.
This says: when the condition b is true, then evaluate
the body c, giving a new state s', then
repeat the loop again: if this gives a final state s'',
then that state is the result of evaluating the loop. This completes the definition of the evaluation relation for programs, and gives an operational semantics for IMP.
The semantics describes how arithmetic and Boolean expressions
are evaluated, and how programs update states.
The semantics is given by the inductively defined relations
(a,s) --> n for arithmetic expressions,
(b,s) --> v for Boolean expressions, and
(c,s) --> s' for programs.
Because these relations are formally defined, we can use them in
giving rigorous arguments about programs (Winskel gives some
examples). We won't prove anything in these notes, as the proofs
use some rather involved inductive arguments that go beyond the
scope of this module (we'll see some rigorous arguments once we've
described the denotational and axiomatic semantics of IMP).
However, the possibility of constructing formal correctness proofs
for programs is certainly one of the main uses of the semantics.
Moreover, because the semantics is compositional, it is easy
to modify or extend the semantics; some examples follow:
The semantics of logical conjunction ("and")
requires both arguments to be evaluated: clause (5) in the
definition of the evaluation relation for Boolean expressions
gives the semantics of "b1 and b2"
by looking at the values of both b1 and b2.
Many languages have a "lazy" logical conjunction (e.g., the
operator "&&" of C or Java) which will first evaluate b1,
and if this evaluates to false, then b2 is not
evaluated. This kind of lazy semantics could be captured by the
following two rules:
If (b1,s) --> false, then (b1 and b2, s) --> false
and
If (b1,s) --> true and (b2,s) --> v, then (b1 and b2, s) --> v
Thus the expression b2 is only evaluated if
b1 evaluates to true. The semantics as it stands doesn't say much about how programs are evaluated, in the sense that we simply have a relation (c,s) --> s' that tells us what the final state is after evaluating the program c - it doesn't say anything about the intermediate states that we pass through in evaluating the program. For example, consider the program:
x := 0 ;
y := 1 ;
z := x + y
When we evaluate this program in a state s, we begin by
evaluating the assignment x := 0, giving a new
state s[0/x], and the "remainder"
y := 1 ;
z := x + y
still to be evaluated; the next assignment, y := 1,
is then evaluated, giving a state
s[0/x][1/y], and
the remaining program
z := x + y
still to be evaluated; and so on until the program is fully
evaluated. We can give an alternative semantics that describes
the individual steps in a computation by defining a
"one-step" evaluation relation (c,s) -1-> (c',s')
which says that if we evaluate a program c in a state
s, then after one step of the evaluation, we obtain a
new state s', and the program c' remains to be
evaluated. For example, we would like to have:
(x := 0 ; y := 1 ; z := x+y, s) -1-> (y := 1 ; z := x+y, s[0/x]).
Of course, some programs, such as single assignments, can be fully
evaluated in just one step, in which case, we will write
(c,s) -1-> s', which means "the program c is
fully evaluated in one step, starting in state s and
giving a new state s' as a result".
We would then be able to chain these relations together to obtain
our original operational semantics for IMP; for example:
(x := 0 ; y := 1 ; z := x+y, s)
-1->
(y := 1 ; z := x+y, s[0/x])
-1->
(z := x+y, s[0/x][1/y])
-1->
s[0/x][1/y][1/z]
Clearly, we would also like to have
(c0,s0) -1-> (c1,s1) -1-> ... -1-> (cn,sn) -1-> s
if and only if (c0,s0) --> s (this will, in fact, be the case,
although we won't prove it: a proof is given in Winskel's book).
Before giving the one-step evaluation relation for programs,
we give one-step evaluation relations for arithmetic expresions
and for Boolean expressions.
The idea is that arithmetic expressions are evaluated in a number
of steps, beginning with the leftmost innermost subexpression.
For example, if the state s gives the value of x
as 5 and the value of y as 3
(i.e., s(x) = 5 and
s(y) = 3), then the expression
(x + 1) * (2*y + 1) would be evaluated as follows:
((x + 1) * (2*y + 1), s)
-1->
((5 + 1) * (2*y + 1), s)
-1->
(6 * (2*y + 1), s)
-1->
(6 * (2*3 + 1), s)
-1->
(6 * (6 + 1), s)
-1->
(6 * 7, s)
-1->
42
An inductive definition of this relation is:
(n,s) -1-> n .
(x,s) -1-> s(x) .
If (a1,s) -1-> (a1',s) then (a1+a2,s) -1-> (a1'+a2,s) .
If (a1,s) -1-> m then (a1+a2,s) -1-> (m+a2,s) .
If (a2,s) -1-> (a2',s) then (m+a2,s) -1-> (m+a2',s)
If (a2,s) -1-> n then (a1+a2,s) -1-> m+n .
Here, m and n are numbers, i.e., fully evaluated expressions.
These four rules say that the left subexpression (a1) is
evaluated first, and the second (a2) is evaluated only once
the first is fully evaluated (to a number m);
finally, when both a1 and a2 are fully evaluated
(to the numbers m and n), the result of evaluating
the expression is the number m+n. - a2 and
a1 * a2 are similar to the above.
A one-step evaluation relation for Boolean expressions can also be defined similarly.
Now the one-step evaluation relation for programs is defined inductively as follows:
skip is evaluated
in one step:
(skip,s) -1-> s .
:= operator:
If (a,s) -1-> (a',s) then (x := a, s) -1-> (x := a', s) .
Assignments are then evaluated in one step once the expression is fully
evaluated to a number n:
If (a,s) -1-> n then (x := a, s) -1-> s[n/x] .
If (c1,s) -1-> (c1',s') then (c1;c2, s) -1-> (c1';c2, s') .
If (c1,s) -1-> s' then (c1;c2, s) -1-> (c2, s') .
If (b,s) -1-> (b',s) then (if b then c1 else c2, s) -1->, (if b' then c1 else c2, s) .
If (b,s) -1-> true then (if b then c1 else c2, s) -1-> (c1,s) .
If (b,s) -1-> false then (if b then c1 else c2, s) -1-> (c2,s) .
In the first operational semantics and in the one-step operational semantics,
the evaluation of expressions does not change the state.
In this sense, IMP has no side-effects.
Many languages, however, do allow side-effects;
for example, in C and Java, the expression x++ evaluates
to the value of the variable x, but has the side-effect
of incrementing the value stored in this variable.
Thus, in a state s with s(x) = 3,
the expression
(2 * x++) + 1
would evaluate to 7, but the state would be updated to
s[4/x].
It is straightforward to extend the syntax of IMP to allow expressions
with side-effects; we extend the BNF syntax with
<Aexp> ::= ... | <Loc>++
and we give a semantics for this operator by adding the following
rule to the inductive definition of the one-step evaluation relation
for arithmetic expressions:
(x++, s) -1-> (s(x), s[(s(x)+1)/x]) .
(x+1)*(2*y+1) is really a proof that
((x+1)*(2*y+1),s) --> 42 when
s(x)=5 and s(y)=3.
In the lectures, we saw how to write such proofs in "tree form".
Rewrite the example proof that
((x+1)*(2*y+1), s) --> 42) in tree form.
x := 1 ; while not(x <= 0) do x := x-1 .
if b then c
is equivalent to
if b then c else skip
Show that these two programs are actually equivalent. ++<Loc>
(the value of the expression ++x is the value of
x plus one, and has the side-effect of incrementing
x. <>c2 to denote a non-deterministic
choice between programs c1 and c2; this program is
evaluated by either evaluating c1 or by evaluating c2,
with no way of knowing beforehand which choice will be made.
For example, the program
x := 0 <> x := 1
Will assign either 0 or 1
to x.
Extend the syntax of IMP with a non-deterministic choice
<Com> <> <Com>.
Give an operational semantics for this by extending the inductive
definition of the evaluation relation (c,s) --> s'.
Also, give an extension of the one-step semantics. {{b}} has no
effect if b is true, but causes the program to fail
if b is false. An operational semantics for this can
be given by extending the inductive definition of the evaluation
relation with:
({{b}}, s) --> s' if and only if (b,s) --> true and s = s' .
Describe in words the behaviour of a program
({{b1}} ; c1) <> ({{b2}} ; c2) .
If either (b1,s) --> false or (b2,s) --> false
then (b1 and b2, s) --> false .
What, if anything, does this say about the order of evaluation?