COMP 317: Semantics of Programming Languages

Operational Semantics


In these notes we look at the operational semantics of the language IMP. Our goal is to develop a semantics that describes how programs of the language are executed. In IMP, as in any other imperative language, assignment is the most basic form of program: the other linguistic constructs (conditionals and loops) are simply ways of organizing assignments into series that are executed in a particular order. For example, think of how (according to our intuitions) the following program is evaluated:


      x := 0 ;
      f := 1 ;
      while x <= 2
      do
          x := x + 1 ;
          f := f * x

On each pass through the loop, the value of x is increased by 1, and the body of the loop is gone through three times (while x has values 0, 1, then 2). Thus, we could "unfold" the program above to the following series of assignments:

      x := 0 ;
      f := 1 ;
      x := x + 1 ;
      f := f * x ;
      x := x + 1 ;
      f := f * x ;
      x := x + 1 ;
      f := f * x

(one of the benefits of a formal semantics is that we could give a rigorous argument that these two programs are indeed equivalent).

Assignments, therefore, lie at the heart of imperative programming languages. They also lie at the heart of the semantics of imperative languages. In particular, they suggest that any semantics should be based on the notion of state (or storage), by which we mean the particular values that are associated with a program's variables. When a program is being executed, we can think of the computer that is running the program as being in a certain state. This state is determined by the values stored by the program's variables, and these values are updated by assignments to the variables. For example, after running the program above, the computer will be in a state where the variable x has the value 3, and the variable f has the value 6. We can think of a state as being a table that tells us the value associated with any given variable (since variables in IMP aren't declared, and have no scope, the state should tell us the value associated with any variable. This would give us an infinitely long table, and it is more convenient to think of a state as a function that takes a variable as argument and returns the value stored in that variable. More formally, a state is a function

      Loc  ->  N

where Loc is the set of variable names, and N is the set of numbers (in IMP, all variables store numbers). For example, after running a program

      x := 8 ;
      y := x + 1 ;
      z := y + 2 

we obtain a state s, with (We don't know what values s gives to variables other than x, y or z, but presumably those values aren't changed by the program.)

The operational semantics we will give for IMP will describe how assignments (and other programs) update states. Before we do this, note that an assignment like


      y := x + 1

depends on the state of the computer before the assignment is executed: the value assigned to y depends on the value of x in this "initial" state (e.g., if the value of x in the initial state is 8, then y is assigned the value 9, and if x has the value 23, then y is assigned the value 24). We begin by describing how expressions such as x+1 are evaluated in a given state.



The Evaluation of Arithmetic Expressions

The value of an IMP expression such as


      (x + 1) * (2*y + 1)

clearly depends upon the values of x and y. Since we want the semantics of (the symbol) "+" to be addition and the semantics of "*" to be multiplication, we should be able to get the value of such an expression by "plugging in" the values of x and y. The values of these variables depends upon the state of the computer when the expression is evaluated: for example, if the state s gives the value of x as 5 and the value of y as 3 (i.e., s(x) = 5 and s(y) = 3), then the value of the expression above is

      (s(x) + 1) * (2 * s(y) + 1)
    =
      (5 + 1) * (2*3 + 1)
    =
      6 * 7
    =
      42

We capture these intuitions by defining a relation between expressions, states and numbers that says what the value of an expression is in a given state. We use the notation (e,s) --> n to indicate that the IMP expression e has the value n when evaluated in the state s.

We define this relation by considering each possible form of the expression e (i.e., each of the options in the BNF definition of arithmetic expressions in IMP).

  1. The first case is where the expression is a number n; clearly, the value of this expression is just n, and doesn't depend on the state:
    
          (n,s) --> n
    
    
    for all numbers n and states s.
     
  2. The second case is where the expression is a variable x; in this case, the value does depend on the state, and in fact, the value should be the value of the variable in that state:
    
          (x,s) --> s(x)
    
    
    for all locations x and states s.
     
  3. The third case is where the expression is built from two expressions e1 and e2 using the symbol +, i.e., the expression is of the form e1 + e2. In this case, the result should be the sum of the values of e1 and e2, i.e., the result should be n1 + n2, where n1 is the value of e1 and n2 the value of e2 in the state s. We write this as the following implication:
    
          If (e1,s) --> n1 and (e2,s) --> n2, then (e1+e2,s) --> n1+n2.
    
    
    for all expressions e1 and e2, numbers n1 and n2, and states s.
     
  4. The fourth case is where the expression is built from two expressions e1 and e2 using the symbol *, i.e., the expression is of the form e1 * e2. In this case, the result should be the product of the values of e1 and e2, i.e., the result should be n1 * n2, where n1 is the value of e1 and n2 the value of e2 in the state s. We write this as the following implication:
    
          If (e1,s) --> n1 and (e2,s) --> n2, then (e1*e2,s) --> n1*n2.
    
    
    for all expressions e1 and e2, numbers n1 and n2, and states s.
     
  5. The final case is where the expression is built from two expressions e1 and e2 using the symbol -, i.e., the expression is of the form e1 - e2. This is treated in the same way as for + and *, and is left as an exercise.

For example, consider the expression


      (x + 1) * (2*y + 1).

Let s be a state with s(x) = 5 and s(y) = 3. We show that the value of the above expression in the state s is 42.

The clauses (1)-(5) above define the relation (e,s) --> n. Clauses (3)-(5) define the semantics of compound expressions (expressions built from other expressions) using the semantics (i.e., the values) of their subexpressions. This is evident from the example of (x+1)*(2*y+1) above. When a semantics defines the meaning of expressions (whether these expressions be arithmetic expressions, Boolean expressions or programs) in terms of the meanings of their subexpressions, we say the semantics is compositional.

Note that each of the five clauses (1)-(5) in the definition of the evaluation relation corresponds to one of the "options" (or "cases") in the BNF definition of <Aexp>. When a definition gives a clause for each case, in a compositional way, we say the definition is inductive. Thus, for example, clauses (1)-(5) above give an inductive definition of the evaluation relation (e,s) --> n.



The Evaluation of Boolean Expressions

Boolean expressions are given an operational semantics in much the same way as arithmetic expressions. We define a relation (b,s) --> v that describes how Boolean expressions are evaluated; here, b is a Boolean expression in <Bexp>, s is a state, and v is a truth-value (i.e., either true or false). The definition of this evaluation relation is inductive:

  1. The first case is where the expression is true clearly, the value of this expression should just be true, and doesn't depend on the state:
    
          (true,s) --> true
    
    
    for all states s.
     
  2. The second case is where the expression is false clearly, the value of this expression should just be false, and doesn't depend on the state:
    
          (false,s) --> false
    
    
    for all states s.
     
  3. The third case is where the expression is built from two arithmetic expressions e1 and e2 using the symbol "="; i.e., the expression is of the form e1 = e2. The idea is that this Boolean expression represents an equality test, and should be true if, and only if, the expressions e1 and e2 are "equal" (the quotes here are because we're not testing whether the expressions are equal - they're equal only when they're identical as strings of characters - rather, we're testing whether their semantics - their values - are equal). Therefore, the result should be true if e1 and e2 have the same value in a given state, and false otherwise:
    
          If (e1,s) --> n1 and (e2,s) --> n2, then (e1=e2,s) --> v,
              where v is true if n1 = n2, and false otherwise.
    
    
    for all expressions e1 and e2, numbers n1 and n2, and states s.
     
  4. The case where the expression is built from two arithmetic expressions e1 and e2 using the symbol "<=" is treated similarly.
     
  5. We now consider the third case where the expression is built from two Boolean expressions e1 and e2 using the symbol "and" (logical conjunction); i.e., the expression is of the form b1 and b2. The idea is that this Boolean expression should be true just when both b1 and b2 are true and false otherwise:
    
          If (b1,s) --> v1 and (b2,s) --> v2,
          then (b1 and b2,s) --> v,
              where v is true if v1 and v2 are both true, and false otherwise.
    
    
    for all Boolean expressions b1 and b2, truth-values v1, v2 and v, and states s.

The remaining cases, corresponding to the cases in the BNF definition of <Bexp>, are left as an exercise.



The Evaluation of Programs

Again, we define the semantics of programs by inductively defining a relation (c,s) --> s', where c is a program (a term in <Com>) and s and s' are states. What we mean when we write (c,s) --> s' is this: if we start in a state s and run the program c then we end up in the state s'.

  1. The first case is where c is the program skip. The idea is that skip is a program that does nothing, and therefore doesn't change the state at all:
    
          (skip,s) --> s
    
    
    for all states s.
     
  2. Assignments do change the state, by updating the value stored in a variable. If we evaluate an assignment x := e in a state s, then we obtain a new state that differs from s only in the value of x, which is now set to be the value of e. Let's suppose that the value of e in s is the number n; we write s[n/x] for this new state (this can be pronounced "s where n is now stored in x. This new state is, of course, a state, and therefore a function from variable names to numbers; it is formally described by the following two equations:
    
          s[n/x](x) = n
    
          s[n/x](y) = s(y),  for all y different from x.
    
    
    The first of these equations says that the value now stored in x is n (the value of e); the second equation says that only the value of x has been changed: all other variables have their original values. It is this updated state, then, that is the result of evaluating the assignment in state s:
    
          If (e,s) --> n, then (x := e,s) --> s[n/x]
    
    
    for all states s, variables x and expressions e.
     
  3. For sequential compositions, i.e., programs of the form c1 ; c2, we want to say that c1 is evaluated first, and then c2. If we start in a state s, evaluating c1 will lead to a new state, say s', and it is in this state that we start to evaluate c2:
    
          If (c1,s) --> s' and (c2,s') --> s'',
          then (c1;c2,s) --> s''
    
    
    for all states s, s' and s'', and programs c1 and c2.
     
  4. Consider a program of the form
    
          if b then c1 else c2 .
    
    
    The evaluation of this program proceeds by first evaluating the Boolean expression b; if this evaluates to true, then c1 is evaluated:
    
          If (b,s) --> true and (c1,s) -- s',
          then (if b then c1 else c2,s) --> s'.
    
    
    Similarly, if b evaluates to false, then c2 is evaluated:
    
          If (b,s) --> false and (c2,s) -- s',
          then (if b then c1 else c2,s) --> s'.
    
    

     
  5. Finally, we define the semantics of loops, i.e., programs of the form
    
          while b do c .
    
    
    Again, evaluation proceeds by first evaluating b. If the result is false, then the body of the loop isn't evaluated; the program terminates immediately without changing the state:
    
          If (b,s) --> false
          then (while b do c,s) --> s.
    
    
    On the other hand, if b evaluates to true, then the body c is evaluated, and then the process is repeated: b is evaluated again, and the loop is either exited or iterated depending on whether the result is true or false. Effectively, if b evaluates to true, then the program
    
          c ; while b do c
    
    
    is evaluated. Therefore:
    
          If (b,s) --> true and (c,s) --> s' and (while b do c,s') --> s''
          then (while b do c,s) --> s''.
    
    
    This says: when the condition b is true, then evaluate the body c, giving a new state s', then repeat the loop again: if this gives a final state s'', then that state is the result of evaluating the loop.

This completes the definition of the evaluation relation for programs, and gives an operational semantics for IMP.



Some Issues Concerning the Operational Semantics

What do we get from the semantics?

The semantics describes how arithmetic and Boolean expressions are evaluated, and how programs update states. The semantics is given by the inductively defined relations (a,s) --> n for arithmetic expressions, (b,s) --> v for Boolean expressions, and (c,s) --> s' for programs. Because these relations are formally defined, we can use them in giving rigorous arguments about programs (Winskel gives some examples). We won't prove anything in these notes, as the proofs use some rather involved inductive arguments that go beyond the scope of this module (we'll see some rigorous arguments once we've described the denotational and axiomatic semantics of IMP). However, the possibility of constructing formal correctness proofs for programs is certainly one of the main uses of the semantics. Moreover, because the semantics is compositional, it is easy to modify or extend the semantics; some examples follow:
 

What about "lazy" and?

The semantics of logical conjunction ("and") requires both arguments to be evaluated: clause (5) in the definition of the evaluation relation for Boolean expressions gives the semantics of "b1 and b2" by looking at the values of both b1 and b2. Many languages have a "lazy" logical conjunction (e.g., the operator "&&" of C or Java) which will first evaluate b1, and if this evaluates to false, then b2 is not evaluated. This kind of lazy semantics could be captured by the following two rules:


      If (b1,s) --> false, then (b1 and b2, s) --> false

and

      If (b1,s) --> true and (b2,s) --> v, then (b1 and b2, s) --> v

Thus the expression b2 is only evaluated if b1 evaluates to true.
 

How do we get a more operational operational semantics?

The semantics as it stands doesn't say much about how programs are evaluated, in the sense that we simply have a relation (c,s) --> s' that tells us what the final state is after evaluating the program c - it doesn't say anything about the intermediate states that we pass through in evaluating the program. For example, consider the program:


      x := 0 ;
      y := 1 ;
      z := x + y

When we evaluate this program in a state s, we begin by evaluating the assignment x := 0, giving a new state s[0/x], and the "remainder"

      y := 1 ;
      z := x + y

still to be evaluated; the next assignment, y := 1, is then evaluated, giving a state s[0/x][1/y], and the remaining program

      z := x + y

still to be evaluated; and so on until the program is fully evaluated. We can give an alternative semantics that describes the individual steps in a computation by defining a "one-step" evaluation relation (c,s) -1-> (c',s') which says that if we evaluate a program c in a state s, then after one step of the evaluation, we obtain a new state s', and the program c' remains to be evaluated. For example, we would like to have:

      (x := 0 ; y := 1 ; z := x+y, s) -1-> (y := 1 ; z := x+y, s[0/x]).

Of course, some programs, such as single assignments, can be fully evaluated in just one step, in which case, we will write (c,s) -1-> s', which means "the program c is fully evaluated in one step, starting in state s and giving a new state s' as a result". We would then be able to chain these relations together to obtain our original operational semantics for IMP; for example:

      (x := 0 ; y := 1 ; z := x+y, s)
   -1->
      (y := 1 ; z := x+y, s[0/x])
   -1->
      (z := x+y, s[0/x][1/y])
   -1->
      s[0/x][1/y][1/z]

Clearly, we would also like to have

      (c0,s0) -1-> (c1,s1) -1-> ... -1-> (cn,sn) -1-> s

if and only if (c0,s0) --> s (this will, in fact, be the case, although we won't prove it: a proof is given in Winskel's book).

Before giving the one-step evaluation relation for programs, we give one-step evaluation relations for arithmetic expresions and for Boolean expressions. The idea is that arithmetic expressions are evaluated in a number of steps, beginning with the leftmost innermost subexpression. For example, if the state s gives the value of x as 5 and the value of y as 3 (i.e., s(x) = 5 and s(y) = 3), then the expression (x + 1) * (2*y + 1) would be evaluated as follows:


      ((x + 1) * (2*y + 1), s)
   -1->
      ((5 + 1) * (2*y + 1), s)
   -1->
      (6 * (2*y + 1), s)
   -1->
      (6 * (2*3 + 1), s)
   -1->
      (6 * (6 + 1), s)
   -1->
      (6 * 7, s)
   -1->
      42

An inductive definition of this relation is:
  1. For numbers n,
    
          (n,s) -1-> n .
    
    

     
  2. For variables x,
    
          (x,s) -1-> s(x) .
    
    

     
  3. For expressions a1 and a2:
    
          If  (a1,s) -1-> (a1',s)  then  (a1+a2,s) -1-> (a1'+a2,s) .
    
          If  (a1,s) -1-> m  then  (a1+a2,s) -1-> (m+a2,s) .
    
          If  (a2,s) -1-> (a2',s)  then  (m+a2,s) -1-> (m+a2',s)
    
          If  (a2,s) -1-> n  then  (a1+a2,s) -1-> m+n .
    
    
    Here, m and n are numbers, i.e., fully evaluated expressions. These four rules say that the left subexpression (a1) is evaluated first, and the second (a2) is evaluated only once the first is fully evaluated (to a number m); finally, when both a1 and a2 are fully evaluated (to the numbers m and n), the result of evaluating the expression is the number m+n.
The cases of expressions a1 - a2 and a1 * a2 are similar to the above.

A one-step evaluation relation for Boolean expressions can also be defined similarly.

Now the one-step evaluation relation for programs is defined inductively as follows:

  1. The first case says that skip is evaluated in one step:
    
          (skip,s) -1-> s .
    
    
  2. Assignments are evaluated by first evaluating the expression on the left of the := operator:
    
          If  (a,s) -1-> (a',s)  then  (x := a, s) -1-> (x := a', s) .
    
    
    Assignments are then evaluated in one step once the expression is fully evaluated to a number n:
    
          If  (a,s) -1-> n   then  (x := a, s) -1-> s[n/x] .
    
    
  3. For sequential composition:
    
        If  (c1,s) -1-> (c1',s')  then  (c1;c2, s) -1-> (c1';c2, s') .
    
        If  (c1,s) -1-> s'  then  (c1;c2, s) -1-> (c2, s') .
    
    
  4. Conditionals begin by evaluating the test:
    
        If  (b,s) -1-> (b',s)  then  (if b then c1 else c2, s) -1->, (if b' then c1 else c2, s) .
    
        If  (b,s) -1-> true  then  (if b then c1 else c2, s) -1-> (c1,s) .
    
        If  (b,s) -1-> false  then  (if b then c1 else c2, s) -1-> (c2,s) .
    
    
  5. The case of while loops is left as an exercise.

 

What about side-effects?

In the first operational semantics and in the one-step operational semantics, the evaluation of expressions does not change the state. In this sense, IMP has no side-effects. Many languages, however, do allow side-effects; for example, in C and Java, the expression x++ evaluates to the value of the variable x, but has the side-effect of incrementing the value stored in this variable. Thus, in a state s with s(x) = 3, the expression


      (2 * x++) + 1

would evaluate to 7, but the state would be updated to s[4/x]. It is straightforward to extend the syntax of IMP to allow expressions with side-effects; we extend the BNF syntax with

    <Aexp> ::=  ... | <Loc>++

and we give a semantics for this operator by adding the following rule to the inductive definition of the one-step evaluation relation for arithmetic expressions:

    (x++, s) -1-> (s(x), s[(s(x)+1)/x]) .



Exercises

  1. Complete the definition of the evaluation relation for arithmetic expressions by writing out clause (5) as fully as possible.
     
  2. The example of the evaluation of the arithmetic expression (x+1)*(2*y+1) is really a proof that ((x+1)*(2*y+1),s) --> 42 when s(x)=5 and s(y)=3. In the lectures, we saw how to write such proofs in "tree form". Rewrite the example proof that ((x+1)*(2*y+1), s) --> 42) in tree form.
     
  3. Complete the definition of the evaluation relation for Boolean expressions by writing out the required clauses (this differs from Exercise 1 in that you should also know what clauses are required for an inductive definition). (And yes, the answer to this exercise has been given in the lectures; one point of this exercise is to test how good your lecture notes are.)
     
  4. Use the tree-form style to discover the state that results from running the program
    
          x := 1 ; while not(x <= 0) do x := x-1 .
    
    

     
  5. Write out an inductive definition for the one-step evaluation relation for Boolean expressions. Does your definition give "lazy" logical operators? If not, give an alternative semantics that does give lazy operators.
     
  6. Complete the definition of the one-step evaluation relation for programs by giving the rule for while-loops.
     
  7. Extend the syntax and operational semantics of IMP with an if-then construct, so that a program of the form
    
        if b then c
    
    
    is equivalent to
    
        if b then c else skip
    
    
    Show that these two programs are actually equivalent.
     
  8. Extend the syntax and (one-step) semantics of IMP with a pre-increment operator ++<Loc> (the value of the expression ++x is the value of x plus one, and has the side-effect of incrementing x.
     
  9. Some languages, such as Charity, are non-deterministic, in that the programmer can specify a non-deterministic choice of programs. Such programs are evaluated by choosing (randomly) one of the given programs. We write c1<>c2 to denote a non-deterministic choice between programs c1 and c2; this program is evaluated by either evaluating c1 or by evaluating c2, with no way of knowing beforehand which choice will be made. For example, the program
    
        x := 0 <> x := 1
    
    
    Will assign either 0 or 1 to x. Extend the syntax of IMP with a non-deterministic choice <Com> <> <Com>. Give an operational semantics for this by extending the inductive definition of the evaluation relation (c,s) --> s'. Also, give an extension of the one-step semantics.
     
  10. Some languages allow "assertions" that, if false, will cause a run-time error. If b is a Boolean expression, then the assertion {{b}} has no effect if b is true, but causes the program to fail if b is false. An operational semantics for this can be given by extending the inductive definition of the evaluation relation with:
    
        ({{b}}, s) --> s'   if and only if   (b,s) --> true  and  s = s' .
    
    
    Describe in words the behaviour of a program
    
        ({{b1}} ; c1) <> ({{b2}} ; c2) .
    



Review Questions

  1. What do we mean when we say a definition is "inductive"?
     
  2. Review the rules for "lazy" and. How do these rules actually determine an order of evaluation? Suppose we had the rule
    
        If either  (b1,s) --> false   or  (b2,s) --> false 
        then  (b1 and b2, s) --> false .
    
    
    What, if anything, does this say about the order of evaluation?



Grant Malcolm

Last modified: Tue Oct 8 00:02:15 BST 2002