This paper describes a conditional term rewriting system for Joy based on the two constructors concatenation and quotation.
A rewriting system consists of a set of syntactic rules for
performing replacements on certain suitable entities. The best known such system
is the one we learnt at school for evaluating arithmetic expressions. Any
programming language can be given a rewriting system, but for Joy it is
particularly simple. The basic binary rewriting relation will be written in
infix notation as =>
, pronounced "can be rewritten as". The
following are some sample rules for the + operator, the
< predicate and the dip combinator.
2 3 + => 5 2 3 < => true a [P] dip => P aIn the last example,
P
is any program and a
is
any literal (such as a number) or a program whose net effect is to push exactly
one item onto the stack. The rewriting relation is extended to allow rewriting
in appropriate contexts, further extended to accomodate several rewriting steps,
and finally extended to become a congruence relation, an equivalence relation
compatible with program concatenation. This congruence relation between programs
is essentially the same as the identity relation in the algebra of of functions
which the programs denote. Although Joy functions take a stack as argument and
value, in the rewrite rules the stack is never mentioned.
The following are rewriting rules for arithmetic expressions in four different notations: infix, functional, prefix and postfix:
2 + 3 => 5 +(2,3) => 5 + 2 3 => 5 2 3 + => 5In each case on the left the operands are
2
and
3
, and the operator or constructor is +
, so
they all refer to the same arithmetic term. Since Joy uses what looks like
postfix notation, it might be thought that one should attempt a term rewriting
system with rules just like the second one in the last line. That would treat
the short program 2 3 +
as being composed of two operands and one
operator or constructor. It would also treat the gap between 2
and
3
as quite different from the gap between 3
and
+
. The difference would be explained away as a syntactic
coincidence due to the choice of notation. Apart from +
there would
be very many term constructors.
However, Joy has operators for manipulating the top few elements of the stack, such as swap, dup and pop. These are also found in the language Forth, see for example \AX{Salman {\it et al}}{1984}{Salman-etal:84} and \AX{Kelly {\it et al}}{1986}{Kelly-etal:86}. These operators take a stack as argument and yield a stack as value, and their presence forces all other operators to be of the same type. For example, the following is a rewrite rule for swap:
a b swap => b aUnlike Forth, Joy also has quotations and combinators. These features also force the conclusion that the appropriate rewriting system is a string rewriting system. Consider the following four programs:
[2] [3 +] b [2] [3 +] concat i [2 3] [+] b [2 3] [+] concat iThey all eventually have to reduce to
5
, just like the
earlier Joy program 2 3 +
. It suggests that in the latter the gaps
have to be treated in the same way, the program is a concatenation of three
atomic symbols, and it denotes the composition of three functions. So, at least
for Joy programs without quotations and combinators, the appropriate system is a
string rewriting system. Such a system is equivalent to a term rewriting system
with a concatenation constructor for programs as the only constructor.
To handle combinators, a quotation constructor has to be introduced as
a second constructor.
The remainder of this paper is organised as follows: The next section introduces rewriting systems in general. Then follows a section on the principal concepts of a rewriting system for Joy. The next two sections give details of rewriting rules for operators and for combinators. Two other sections re-examine the stack and the quotation constructor. It is argued that the stack is not just an optimisation useful in an inmplementation, but that it is almost essential for understanding the semantics of Joy. The possibility of an extensional version of the quotation constructor is discussed but dismissed as unnecessarily restrictive. A final section is an outline of a rewriting system for Joy types; the system resembles a categorial grammar.
Rewriting systems can be classified according to the entities that are being rewritten. In a string rewriting system those entities are linear sequences of symbols or strings of symbols. In a term rewriting system these entities are expressions or terms build from operands and operators. In a graph rewriting system they are graphs of various kinds.
A string rewriting system is based on an alphabet which is just a set of symbols. Strings over a given alphabet are arbitrary sequences of symbols, each taken from the alphabet. The empty sequence or null string is included. A rewriting rule is of the form
x => ywhere both x and y are strings. A string rewriting system consists of an alphabet and a relation =>, a set of such pairs or rules. A wider relation ==> is defined as follows: For strings
w
,
x
, y
and z
, w x z ==> w y zif and only if
x => z
. This relation allows replacement of
x
by y
in arbitrary contexts w..z
.
A term rewriting system also requires an alphabet of symbols. Each symbol has an associated arity which is a natural number (0, 1, 2, ...). Symbols of arity 0 are nullary symbols or operands, symbols of positive arity are unary, binary, ternary and so on symbols or operators. A term over such an alphabet is either an operand or it is an operator of arity $n$ together with $n$ further terms. Terms are really abstract syntax trees, and various notations can be used for their concrete linear representation. A rewriting rule is again a pair of the form
x => ywhere
x
and y
now have to be terms. A term
rewriting system consists of an alphabet of symbols, each with their own arity,
and a set of such rules. A wider relation ==> is defined as
follows: y ==> zif and only if
y
and z
are alike terms except
that y
contains a subterm u
where z
contains a subterm w
such that u => w
.
In any rewriting system it is useful to define ==>> as the
reflexive transitive closure of ==>
.
x ==>> zif and only if
x = z
or for some y
, x
==> y
and y ==>> z
.
Here is an example for a fragment of a rewriting system for arithmetical expressions in Joy:
2 3 + => 5 7 2 3 + * ==> 7 5 * 7 2 3 + * ==>> 35In the second line the subexpression
2 3 +
on the left is
called a reducible expression or redex since it can be reduced using
the rule in the first line.
There are two distinct ways in which a string rewriting system can be
interpreted as a term rewriting system. On the first interpretation, the term
system has exactly one operand. It has as unary constructors all the symbols of
the string system. It has no other constructors. The single operand is just the
null string, and any symbol, say s
from the string alphabet is
interpreted as a unary operator append s
. The appending is
either uniformly on the left or uniformly on the right.
On the second interpretation, the term system has as operands all the symbols from the string system. It has only one binary constructor, concatenation. This interpretation is most useful for a rewriting system for Joy because it is now possible to add the unary quotation constructor which is needed for the combinators.
Rewriting systems can be based just on unconditional rules of the form
x => y
, but they can also have conditional rules. Such
rules state that certain rewritings are permitted provided certain other
rewritings are permitted. The next sections give a conditional rewriting
system for Joy.
A short general introduction to rewriting systems is in \AX{Salomaa}{1985}{Salomaa:85}. String rewriting systems in particular are discussed in \AX{Book}{1985}{Book:85}. A general survey of rewriting systems is in \AX{Schmitt}{1987}{Schmitt:87}. %
This section describes the basis of a conditional rewriting system for Joy using a notation similar to Prolog.
A rewriting system for Joy will be a collection of syntactic rules for rewriting Joy programs. Such a system must be based on the two principal program constructors, program concatenation and program quotation. The system to be presented here uses (unconditional) axioms of the form
P => Qwhere
P
and Q
are programs. There are also
conditional rules of the following forms, where R
, S
,
T
and U
are further programs. P => Q :- R => S. P => Q :- R => S, T => U.The rules are written in a Prolog-like syntax. The turnstyle :- is pronounced "if". On its left is the conclusion or consequent. On the right is a premise or antecedent. The antecedent can be a conjunction, as in the second form above, and the comma , is pronounced "and". So the second of the above rules can be read as: "
P
can be replaced by Q
if R
can be
replaced by S
and T
can be replaced by
U
". Details of these rules are given in the next two sections.
Using the same notation, the relation ==> is defined by the three rules
P ==> Q :- P => Q. P Q ==> P R :- Q ==> R. P R ==> Q R :- P ==> Q.The last two clauses allow rewriting in a context ---
P
on
the left or R
on the right.
The next rules for ==>
concern combinators. In the following,
C
$i$ is any combinator expecting at least $i$ quotation parameters.
[P] C1 ==> [Q] C1 :- P ==> Q. [P] [R] C2 ==> [Q] [R] C2 :- P ==> Q. [P] [S] [R] C3 ==> [Q] [S] [R] C3 :- P ==> Q. [P] [T] [S] [R] C4 ==> [Q] [T] [S] [R] C4 :- P ==> Q.
Note that there is no rewrite rule
[P] ==> [Q] :- P ==> Q.The reason for this is further explained in section 7.
The final relation to be introduced is ==>>, the reflexive
transitive closure of ==>
. It is defined by
P ==>> P. P ==>> R :- P ==> Q, Q ==>> R.
The simplest examples of rewriting axioms are those generated by
definitions. If an atom name
has been defined using
== as program P
, in the form
name == Pthen
name
may be rewritten as P
: name => P
The stack is normally a sequence of values of various types. This sequence is
just a special list which is modified by programs. The first general operator is
newstack, which clears the stack. Clearing twice is the same as
clearing just once. If literals were pushed before the clearing, this has the
same effect as just clearing. So newstack
is in fact the right
zero element for program concatenation.
newstack newstack => newstack. P newstack => newstack.Since the stack is a list, it should be possible to put this list on top of the stack --- that is to say, on top of itself. Also, it should be possible to make the list on top of the stack become the stack. There are two operators that do just that: The stack operator pushes onto the stack a list containing all the elements of the stack. The unstack operator expects a list on top of the stack and makes that the stack. The
unstack
operator undoes what the stack
operator does,
but the reverse is true only in special cases. newstack stack => newstack []. [] unstack => newstack. newstack L => [L] reverse unstack.In the last rule,
L
has to be a list of literals. Also, it
should be noted that the stack is not always a sequence of values, it can also
contain operators and combinators. So, strictly speaking the stack is always a
quotation, and the stack
operator pushes a quotation onto the
stack, and the unstack
operator expects a quotation on the stack
and makes that the new stack.
Although the stack was mentioned in these informal explanations, it should be noted that it is not referred to at all in the rewrite rules. The same will be true in the sections to follow. Rewrite rules are purely syntactic, and the stack is a semantic entity. Joy symbols denote functions from stacks to stacks. But syntax does not concern semantic concepts such as denotation. rem
The unary operators pop and dup are defined on all
stacks containing at least one element. In the rewrite rules to follow, let
a
be any literal or a program whose net effect is to push exactly
one value onto the stack.
a pop => id. a dup => a a.The generalisation that
a
may be not just a literal but can
be a program whose effect is to push a single value is needed for rare cases
like the following: [*] first dup => [*] first [*] first.The two programs on the left and right of the arrow have the net effect of pushing two occurrences of the multiplication operator
*
onto the
stack.
The binary operators swap, popd, and dupd
are defined on all stacks containing at least two elements. Let a
and b
be any literals or equivalent programs.
a b swap => b a. a b popd => b. a b dupd => a a b.The ternary operators swapd, rollup, rolldown and rotate are defined on all stacks containing at least three elements. Let
a
, b
and c
be any literals or equivalent programs. a b c swapd => b a c. a b c rollup => c a b. a b c rolldown => b c a. a b c rotate => c b a.
The ternary operator choice also expects three elements, but the
third element has to be a truth value. Let a
and b
be
any literals or equivalent.
true a b choice => a. false a b choice => b.
The simple types of Joy are the truth value type, the character type and the integer type. The next rules are for the operators on these types.
Rewrite rules for the unary operators succ,
pred, abs and sign for integer operands is
given by the following rules. Since characters are just small positive integers,
the operators can also be applied to characters. The last two operators can also
be applied to truth values. In what follows, let i
, j
and k
be any integers.
i succ => j. ( j = i+1 ) i pred => j. ( j = i-1 ) i abs => j. ( j = abs(i) ) i sign => j. ( j = sign(i) )
Rewrite rules for the binary operators +, -, *, /, rem, max and min for integers operands are as follows.
i j + => k. ( k = i+j ) i j - => k. ( k = i-j ) i j * => k. ( k = i*j ) i j / => k. ( k = i/j ) i j rem => k. ( k = i mod j ) i j max => k. ( k = max(i,j) ) i j min => k. ( k = min(i,j) )Again these binary operators can be applied to characters as well as integers. In the mixed case the type of the result
k
is the same as
the type of the second parameter i
.
Most implementations of Joy will also provide many other arithmetical operations. Since these will be defined in a library, no reduction rules should be given here.
The type of truth values is one of the Boolean types. The binary
operators are and, or and xor (exclusive
or). The unary operator is not. Let p
and
q
be truth values true
or false
.
p q and => r. ( r = p and q ) p q or. => r ( r = p or q ) p q xor. => r ( r = p xor q ) p not => r. ( r = not p )
A predicate is a function which leaves a truth value on the stack. The unary predicates null, small, odd, even, positive and negative are defined for all numeric types:
i null => p. ( p = (i=0) ) i small => p. ( p = (i<2) ) i odd => p. ( p = odd(i) ) i even => p. ( p = even(i) ) i positive => p. ( p = positive(i) ) i negative => p. ( p = negative(i) )
The binary predicates =, !=, <, <=, > and >= have the obvious rewrite rules:
i j = => p. ( p = (i = j) ) i j != => p. ( p = not(i = j) ) i j < => p. ( p = (i < j) ) i j <= => p. ( p = (i <= j) ) i j > => p. ( p = (i > j) ) i j >= => p. ( p = (i >= j) )
The remainder of this section deals with aggregate types: sets, strings and quotations, with lists as a special case. The unary operators first, second, third and rest expect a non-empty aggregate on top of the stack. The following are the rules for list aggregates:
[a L] first => a. [a b L] second => b. [a b c L] third => c. [a L] rest => [L].
Here [a L]
is a non-empty list or quotation whose first member
is a
and whose rest is [L]
. For strings an analogous
notation can be used to obtain analogous rules. For example
"cS" first => 'c. "cS" rest => "S".Here "cS" denotes a non-empty string whose first character is
'c
and whose remaining characters are the string "S"
.
For sets the rules are entirely analogous, except that the numeric ordering of the members is used. One possible notation is the following:
{a S} first => a. {a S} rest => {S}.Here
{a S}
denotes a non-empty set whose smallest member is
a
and whose other members are those of {S}
.
The binary operators cons and swons expect an aggregate and a potential member on top of the stack. These are the rules for list aggregates:
a [L] cons => [a L]. [L] a swons => [a L].The rules for strings and sets are analogous.
The unary operators uncons and unswons also expect a non-empty aggregate. The rules for list aggregates are:
[a L] uncons => a [L]. [a L] unswons => [L] a.So for strings and sets some of the rules are
'c "S" cons => "cS". {S} a swons => {a S}. {a S} uncons => a {S}.
The two binary operators at and of are for indexing into aggregates. For list the rules might be written:
[l1 l2 ... li ... ln] i at => li. i [l1 l2 ... li ... ln] of => li.So the two operators are converses of each other. For both operators in the case of sequences the sequence ordering is used, and for sets the underlying ordering is used. But the notation with the dots
...
is not
satisfactory. Here is a better version for at
applied to lists: [a L] 1 at => a. [a L] n at => b :- [L] (n-1) at => b.And here is a version for
of
applied to sets: 1 {a S} of => a. n {a S} of => b :- (n-1) {S} of => b.
The unary operator size takes an aggregate and determines the number of elements:
[] size => 0. [a L] size => (n+1) :- L size => n.
The unary operator reverse can be applied to any aggregate but it is useful only for sequences:
[] reverse => []. [a L] reverse => [M a] :- L reverse => M.These rules for
reverse
are correct but inefficient since
appending to the right to produce [M a]
requires copying --- at
least for the obvious implementation of lists. Most implementations would use an
accumulating parameter to optimise the reverse
operator.
It is of some interest that this optimisation can be expressed in rewrite rules:
reverse => [] swap shunt. [L] [] shunt => [L]. [L] [a M] shunt => [a L] [M] shunt.
The binary operator concat can be applied to two sequences which are either both lists or both strings.
[] [L] concat => [L]. [a L] [M] concat => [a N] :- [L] [M] concat => [N].
The operators and, or, xor and not can be applied not only to truth values but also to values of the set type. The reduction rules look exactly as for the truth values, except that the operations have to be performed bitwise. So they compute the intersection, union, symmetric difference and complement with respect to the largest set.
The two unary predicates null and small can also be applied to aggregates. These are the rules for lists, those for strings and sets are analogous.
[] null => true. [a L] null => false. [] small => true. [a] small => true. [a b L] small => false.
The two binary predicates in and has test aggregates for members.
a [] in => false. a [a L] in => true. a [b L] in => a [L] inThe has predicate is just the converse:
[] a has => false. [a L] a has => true. [b L] a has => [L] a has.
Most implementations of Joy will provide an operator for sorting a sequence and a binary operator merge for combining two already sorted sequences. Since these will be implemented in a library, no reduction rules are given here. The same applies to many other operators for aggregates. %
Sometimes it is necessary to test a parameter for its type. The unary predicates logical, char, integer, set, string and list are true if the parameter is a truth value, character, integer, set, string or list, respectively. The predicate leaf is true if the parameter is not a list.
false logical => true. 123 logical => false. 123 integer => true. ['A 'B 'C] leaf => false.
There is another operator for multi-choices. It expects a non-empty list of
non-empty lists on top of the stack and below that one further item. The
opcase operator matches the type of the item with the
first
members of the lists. When a match is found, the
rest
of that list is pushed onto the stack. If no match is found,
then the last list is used as the default.
123 [ [0 P] ['a Q] ["" R] ... ] opcase => 123 [P]. 'c [ [0 P] ['a Q] ["" R] ... ] opcase => 'c [Q]. "Hello" [ [0 P] ['a Q] ["" R] ... ] opcase => "Hello" [R].
The simplest unary combinators are i and x,
they require the top of the stack to be one quotation. Let
P
be any program.
[P] i => P. [P] x => [P] P.
The next unary combinators, dip, dip2 and
dip3, allow manipulation of the stack below the top few elements. Let
P
be any program, let a
, b
and
c
be any literals or equivalent.
a [P] dip => P a. a b [P] dip2 => P a b. a b c [P] dip3 => P a b c.
Another unary combinator is nullary. Its rewrite rule has to be
expressed conditionally. Let L
, M
and P
be any programs.
L [P] nullary => L a :- L P => M a.Three similar unary combinators are unary, binary and ternary:
L b [P] unary => L a :- L b P => M a. L b c [P] binary => L a :- L b c P => M a. L b c d [P] ternary => L a :- L b c d P => M a.
Three further unary combinators are app1, app2 and
app3. Let a
, a'
, b
,
b'
, c
and c'
be any literals or
equivalent. Note that the primed versions are used as the result of applying
P
to the unprimed versions.
L a [P] app1 => L a' :- L a P => M a'. L a b [P] app2 => L a' b' :- L a P => M a', L b P => N b' L a b c [P] app3 => L a' b' c' :- L a P => M a', L b P => N b', L c P => O c'.There is even an app4 combinator which applies
[P]
to four parameters a
, b
, c
and
d
.
The binary combinators expect two quotations on top of the stack.
The b combinator expects two quotations [P]
and
[Q]
, with [Q]
on top.
[P] [Q] b => P Q.The cleave combinator also expects two quotations, and below that an item
a
. L a [P] [Q] cleave => L b c :- L a P => M b, L a Q => N c.
The ternary combinators expect three quotations on top of the stack. One of the most important is ifte which performs branching. Its third parameter is the if-part, its second parameter is the then-part, its first parameter, on top, is the else-part.
L [I] [T] [E] ifte => T :- L I => M true. L [I] [T] [E] ifte => E :- L I => M false.
The binary whiledo combinator is similar to the ifte
combinator in that it has a test, the while-part, which is second on the stack.
The combinator repeatedly executes the while-part and while that yields
true
it executes the other part, the do-part.
L [W] [D] whiledo => L :- L W => M false. L [W] [D] whiledo => L D [W] [D] whiledo :- L W => M true.The ternary tailrec combinator for tail recursion also has a test, the third parameter. If that yields true, the second parameter is executed and the combinator exits, otherwise the top parameter is executed and after that the process is repeated.
L [I] [T] [R] tailrec => L T :- L I => M true. L [I] [T] [R] tailrec => L R [I] [T] [R] tailrec :- L I => M false.
The quaternary combinators expect four quotations on top of the
stack. The linrec combinator for linear recursion expects an
if-part [I]
, a then-part [T]
, and two recursion parts
[R1]
and [R2]
.
L [I] [T] [R1] [R2] linrec => L T :- L I => M true. L [I] [T] [R1] [R2] linrec => L R1 [I] [T] [R1] [R2] linrec R2 :- L I => M false.
The binrec combinator for binary recursion is similar, except that the first recursion part has to produce two values. The recursion with all four parts is applied to the two values separately. The second recursion part then has available the two results from these two applications.
L [I] [T] [R1] [R2] binrec => L T :- L I => M true. L [I] [T] [R1] [R2] binrec => L a b R2 :- L T => M false, L R1 [I] [T] [R1] [R2] binrec => N a b.
The genrec combinator for general recursion is also has an if-part, a then-part and two recursion parts. It differs from the other two combinators in that after the execution of the first recursion part nothing in particular is executed, but a program consisting of the four parts and the combinator is pushed onto the stack. The second recursion part thus has it available as a parameter.
L [I] [T] [R1] [R2] genrec => L T :- L I => M true. L [I] [T] [R1] [R2] genrec => L R1 [[I] [T] [R1] [R2] genrec] R2 :- L I => M false.
There are several combinators which do not have a fixed number of quotation
parameters. Instead they use a list of quotations. The cond
combinator is like the one in Lisp, it is a generalisation of the
ifte
combinator. It expects a non-empty list of programs, each
consisting of a quoted if-part followed by a then-part. The various if-parts are
executed until one is found that returns true
, and then its
corresponding then-part is executed. The last program in the list is the default
which is executed if none of the if-parts yield true
.
L [ [[I1] T1] REST ] cond => L T1 :- L I1 => M true. L [ [[I1] T1] REST ] cond => L [ REST ] cond :- L I1 => M false.
The condlinrec combinator is similar, it expects a list of pairs
or triples of quoted programs. Pairs consist of an if-part and a then1-part, and
triples consist of an if-part, a rec1-part and a rec2-part. Again the first
if-part that yields true
selects its corresponding then-part or
rec1-part for execution. If there is a rec2-part, the combinator first recurses
and then executes the rec2-part. The last program is the default, it does not
have an if-part.
The cleave
combinator also has a generalisation, The
construct combinator expects two parameters, a quotation and above
that a list of quotations. Each quotation in the list will produce a value that
will eventually be pushed onto the stack, and the first quotation determines the
stack onto which these values will be pushed.
L [P] [..[Qi]..] construct => L P ..qi.. :- L Qi => M qi.
Some combinators expect values of specific types below their quotation parameters. The next few combinators expect values of simple types.
The binary combinator branch expects a truth value below its two
quotation parameters: The branch combinator resembles the
choice
operator and the ifte
combinator. The truth
value below the two quotations determines which of the two quotations will be
executed. If the truth value is true
, then the if-part, the second
parameter, is executed, otherwise the then-part, the top parameter, is executed.
true [P] [Q] branch => P. false [P] [Q] branch => Q.
The unary combinator times expects a numeric value below its
quotation parameter: The times
combinator executes its quotation
parameter as many times as indicated by the numeric value; if the value is zero
or less, then the quotation is not executed at all.
0 [P] times => id. n [P] times => P (n-1) [P] times.%
The stack is normally a list, so any list could serve as the stack, including
a list which happens to be on top of the stack. But the stack can also contain
operators and combinators, although this does not happen often. So the stack is
always a quotation, and any other quotation could serve as the stack, including
one on top of the stack. The infra combinator expects a quotation
[P]
which will be executed and below that another quotation which
normally will be just a list [M]
. The infra
combinator
temporarily discards the remainder of the stack and takes the quotation or list
[M]
to be the stack. It then executes the top quotation
[P]
which yields a result stack. This resulting stack is then
pushed as a list [N]
onto the original stack replacing the original
quotation or list. Hence any quotation can serve as a complex unary operation on
other quotations or lists.
L [M] [P] infra => L [N] :- [M] unstack P => N.
For linear recursion over numeric types the if-part often is
[null]
and the first recursion part is [dup pred]
. The
primrec combinator has this built in. For integers the rewrite rules
are:
0 [T] [R2] primrec => pop T. i [T] [R2] primrec => i dup pred [T] [R2] primrec R2.The
primrec
combinator can also be used for aggregates. The
implicit if-part is again [null]
, and the implicit first recursion
part is [rest]
. Below is the version for lists, the versions for
sets and strings are analogous. [] [T] [R2] primrec => pop T. [a L] [T] [R2] primrec => a [L] [T] [R2] primrec R2.
The unary combinators step, map, filter and split all expect an aggregate below their quotation parameter.
For step operating on lists the rewrite rule is:
[] [P] step => id. K [a L] [P] step => M [L] [P] step :- K a P => M.For strings and sets the rules are analogous. The same is true of the rules to follow. For map operating on lists the rewrite rule is:
[] [P] map => []. K [a L] [P] map => K [b M] :- K a P => K b, K [L] [P] map => K M.The filter combinator expects a predicate as its quotation parameter.
[] [P] filter => []. K [a L] [P] filter => K [a M] :- K a P => J true, K [L] [P] filter => K [M]. K [a L] [P] filter => K [M] :- K a P => J false, K [L] [P] filter => K [M].The split combinator is like
filter
except that it
produces two lists. The first list is just like the one from
filter
, the second list is the list of those elements which did not
pass the predicate test [P]
and hence are not members of the first
list. [] [P] split => []. K [a L] [P] split => K [a M] [N] :- K a P => J true, K [L] [P] split => K [M] [N]. K [a L] [P] split => K [M] [a N] :- K a P => J false, K [L] [P] split => K [M] [N].
The unary combinator fold expects a quotation which computes a binary operation. Below that has to be a literal and below that an aggregate. The literal is used as a start value to fold or reduce the aggregate. Applied to lists the combinator has these rules:
[] a [P] fold => a. [b L] a [P] fold => d :- a b P => c, [L] a [P] fold c P => d.
The two unary combinators some and all expect
an aggregate below their quotation parameter. The quotation must be a predicate,
yielding a truth value. The some combinator returns true
if some members of the aggregate pass the test of the quotation, otherwise it
returns false
. The all combinator returns
true
if all members of the aggregate pass the test of the
quotation, otherwise it returns false
. For empty aggregates
some
returns false
and all
returns
true
. The rules for some
are:
[] [P] some => false. L [a A] [P] some => L true :- L a P => M true. L [a A] [P] some => L [A] [P] some :- L a P => M false.The rules for
all
are: [] [P] all => true. L [a A] [P] all => L false :- L a P => M false. L [a A] [P] all => L [A] [P] all :- L a P => M true.
The unary combinator zipwith expects two aggregates and above that a program suitable for combining their respective elements. For lists the rules are
[] [A] [P] zipwith => []. [A] [] [P] zipwith => []. L [a A] [b B] [P] zipwith => L [c C] :- L a b P => M c, L [A] [B] [P] zipwith => L [C].%
This section deals with the role of the Joy stack from a syntactic and semantic point of view.
First, let us consider a quite small arithmetic expression in postfix notation:
2 3 + 8 5 - *A reduction might begin by doing the addition first, or the subtraction first, followed in a second step by the other operation. In fact, the addition and the subtraction could be done in parallel in the same step. Only when both reductions have been done will it be possible to do the final multiplication. The final result is the value
20
, and it is independent of the
order in which the reductions have been applied. In detail, the first mentioned
reduction sequence will look like this: 2 3 + 8 5 - * 5 8 5 - * 5 3 * 15One possible strategy for reductions is the following:
Scan the expression from left to right until a redex is found, an expression that can be replaced in accordance with a rewrite rule. Apply the rule. Repeat until no more rules can be applied.
This strategy is most efficient for reducing expressions in which redexes are found early. The following is an example. Again all operators are binary, but note that except at the beginning operators and literals alternate. In each step the first three symbols constitute a redex.
10 5 / 3 * 4 - 1 +The strategy is least efficient when a redex is found late. In the example below, note that all operators occur towards the end.
3 2 6 8 6 - / + *The strategy requires skipping the
3
, 2
and
6
and only then replacing 8 6 -
by 2
. The
next step requires skipping 3
and 2
and only then
replacing 6 2 /
by 3
. The next step requires skipping
3
and only then replacing 2 3 +
by 5
. The
final step requires no skipping, 3 5 *
is replaced by
15
. All this skipping is of course inefficient.
A better strategy would apply the next operator at the point of the most recent change, if that is possible. An obvious way to do this is to use a stack of values for intermediate results. As the expression is being processed, operands such as literal numbers are pushed, and operators pop their arguments off the stack and push their result. This is of course the method commonly used for evaluating postfix expressions. So we have the following situation: The rewriting rules for programs are purely syntactic, they do not mention the stack. But the stack can be used as an optimisation of the rewrite rules. On the other hand, the stack is apparently an essential semantic entity, it is the argument and value of the functions denoted by programs.
But this now raises the question whether the stack is just an optimisation for the rewriting system or whether it is really needed as a semantic object. In other words, is it possible to give a semantic characterisation of Joy which does not involve a stack at all? In such a semantics the programs will have to denote something, and presumably they will have to denote functions. But what might be the arguments and values of these functions?
It will help to review the stack based semantics of Joy: The literals such as numerals, characters, strings and quotations denote functions taking any stack as argument and producing another stack as value which is like the argument stack except that a single item has been pushed on top. The operators also denote unary functions from stacks to stacks, and the result stack is like the argument stack except that the top few items have been replaced by the result of applying some operation. Likewise, the combinators denote unary functions from stacks to stacks, and the result stack depends on the combinator and the top few quotations.
To obtain a Joy semantics without a stack we take our hint from the rewriting rules. The operators and combinators no longer denote functions from stacks to stacks. The rewrite rule for addition transforms a program ending with two numerals into a program ending with a numeral for their sum. This is the key for a semantics without a stack: Joy programs denote unary functions taking one program as arguments and giving one program as value. The literals denote append operations; the program returned as value is like the program given as argument, except that it has the literal appended to it. The operators denote replacement operations, the last few items in the argument program have to be replaced by the result of applying the operator. Similarly the combinators also denote (higher order) functions from programs to programs, the result program depends on the combinator and the last few quotations of the argument program.
It is clear that such a semantics without a stack is possible and that it is merely a rephrasing of the semantics with a stack. Purists would probably prefer a system with such a lean ontology in which there are essentially just programs operating on other programs. But most programmers are so familiar with stacks that it seems more helpful to give a semantics with a stack. Its is of course irrelevant for the semantics that for efficiency reasons any implementation of Joy will in fact use a stack.
There is one other argument for a stack semantics. By a program one would normally mean one that can be run, at least when supplied with appropriate parameters. The stack, however, can sometimes contain sequences of items that make the stack a non-executable program because it violates type rules. Such situations arise for example by executing one of the following:
[ 3 * ] second [ pop cons map ] [] stepThe first results in the one operator
*
being pushed. The
second results in two operators and one combinator to be pushed. Such situations
are required only rarely. But the possibility is needed, for example for a Joy
interpreter joy written in Joy itself. Such an interpreter is
described in another paper.
It was mention in section 3 that for quotations there is no rewrite rule of the form
[P] ==> [Q] :- P ==> Q.If there were such a rule, then the rewriting
42 dup ==> 42 42.would license
[ 42 dup ] ==> [ 42 42 ].and hence
[ 42 dup ] second ==>> [ 42 42 ] second. dup ==>> 42.which is absurd. On the other hand,
[ 42 dup ] i + ==> [ 42 42 ] i +is acceptable. So, quotations must not allow substitutions in all contexts, but only in those where the quotation is guaranteed to be undone by a dequoting operation, by a combinator. In other words, quotation is an intensional constructor.
There is a simple way out of this, and it is to treat quotations of programs
to be very different from lists. Notice that the absurdity comes from taking the
second
element of the quotations [42 dup]
and
[42 42]
. If it were forbidden to treat quoted programs as data
structures, then the fatal inference would be blocked. In detail, such a
treatment would look like this: If P
is a program, then
(P)
is its quotation, now written inside round parentheses. Also,
[P]
is its list, as before written inside square brackets. Both
(P)
and [P]
can be pushed onto the stack, can be
swap
ped, dup
licated and pop
ped, can be
inserted into lists and later extracted. But only (P)
can be used
as a parameter for combinators, and it cannot treated as a list. On the other
hand, [P]
cannot be used as a parameter for combinators, but it can
be treated as a list. Importantly, there could then be a reduction rule
(P) ==> (Q) :- P ==> Q.and hence quotation would be an extensional constructor.
This is a draconian solution, it allows programs such as
[2] cons reversebut forbids
(+) cons mapThe latter uses
map
to add a single number on top of the
stack to each member of a list that is second on the stack. If the single number
on top of the stack is, say 7
, then the cons
operation
produces (7 +)
to be used by map
. The prohibition
would rule out parameterisation. In general, the prohibition would rule
out using constructed programs as parameters to combinators.
It is possible to make a less drastic compromise: As before, quotations
(P)
serve as parameters to combinators, but they can also be built
up by list operations such as concat
enation or cons
ing
further items into their front. This would allow parameterisation as in the
map
example above. Quotations could be constructed and built up
further and further and eventually called by a combinator, but quotations could
not be destructed. On this proposal constructive operations on quotations would
be allowed, but destructive operations would not. All list operations would need
to be classified as constructive or destructive. Even the size
operator would turn out to be destructive.
This compromise solution has much in its favour. Quotation is extensional, combinators can use constructed programs, but the absurdity does not arise. On the other hand, the compromise requires a syntactic distinction between quotations and lists, and it requires a semantic distinction between operations that can be applied to lists and to quotations, and those that can be applied only to lists.
On the whole, then, it does seem preferable to have quotation as an intensional constructor.
The rewriting system described up to here concerned values. It is also
possible to give a rewriting system for Joy type expressions. We shall
need constant and variable symbols for these types. The following will be used
as type constants: Log
for the truth values,
Chr
for the characters, Int
for the integers,
Set
for the sets, Str
for the strings and
Lst
for possibly heterogeneous lists. For lists whose member are
all of the same type, say Int
, the notation [Int]
will
be used. As variables we use T
, T1
, T2
and so on. So [T]
is the type of lists whose members are all of the
type T
.
The following is a sample of one style of rules for operators:
T1 T2 swap => T2 T1. Int Int + => Int. Str size => Int.The notation used above has been made as close as possible to the notation for the rewriting rules for values. In the following a different notation will be introduced which is more useful.
Literals have atomic types, operators and combinators have
compound types. There are three constructors for compound types:
type concatenation, type quotation and type
cancellation. The first two are derived from the corresponding program
constructors. The third is new and has no counterpart program constructor. It
uses the infix symbol -> to combine two types into a new one. If
T
is a type, then so is its quotation [T]
. If
T1
and T2
are types, then so are their concatenation
(T1 T2)
and their cancellation T1->T2
. If P1
and P2
are programs of types T1
and T2
, then the type of their concatenation (P1 P2)
is the
concatenation (T1 T2)
of their types. Cancellation satisfies the
law
T1 T1->T2 => T2For types with concatenated parameter types:
(T1 T2)->T3 => T1->(T2->T3)The expression on the right of the arrow can be written without parentheses on the convention that the cancellation operator
\
is
taken to be right associative.
The three rules given above should now be rewritten in this style:
swap => (T1 T2) -> (T2 T1). + => (Int Int) -> Int. size => Str -> Int.
The following are a sample of further rules in the two styles: Those in the left column are in the earlier style, those in the right are in the new style.
Chr succ => Chr. succ => Chr->Chr. Int Int > => Log. > => (Int Int)->Log. Log Log and => Log. and => (Log Log)->Log. [T Lst] first => T. first => [T Lst]->T. [T Lst] rest => Lst. rest => [T Lst]->Lst. T Lst cons => [T Lst]. cons => (T Lst)->[T Lst]. [T Lst] uncons => T Lst. uncons => [T Lst]->(T Lst). Chr Str cons => Str. cons => (Chr Str)->Str. Set null => Log. null => Set->Log.Consider now the program
P
below. Its type is given by the
concatenation of the types of its parts, in the line just below. All the types
here are built from the atomic type Int
of integers. By four
applications of cancellation the type in line 1 is simplified to the type
Int
in line 4. P: 2 3 + dup * 1. Int Int (Int Int)->Int Int->(Int Int) (Int Int)->Int 2. Int Int->(Int Int) (Int Int)->Int 3. (Int Int) (Int Int)->Int 4. Int
For combinators only only a few examples will be given here, for i and map.
i => [T] -> T. map => ([T1 -> T2]) -> ([T1] -> [T2]).
The formalism used in this section is that of categorial grammars. These have their origin in the (simple) theory of types and as generating devices are as powerful as context free grammars. Expositions and applications are to be found in the volume edited by \AX{Oehrle {\it et al}}{1988}{Oehrle-etal:88}, see in particular the contributions by \X{Casadio} and \X{Lambek}. Another reference is the volume edited by \AX{Buskzkowski {\it et al}}{1988}{Buszkowski-etal:88}.
Rewriting systems are purely syntactic. If the object language has a semantics, then the rewriting rules have to be shown to be correct with reepsect to this semantics. This is true of the rewriting rules of the previous sections which dealt with values. It is also true for the rewriting rules for types. The basic semantic notion here is that of assigning types to programs. These take the form
P : Twhich is not a rewriting rule but a statement which says that program
P
is of type T
. The basic type statements
are to atomic programs, literals, operators and combinators. Here are some
examples: 42 : Int 'A : Chr succ : Chr -> Chr > : Int Int -> Log first : [T Lst] -> T i : [T] -> TThis is the style adopted in the Joy manual. To obtain rewrite rules using => a single conditional rule is needed which converts the semantic predicate : into the syntactic
=>
, as follows. X => T :- X : T.
The material in this section has very tentative, most of the details need to be worked out fully.