The data structures that have been considered so far - the `record' structure of data fields in a class definition, 1-dimensional arrays of primitive types and Objects, and both constrained and unconstrained multi-dimensional arrays - all share one feature: once a given instance has been defined and fully instantiated (whether upon declaration, as with a constrained array, or from values supplied when the program is executed, as with unconstrained arrays), it is not easy to change the number of elements that the instance can hold.
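To make this limitation concrete, the following minimal sketch shows what `changing the size' of a Java array actually involves: a new array has to be allocated and every existing element copied across. The class and method names (ArrayGrowthSketch, grow) and the sample data are illustrative assumptions only; they do not come from the module's code.

```java
import java.util.Arrays;

// Illustrative sketch: the length of a Java array is fixed once it is created.
class ArrayGrowthSketch {
    // Returns a NEW array with room for 'extra' more elements; the original
    // array is unchanged and still has its old length.
    static String[] grow(String[] records, int extra) {
        return Arrays.copyOf(records, records.length + extra);
    }

    public static void main(String[] args) {
        String[] records = { "first", "second" };
        String[] larger = grow(records, 1);    // allocate and copy
        larger[2] = "third";
        System.out.println(records.length);    // prints 2
        System.out.println(larger.length);     // prints 3
    }
}
```

Every reference to the old array would also have to be redirected to the new one, which is part of why resizing is awkward rather than impossible.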
One might ask why it should be necessary to do this. The following natural scenarios provide an answer:
Notice that the last of these provides a unique identifying attribute (or key) by which the data concerning any individual student can be referred to.
What problems can be identified concerning handling such applications in the context of the data structures met so far?
We could cite many similar examples within the areas of Database Design, Systems Software (e.g. compiler organisation), Real-time systems, Simulation and Modelling applications. A more extensive treatment of database technology will be given in the next part of this module.
For the purposes of the current lecture, we are not so much concerned with the technical specifics of these fields, but rather with identifying the common aspects of the scenarios outlined above, why these may be problematic, and why the data structures considered up to now are unable to address such problems properly.
The critical feature that is common to the two examples we have selected above may be summarised in the following words
The number of elements that is (minimally) required to record all the relevant data
Thus,
Question: Why are these behaviours problematic?
Answer: Because in defining a suitable data structure to represent such information we do not know exactly what size it is reasonable/realistic to choose for it.
Thus if we look at `obvious' solutions in terms of the structures introduced already,
There is one detail we have left unspecified in outlining the `naive' solutions above:
What size (how many elements) should the array of Instances be constrained to have?
At first sight these may seem like difficulties that could be surmounted by making a careful analysis of the application and judiciously over-estimating the number of records that might be required (`judiciously' in the sense of allowing, perhaps, 20% extra instead of 2000%). Returning to the Student Record example provides a good illustration of just how difficult (if not, arguably, impossible) making such `judicious' estimates might prove. As will become apparent later, the design and realisation of a substantial database system (as a student record system is) involves a considerable investment of time and analysis. In view of this, it is desirable for such systems to have a lifetime of several years (many present day large-scale database systems such as the DVLC and PNC have been in existence for almost 20 years). A student record database for the University of Liverpool (or the Dept. of Computer Science thereof) in 1985 would have been required to hold records for roughly 7,500 students (about 120 in Computer Science); in 1999 the figure is nearer 20,000 for the whole University (with 400-420 for Computer Science). It is extremely unlikely that a `careful analysis' in 1985 would have produced figures exceeding 12,000 (resp. 250) as the maxima likely to be needed: census statistics indicated a significant fall in the number of 16-20 year olds for the early 1990s, the group from which the vast majority of university students were then drawn; the Government policies on H.E. expansion had not been mooted; and, locally within the Computer Science Department in 1985, the G520 degree (Computer Information Systems B.Sc) had not been thought of, and the present M.Sc accounted for around 15-20 students (instead of the 40-50 of recent years).
Deciding the size of an array is only one of the drawbacks in using a `fixed' length array structure to store such records. One also has to contend with the problem of `freeing' up elements of such an array whose records are no longer active, e.g. a student who has graduated or a process which has completed. In this setting one has problems such as:
In summary, although it is `technically' possible to solve the applications problems mentioned using the `obvious' solution of a `large enough' array of record Instances, proceeding in such a manner creates a large number of (avoidable) difficulties, since:
There are, however, data structures which can circumvent the problems outlined above. The important characteristic shared by these is the property of being dynamically adaptable. Thus, they provide a clean mechanism for
Within Java the `garbage collection' process (which the run-time environment carries out automatically in the `background') ensures that maintaining information on free/allocated memory space is handled by the Java run-time environment, i.e. neither the application developer nor the users of a program have to be concerned with this.
The ADT known as a Linked List is the simplest example of a data structure providing the capability of `dynamically adapting' to accommodate changes in the quantity of information stored. This structure also provides a basis for building more complex structures, qualified by the precise regime in which applications wish to access individual records. It is important to be aware that, in their simplest forms, linked list structures are far from offering a panacea for the problems that were identified earlier. This is especially true when a very large collection (ca. 10^6 or more) of records is involved, and so in a typical modern database setting, more sophisticated structures are needed. Linked lists and their basic variations do provide reasonable solutions in a number of systems programming contexts, e.g. the management of processes in O/S environments and hardware modelling in simulation tools.
The figure below gives an informal depiction of a linked list structure.
Figure 5.1: Linked List Structure
In Figure 5.1, the box labelled `List Handle' indicates the name of the
list. This contains a reference (or pointer) to the start of the list.
Each list cell comprises two parts: the first (labelled `Datum' in the
diagram) contains the information held in the list element: this could be anything from
a simple value over a primitive type to an arbitrary Object (which could
itself be a linked list, but we will defer consideration of more complex structures such as this
until later). The second element of a list cell (labelled `Link' in Figure 5.1)
is a pointer to the remainder of the list. In the (currently) final
list cell, this pointer is set to the null reference.
The mechanism here is one that you have already encountered in the context of a program's
control flow regime: a linked list is a structure that is naturally specified with
a recursive, i.e. self-referential, definition, cf. the use of
methods with recursive definitions that you saw in
COMP101.
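The recursive flavour of the structure can be sketched in code: the length of a list is 0 if the list is empty, and otherwise 1 plus the length of the remainder. The Cell class below is a hypothetical stand-in used only to keep the example self-contained; it is not the ListCell class developed later in these notes, and the method name length is likewise an assumption.

```java
// Hypothetical stand-in for a list cell: a datum plus a link to the rest of the list.
class Cell {
    Object datum;
    Cell link;

    Cell(Object datum, Cell link) {
        this.datum = datum;
        this.link = link;
    }
}

class RecursiveLengthSketch {
    // The empty list (null) has length 0; otherwise count this cell
    // and recurse on the remainder of the list.
    static int length(Cell list) {
        if (list == null) {
            return 0;
        }
        return 1 + length(list.link);
    }

    public static void main(String[] args) {
        Cell list = new Cell("a", new Cell("b", new Cell("c", null)));
        System.out.println(length(list));   // prints 3
    }
}
```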
Before proceeding to examine how this structure can be implemented as a Java class, we make some
general observations about its organisation.
Having presented an informal schematic description of how a linked list operates,
a more formal definition of its abstract structure can be given.
Definition:
A Linked List data structure, L, is recursively defined as:
Considering both the informal pictorial description and more formal definition just presented, we see that a structure implementing these arrangements requires 2 fields:
Of course the first of these can simply be defined (within the class) to be an arbitrary Object. What about the second? If the name of the class representing a single element (i.e. (datum, link) pair) is, say, ListCell, then, in keeping with the recursive definition of a linked list structure, this field is either a null reference OR an instance of another ListCell.
In other words, the type associated with the field Link of an Object of type ListCell is again ListCell, i.e.
public class ListCell
{
    private Object Datum;
    private ListCell Link;
}
Figure 5.2: The fields of a List Cell Class for Building Linked Lists.
Notice that this definition is recursive: the field Link is declared to be of type ListCell. Of course, this does NOT mean that an `infinite' amount of storage space is required. The field is only a reference to an instance: since no instantiation has been performed, an Object declared (as a class field) to be of type ListCell holds just the null value until it is instantiated.
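A small sketch illustrates the point. A field declared with a class type holds the null reference until it is instantiated, so declaring it costs only the space of one reference. The class names here (CellSketch, DeclarationSketch) are hypothetical, and the fields are left non-private purely to keep the example short.

```java
// Hypothetical cell class mirroring the two fields of Figure 5.2.
class CellSketch {
    Object datum;
    CellSketch link;   // a reference only: no further cell is created here
}

class DeclarationSketch {
    static CellSketch handle;   // declared but not instantiated

    public static void main(String[] args) {
        System.out.println(handle == null);        // prints true
        handle = new CellSketch();                 // instantiate one cell
        System.out.println(handle.link == null);   // prints true
    }
}
```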
In order to give a simple illustration, let us consider how this structure changes in response to an application which stores a collection of user-supplied names in a list.
First we need a Constructor for a ListCell. Since such a cell comprises an (Object, ListCell) pair, we can think of such a constructor as taking a pair of such values as its parameters and setting the Datum field to the Object supplied and the Link field to the ListCell reference, i.e.
Figure 5.3 Constructor for Instantiating a List Cell
Putting the content of Figures 5.2 and 5.3 together we get a minimal realisation of a
ListCell Object,
Figure 5.4: A Very Basic List Class
The content of Figure 5.4 is, of course, some way away from an ADT definition providing
a reasonable level of functionality, cf. the methods outlined earlier.
It will suffice, however, for our immediate descriptive purposes.
Figure 5.5 below presents a short Java program using the ListCell
class. The application reads successive lines of text presented by the user and
creates a linked list that stores these. An empty input line, i.e. typing < return > in
response to the input prompt, is used to signal the end of the input.
Figure 5.5: Constructing and Printing a Linked List Using the ListCell Class
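A minimal sketch of a program behaving as just described (read lines until an empty one, push each onto the front of the list, then print the list) might look as follows. The prompt text is the one used in these notes; the names NameCell, ReadNamesSketch and readList, the non-private fields, and the decision to pass the reader as a parameter are all assumptions made to keep the example self-contained, so this should not be read as the listing of Figure 5.5 itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Minimal cell class in the spirit of Figure 5.4 (fields non-private for brevity).
class NameCell {
    Object datum;
    NameCell link;

    NameCell(Object head, NameCell nextCell) {
        datum = head;
        link = nextCell;
    }
}

class ReadNamesSketch {
    // Reads lines until an empty line (or end of input) and pushes each
    // one onto the front of the list; returns the final list handle.
    static NameCell readList(BufferedReader in) throws IOException {
        NameCell names = null;                        // the empty list
        System.out.print("Next List Item (hit < return > to finish):");
        String text = in.readLine();
        while (text != null && !text.isEmpty()) {
            names = new NameCell(text, names);        // push onto the front
            System.out.print("Next List Item (hit < return > to finish):");
            text = in.readLine();
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
        NameCell names = readList(keyboard);
        // Walk the list from the handle to the null reference, printing each datum.
        for (NameCell cell = names; cell != null; cell = cell.link) {
            System.out.println(cell.datum);
        }
    }
}
```

Taking the reader as a parameter means the list-building step can be exercised with any source of text, not only the keyboard.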
Before examining how the Instance Method PrintOut() in the
ListCell class is realised, we first illustrate how the
application in Figure 5.5 behaves with an example input set.
After the declaration of the ListCell Object NamesInList,
this contains just the null reference, i.e. the list is empty, as
depicted in Figure 5.6(a).
Figure 5.6(a): The NamesInList Object After Declaration.
The prompt for input is issued; let us suppose that this receives the response KILMARNOCK.
When this occurs, the NamesInList instance of ListCell
is instantiated by calling the ListCell Constructor using
Note that we have replaced the String TextReadIn with the
String literal that it contains when the constructor is invoked.
Inspecting the Constructor definition, it is seen that the
String "KILMARNOCK" is stored in the Datum field.
What about the Link field? The effect of specifying the
NamesInList object as the parameter passed to the Constructor is that the
current value of NamesInList (i.e. its value before the constructor call) is supplied as the
Link field of the new ListCell. This
value is, of course, the null reference. Thus after the Constructor
has completed, the NamesInList List Handle will refer to a (non-empty) List with
a single List Cell, the Datum value of which is "KILMARNOCK" and whose Link field
is the null reference, as shown in Figure 5.6(b).
Figure 5.6(b): NamesInList After First Datum Read In.
Moving to the second item of input, suppose that this receives the response,
Inside the main loop of the program in Figure 5.5, the ListCell
Constructor is now invoked as,
In the same way as before, the NamesInList parameter is the current value of
this reference, i.e. the list structure in Figure 5.6(b). Thus the
new ListCell created for NamesInList
has a Datum field with value "CELTIC", and a Link field whose value
is the `earlier' value of NamesInList. So the configuration of NamesInList
is now as shown in Figure 5.6(c),
Figure 5.6(c): NamesInList After Second Datum Read In.
Continuing in this way, after a third response to the prompt,
The `effective' Constructor call to instantiate NamesInList is
Here we have expanded the parameter passed as the value for the Link field using the
notational convention for a list introduced in the formal definition
earlier. The instance of ListCell is now as Figure 5.6(d).
Figure 5.6(d): NamesInList After Third Datum Read In.
In the same way, after 2 further (non-empty) input lines, e.g.
The NamesInList will first become as in Figure 5.6(e),
Figure 5.6(e): NamesInList After Fourth Datum Read In.
and then reach its final state, shown in Figure 5.6(f).
Figure 5.6(f): Final State of NamesInList After Fifth Datum and Empty Line Read In.
It should be clear from the example given that in order to push
a new item onto the start of an existing List (whose list handle, say, is ListName)
all that is needed is
As we can see from the example illustrated, successive invocations of this form result
in the list handle (ListName) referring, after the k'th insertion, to a ListCell holding the k'th item added (in its Datum field) and the list handle for the object
formed after k-1 insertions (as its Link field).
In the particular construction regime that has been illustrated, the ordering
of items within the list is the reverse of the order in which they were added - the first
item read is that in the `last' ListCell (the one whose Link
field is the null reference). The last item added is at the `head'
of the list, i.e. it is the Datum field of the most recently created ListCell
Instance, the one to which the list handle refers.
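The reversal can be checked with a short sketch that pushes three of the example names and then writes the list out from its head, using the :: notation employed elsewhere in these notes. The class and method names (PushCell, PushOrderSketch, pushAll, describe) are hypothetical and exist only for this illustration.

```java
// Hypothetical cell class, used only to keep this sketch self-contained.
class PushCell {
    Object datum;
    PushCell link;

    PushCell(Object head, PushCell nextCell) {
        datum = head;
        link = nextCell;
    }
}

class PushOrderSketch {
    // Pushes each item onto the front of an initially empty list.
    static PushCell pushAll(String[] items) {
        PushCell handle = null;
        for (String item : items) {
            handle = new PushCell(item, handle);
        }
        return handle;
    }

    // Joins the data from the head of the list to its end, separated by "::".
    static String describe(PushCell handle) {
        StringBuilder text = new StringBuilder();
        for (PushCell cell = handle; cell != null; cell = cell.link) {
            text.append(cell.datum).append("::");
        }
        return text.append("null").toString();
    }

    public static void main(String[] args) {
        String[] input = { "KILMARNOCK", "CELTIC", "MOTHERWELL" };
        System.out.println(describe(pushAll(input)));
        // prints MOTHERWELL::CELTIC::KILMARNOCK::null
    }
}
```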
To see that this is indeed the case, we now present the Instance method in the
ListCell class, used to print out all of the Datum fields
present in a list. This method is shown in Figure 5.7.
Figure 5.7: Instance Method to Print a List in the Class ListCell
The operation of this method is very straightforward. It uses a local variable, temporary,
which is of type ListCell and is initialised to the instance
of ListCell on which the method is invoked (i.e. this). The method iterates
until the variable temporary has `reached' the end of the relevant List, i.e. the
null reference (Link),
at each stage printing out the current Datum field
and then resetting temporary to the current Link value.
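The same walking pattern (start at the current cell, follow the Link references until the null reference is reached) serves for operations other than printing. As a hedged illustration, the sketch below uses it to test whether a given datum is present in a list. The class SearchCell and its contains method are hypothetical additions for this example and are not part of the ListCell class in these notes.

```java
import java.util.Objects;

// Hypothetical cell class with a search method built on the same traversal loop.
class SearchCell {
    private Object datum;
    private SearchCell link;

    SearchCell(Object head, SearchCell nextCell) {
        datum = head;
        link = nextCell;
    }

    // Returns true if some cell from this one onwards holds a datum equal to 'wanted'.
    boolean contains(Object wanted) {
        SearchCell temporary = this;
        while (temporary != null) {
            if (Objects.equals(temporary.datum, wanted)) {
                return true;
            }
            temporary = temporary.link;   // move on to the remainder of the list
        }
        return false;                     // reached the null reference
    }
}

class SearchSketch {
    public static void main(String[] args) {
        SearchCell names = new SearchCell("CELTIC", new SearchCell("KILMARNOCK", null));
        System.out.println(names.contains("KILMARNOCK"));   // prints true
        System.out.println(names.contains("DUNDEE"));       // prints false
    }
}
```

In the worst case such a search visits every cell, which is one reason the simple linked list becomes unattractive for very large collections.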
When this method is called using the example application and the data supplied as
input, the output below is generated.
public ListCell(Object head, ListCell next_cell)
{
    Datum = head;
    Link = next_cell;
}
public class ListCell
{
    private Object Datum;    // the information held in this cell
    private ListCell Link;   // the remainder of the list (null at the end)

    public ListCell(Object head, ListCell next_cell)
    {
        Datum = head;
        Link = next_cell;
    }
    //
    // A Method to Print the List will come here
    //
}
//
// COMP102
// Example 8: Construction and Output of A Linked List
//
// Paul E. Dunne 5/11/99
//
// The ListCell class is in the same (default) package, so no import statement is needed
import java.io.*;
public class SimpleListExample
{
public static InputStreamReader input = new InputStreamReader(System.in);
public static BufferedReader keyboardInput = new BufferedReader(input);
static ListCell NamesInList;
static String TextReadIn = new String();
//
public static void main( String[] args) throws IOException
{
System.out.print("Next List Item (hit
Next List Item (hit < return > to finish):KILMARNOCK
NamesInList = new ListCell ("KILMARNOCK",NamesInList);
Next List Item (hit < return > to finish):CELTIC
NamesInList = new ListCell ("CELTIC",NamesInList);
Next List Item (hit < return > to finish):MOTHERWELL
NamesInList = new ListCell ("MOTHERWELL","CELTIC"::"KILMARNOCK"::null);
Next List Item (hit < return > to finish):DUNDEE
Next List Item (hit < return > to finish):HIBERNIAN
Next List Item (hit < return > to finish):
3.1. Some General Observations
ListName = new ListCell ( DatumValue, ListName );
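public void PrintOut()
{
    ListCell temporary = this;              // start at the cell the method was called on
    while (temporary != null)               // stop at the null reference
    {
        System.out.println(temporary.Datum);
        temporary = temporary.Link;         // move to the remainder of the list
    }
}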
HIBERNIAN
DUNDEE
MOTHERWELL
CELTIC
KILMARNOCK
4. Summary
(Notes prepared and maintained by Paul E. Dunne, November 1999)