INTRODUCTION TO PROGRAMMING IN JAVA: DIRECTORIES AND PROGRAMMING LANGUAGES

NOTE: This set of www pages is not the set of www pages for the current version of COMP101. The pages are from a previous version that, at the request of students, I have kept online.

1. Directories
2. Computer programming languages
3. Compilers and interpreters

1. DIRECTORIES

After you have used your system for a while you will start accumulating a large collection of files. With the help of directories you can organise your files into manageable, logically related groups. For example, if you have several different projects, you can create a directory for each project and store all the files associated with each project in the appropriate directory.

Directories are hierarchically organised; i.e. each directory has a parent directory "above" it, and may also have child directories "below" it, which may in turn have further child directories, and so on. The topmost directory is referred to as the root directory. The directory structure may be likened to the chambers in an Ancient Egyptian pyramid (Figure 1).

Figure 1: Pyramid conceptualisation of a directory structure.

A comparison between the pyramid conceptualisation and directory structure is presented in Table 1.

Chambers	Directories
Contain artifacts	Contain files
Contains an entrance from above	Has a parent directory
May contain passage ways to chambers below	May have subordinate child directories
Pyramid has one entrance at the top	Directory structure has a root directory at the top

Table 1: Comparison between the pyramid conceptualisation and directory structure.

The directory structure that maps onto the rooms and artifacts in the above pyramid is presented in Figure 2 (ovals represent directories and boxes represent files).

Figure 2: Example directory structure.

In Figure 2 all directories fall under the topmost "root" directory, which is denoted by a forward slash (/). When you log in, the operating system places you in what is called your home directory. This is the topmost directory in your file store. In a distributed computer system (such as the one we will be using) your home directory is usually a child of another directory (often called users), which is itself a child of the topmost "system" directory called the root directory (as in Figure 2). The directory you happen to be in at any time (which may or may not be your home directory) is referred to as your current directory. Figure 3 shows the distributed style of organisation, where the home directories for three users, frans, martyn and kate, are sub-directories of the parent directory students, which in turn is a sub-directory of users, etc.

When specifying files in your current directory you can refer to them by their names. However, when referring to files (or directories) outside your current directory you must use path names. A path name specifies where a particular file or directory is located within the directory structure. There are two kinds of path names: absolute and relative. Absolute path names specify the path to a file or directory starting from the root directory. Figure 3 shows the absolute path names for various files and directories (note that a forward slash (/) is used in UNIX/LINUX and a backward slash (\) in Windows 2000).

Figure 3: Absolute path names.
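The way an absolute path name leads back to the root directory can be sketched in Java. The following is only an illustration, assuming a UNIX-like system and Java 7 or later; the class name PathToRoot and the file name notes.txt are made up for the example, and the program treats the path purely as a name (it does not look at the real file store).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class PathToRoot {
    // Lists a path followed by each of its ancestor directories, ending with the root.
    static List<String> ancestors(String absolutePathName) {
        List<String> result = new ArrayList<>();
        Path current = Paths.get(absolutePathName);
        while (current != null) {              // getParent() returns null for the root
            result.add(current.toString());
            current = current.getParent();     // move one level "up" the hierarchy
        }
        return result;
    }

    public static void main(String[] args) {
        // "notes.txt" is a hypothetical file in the home directory of user frans.
        for (String name : ancestors("/users/students/frans/notes.txt")) {
            System.out.println(name);
        }
    }
}
```

Each line printed is the parent of the line before it, and the last line is the root directory (/).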

Relative path names specify directories and files starting from your current directory. The relative path name for your current directory is one dot (.). The relative path name for the parent directory of your current directory is two dots (..). Figure 4 shows relative path names for various directories and files starting from the current directory /users/students/frans.

Figure 4: Relative path names.
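The relationship between relative and absolute path names can also be tried out in Java. This is a minimal sketch, again assuming a UNIX-like system and Java 7 or later; the class and method names (RelativePathDemo, toAbsolute, toRelative) are invented for the example, and the current directory is the one used in Figure 4.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativePathDemo {
    // Converts a relative path name into an absolute one, given the current directory.
    static String toAbsolute(String currentDirectory, String relativePathName) {
        Path current = Paths.get(currentDirectory);
        // resolve() appends the relative name; normalize() removes "." and ".." steps.
        return current.resolve(relativePathName).normalize().toString();
    }

    // Expresses an absolute path name relative to the current directory.
    static String toRelative(String currentDirectory, String absolutePathName) {
        Path current = Paths.get(currentDirectory);
        return current.relativize(Paths.get(absolutePathName)).toString();
    }

    public static void main(String[] args) {
        String current = "/users/students/frans";
        System.out.println(toAbsolute(current, "."));          // the current directory
        System.out.println(toAbsolute(current, ".."));         // its parent directory
        System.out.println(toAbsolute(current, "../martyn"));  // a sibling directory
        System.out.println(toRelative(current, "/users/students/kate"));
    }
}
```

The paths are handled as names only, so the directories do not have to exist for the program to run.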

Figure 4 also shows how a user (once they have successfully logged in) can reach other users' directories. Clearly most users would not wish this, and the HP-UX system supports mechanisms to prevent it. Another user can only access one of your directories if you specifically allow them to do so (there are commands for this).

The system administrator, of necessity, has access to all files and directories!

Most operating systems provide a program called a file manager, with a graphical interface that allows users to "point and click" to change directories and select files; consequently much of the foregoing concerning relative and absolute paths can be dispensed with.

2. COMPUTER PROGRAMMING LANGUAGES

All programming languages are artificial. Each has a limited vocabulary, an explicitly defined grammar, and well-formed rules of syntax and semantics. We can identify three types of programming language:

Machine languages
Assembly languages
High-level languages

2.1 Machine languages (machine code)

We have seen that a program instruction comprises a particular combination of binary digits. Different makes and types of computer use different "codes" of binary digits to represent instructions. Such codes are referred to as machine or instruction codes. At this level a computer has a basic repertoire of instructions that it can perform, known as the instruction set. Typically this instruction set includes:

Basic arithmetic operations
Comparators of various kinds (e.g. equality operators and so on)
Facilities to deal with sequences of characters
Input/Output (IO) operators

In the early days of computer programming all programs had to be written in machine code. For example a short (3 instruction) program might look like this:

0111 0001 0000 1111
1001 1101 1011 0001
1110 0001 0011 1110

Machine code has several significant disadvantages associated with it:

1. It is not intuitively obvious what a machine code instruction does simply from its encoding; consequently it is very difficult to read and write machine code.
2. Because of (1) the writing of machine code is extremely time consuming and error prone.
3. Many different machine codes exist (one for each make and type of computer).

These disadvantages all serve to severely limit the range of applications that can realistically be programmed in machine code.

2.2 Assembler languages (assembly code)

Assembler languages were initially developed to address the disadvantages associated with machine code programming. They use symbolic codes instead of lists of binary instructions. Consequently programming became more "friendly". An example of assembly code is given below:

MOV AX 01
MOV BX 02
ADD AX BX

In assembler language each line of the program corresponds to one instruction in machine code. For a program written in assembler language to be executable it must be translated into machine code using a translating program called an assembler.
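To make the idea of translation concrete, the following Java sketch (Java 9 or later) shows a toy assembler that turns each line of the example above into one 16-bit word. The encoding is entirely hypothetical: the operation codes, register numbers and word layout (4-bit operation, 4-bit register, 8-bit operand) are made up for illustration and do not correspond to any real processor. The class name ToyAssembler is likewise invented, and no error checking is attempted.

```java
import java.util.Map;

class ToyAssembler {
    // Hypothetical 4-bit operation codes; real processors define their own.
    private static final Map<String, Integer> OPCODES = Map.of("MOV", 0b0001, "ADD", 0b0010);
    // Hypothetical 4-bit register numbers.
    private static final Map<String, Integer> REGISTERS = Map.of("AX", 0b0000, "BX", 0b0001);

    // Translates one line of assembly code into one 16-bit machine code word.
    static String assemble(String line) {
        String[] parts = line.trim().split("\\s+");
        int opcode = OPCODES.get(parts[0]);
        int target = REGISTERS.get(parts[1]);
        // The last field is either a register name or a literal number.
        Integer source = REGISTERS.get(parts[2]);
        int operand = (source != null) ? source : Integer.parseInt(parts[2]);
        int word = (opcode << 12) | (target << 8) | (operand & 0xFF);
        return toBinary(word);
    }

    // Formats a 16-bit word as four groups of four binary digits.
    static String toBinary(int word) {
        StringBuilder text = new StringBuilder();
        for (int bit = 15; bit >= 0; bit--) {
            text.append((word >> bit) & 1);
            if (bit % 4 == 0 && bit != 0) {
                text.append(' ');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String[] program = {"MOV AX 01", "MOV BX 02", "ADD AX BX"};
        for (String line : program) {
            System.out.println(line + "  ->  " + assemble(line));
        }
    }
}
```

Three lines of assembly code go in and three machine code words come out, which is the one-to-one correspondence described above.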

Although use of assembly languages offers some advantages there are still a number of significant disadvantages associated with their use:

1. Each model of computer has its own assembly language associated with it.
2. Assembler programming still requires great attention to detail and hence remains both time consuming and tedious.
3. Because of (2) the risk of program error is not significantly reduced.

Note that there are some computer applications, such as interfacing with peripherals, where assembler language is still a necessity.

2.3 High level languages

From the early 1950s onwards high level languages were developed with the express aim of providing the means whereby computer programs could be written more efficiently and in a less error prone manner. The advantages offered are:

1. Programs written in a high-level language are more adapted to human modes of expression than to the computer's set of instructions. Programs are expressed in "half-English" and arithmetic calculations are written in a way familiar from mathematics.
2. As a result of (1) programmers can concentrate more closely on the problem to be solved rather than the mass of detail required for machine code or assembly language programming.
3. High level languages are not necessarily dedicated to a particular type or model of computer, a feature known as portability.

Of course programs written in high level languages still have to be translated into machine code instructions if they are to become executable. How this is achieved will be discussed later.
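For comparison, the addition performed by the three-line assembly code example in Section 2.2 might be written in Java roughly as follows. This is a sketch only; the class name AddExample and the variable names are chosen to mirror the register names AX and BX and are not part of any standard.

```java
class AddExample {
    // Arithmetic is written much as it would be in mathematics.
    static int add(int first, int second) {
        return first + second;
    }

    public static void main(String[] args) {
        int ax = 1;          // corresponds to MOV AX 01
        int bx = 2;          // corresponds to MOV BX 02
        ax = add(ax, bx);    // corresponds to ADD AX BX
        System.out.println("Result = " + ax);
    }
}
```

The programmer no longer needs to know which registers or instruction codes the computer uses.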

2.4 Categorisation of High Level Languages (Paradigms)

Because the majority of programming languages can be described as high level it is convenient to categorise these languages according to their various modes/styles of operation and usage --- these different styles are referred to as programming paradigms (ways of addressing problems by computer). The most prominent programming paradigms may be itemized as follows:

The imperative or sequential paradigm, in which programs are processed in a step by step manner (FORTRAN, COBOL, ALGOL, BASIC (and Visual Basic), Pascal, C and Ada).
The functional paradigm (LISP, Miranda and Haskell).
The logic paradigm (PROLOG).
The object oriented paradigm (C++, Java, Visual C++ and C#, pronounced "C sharp").

The paradigm we will be adopting is the object oriented paradigm.
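Although Java is an object oriented language, the difference in style between two of the paradigms listed above can be hinted at within Java itself (Java 8 or later). The sketch below adds up the numbers 1 to n twice: once step by step in the imperative style, and once by describing the result in a more functional style. The class and method names are invented for the example, and the second method only imitates the functional style; it is not a substitute for a true functional language such as Haskell.

```java
import java.util.stream.IntStream;

class SumStyles {
    // Imperative style: the total is built up one step at a time.
    static int sumImperative(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total = total + i;
        }
        return total;
    }

    // Functional style: the result is described as "the sum of the range 1 to n".
    static int sumFunctional(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumImperative(10));
        System.out.println(sumFunctional(10));
    }
}
```

Both methods give the same answer; only the way the problem is expressed differs.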

We can also identify models of a particular paradigm, for example the parallel model.

3. COMPILATION AND INTERPRETATION

We have seen that ultimately a computer can only operate on programs defined using machine code. Consequently a program written in a high-level language such as Java cannot be run directly. To execute a computer program written in a high level language it must be either compiled or interpreted.

3.1 Compilers

The central task of a compiler is to translate (convert) code written in a high level language into a machine executable form. Broadly, a compiler takes as input a file containing high level code (the source code) and outputs the content in a machine code format (sometimes referred to as the load module). The advantage is that the machine executable form runs much faster than if it were interpreted (see below). The disadvantage is that different machines and operating systems have different machine codes associated with them; consequently compiling a program under (say) Windows 2000 requires a different compiler to that needed to do the same thing under UNIX/LINUX. Further, once a program has been compiled it can only run on the type of machine and operating system for which it was compiled.

In addition to "translation", when invoked, a compiler checks that the source code is syntactically correct, i.e. that it conforms to the syntax of the chosen high level language.

If the compiler finds syntactic errors (also sometimes called compile time errors) translation into machine code cannot be completed and the compiler will, instead, output appropriate error messages. When this happens the programmer must correct the program and then attempt to compile it again. This process normally proceeds for a number of iterations. If no errors are found the compiler goes on to translate the text into machine code. Note that a compiler cannot find logic errors (also referred to as execution or run time errors). These can only be found at "run time" and are concerned with the operation of the code rather than its expression. The process of removing errors (syntactic or logic) is called debugging (errors are sometimes referred to as bugs).

Thus compilers have two functions:

Checking for errors.
Translation to executable machine code.
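The difference between a syntactic error and a logic error can be illustrated with a small Java sketch. The class and method names are made up for the example. The commented-out line would be rejected by the compiler, whereas the method faultyAverage compiles without complaint but gives the wrong answer when the program is run.

```java
class AverageDemo {
    // A syntactic error: the line below is incomplete, so if the comment marker
    // were removed the compiler would refuse to translate the program.
    // int total = 1 + ;

    // A logic error: the code compiles, but dividing one whole number by
    // another discards the fractional part before the result is returned.
    static double faultyAverage(int a, int b) {
        return (a + b) / 2;
    }

    // Corrected version: dividing by 2.0 keeps the fractional part.
    static double average(int a, int b) {
        return (a + b) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println("Faulty average of 1 and 2:  " + faultyAverage(1, 2));
        System.out.println("Correct average of 1 and 2: " + average(1, 2));
    }
}
```

The compiler accepts both methods, so the fault in the first only comes to light when the output is checked against the expected value of 1.5.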

3.2 Interpreters

In the case of interpretation each line of the program is decoded and "interpreted" by a special program known as an interpreter. Different interpreters are required for different languages (and different machines). Interpretation occurs every time a line in a program is executed. This means that a line which occurs many times must be interpreted on each occasion. This wastes computer time, and causes programs to run relatively slowly.

However, the repeated examination of the source program by an interpreter allows interpretation to be more flexible than when using a compiler. Interpreters also provide a faster and easier way of testing small programs or fragments of programs. Further, in the context of error detection, because the interpreter works directly with the source code, errors can be reported accurately with reference to line numbers (this is not the case when programs are compiled). Another, more questionable, advantage is that parts of a program which are not executed need not be interpreted.
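The cost of repeated interpretation can be seen in the following Java sketch of a very small interpreter. The three-instruction language it understands (SET, ADD and JLT, "jump if less than") is invented purely for this example, as are the class and field names. The interpreter decodes a line every time that line is reached, and counts how often it has done so.

```java
import java.util.HashMap;
import java.util.Map;

class TinyInterpreter {
    final Map<String, Integer> variables = new HashMap<>();
    int linesDecoded = 0;   // how many times a line has had to be decoded

    void run(String[] program) {
        int lineNumber = 0;
        while (lineNumber < program.length) {
            // Decoding happens every time the line is reached, even if seen before.
            String[] parts = program[lineNumber].trim().split("\\s+");
            linesDecoded++;
            switch (parts[0]) {
                case "SET":   // SET name value
                    variables.put(parts[1], Integer.parseInt(parts[2]));
                    lineNumber++;
                    break;
                case "ADD":   // ADD name value
                    variables.put(parts[1], variables.get(parts[1]) + Integer.parseInt(parts[2]));
                    lineNumber++;
                    break;
                case "JLT":   // JLT name limit target: jump to target if name < limit
                    if (variables.get(parts[1]) < Integer.parseInt(parts[2])) {
                        lineNumber = Integer.parseInt(parts[3]);
                    } else {
                        lineNumber++;
                    }
                    break;
                default:
                    // The source line is to hand, so the error can name it exactly.
                    throw new IllegalArgumentException(
                        "Line " + lineNumber + ": unknown instruction " + parts[0]);
            }
        }
    }

    public static void main(String[] args) {
        String[] program = {"SET count 0", "ADD count 1", "JLT count 3 1"};
        TinyInterpreter interpreter = new TinyInterpreter();
        interpreter.run(program);
        System.out.println("count = " + interpreter.variables.get("count"));
        System.out.println("lines decoded = " + interpreter.linesDecoded);
    }
}
```

The program has only three lines, but because lines 1 and 2 form a loop they are decoded again on every pass; a compiler would have translated each line once.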

3.3 Integrated Development Environments (IDEs)

When errors are found in a program the programmer must correct the source code. Source code is typically prepared using a program called a text editor that takes input from the keyboard/mouse, and outputs to the screen/printer/file store. You should not use a word processing program to create source code as this will include control characters and sequences peculiar to the word processor, which cannot be understood by the compiler/interpreter.

Thus program writing involves a number of iterations of text editing and compilation/interpretation. In some cases a text editor and compiler/interpreter may be combined into a single Integrated Development Environment (IDE) or simply an environment.

An IDE is a single piece of software which can be used to write, edit and compile or interpret a high level language program. Given the edit-compile/interpret cycle encountered when writing computer programs, IDEs offer the advantage that the programmer can easily switch between editing and compiling/interpreting. Some environments also include facilities to manage program development (a useful feature when undertaking large software projects), and special editors to design and create user interfaces. One disadvantage of environments is that many are platform dependent (i.e. they will only work on a particular type of computer or under particular operating systems). A second disadvantage is that the initial "learning curve" is often fairly steep.

Created and maintained by Frans Coenen. Last updated 10 February 2015