STREAM TOKENIZING (AND MORE FILE HANDLING)

CONTENTS

1. Introduction
2. Stream tokenizing and file handling
3. Further Stream tokenizing


1. INTRODUCTION

We have seen that, when reading an input string supplied by a user, we often want to analyse it token by token, as demonstrated previously. To isolate such tokens we used the StringTokenizer class, which is found in the java.util package. We can also use the string tokenizer to process input from a file line by line, as also demonstrated previously. However, there are a number of problems with this approach:

  1. We would like to be able to detect the end of the file (EOF) without having to know in advance how big the file is.
  2. It would also be nice to be able to process a file token by token, rather than a line at a time, should this be desirable.

To address the above we can make use of the StreamTokenizer class.



2. STREAM TOKENIZING

A stream tokenizer takes an input stream and parses it into tokens, allowing the tokens to be read one at a time. A partial class diagram for the StreamTokenizer class is given in Figure 1. Some sample code that makes use of the stream tokenizer is given in Table 1.

Figure 1: Class diagram showing some of the StreamTokenizer fields and methods

// Stream TOKENIZER EXAMPLE
// Frans Coenen, Saturday 22 January 1999
// Department of Computer Science, The University of Liverpool, UK

import java.io.*; 
import java.util.*;

class TokenizerExample4
    {
    
    /* Main method  */

    public static void main(String[] args) throws IOException
        {
	FileReader file = new FileReader("HelloWorld2.java");
	StreamTokenizer inputStream = new StreamTokenizer(file);
	int tokenType = 0;
	int numberOfTokens = -1;
	
	// Process the file and output the number of tokens in the file
	
	do {
	    tokenType = inputStream.nextToken();
	    numberOfTokens++;
	    } while (tokenType != StreamTokenizer.TT_EOF);
	    
	// Output result and close file
	    
	System.out.println("Number of tokens = " + numberOfTokens);
	file.close();
	}
   }     

Table 1: Stream tokenizing example 1

The code calls the StreamTokenizer constructor with an argument of type FileReader (the constructor accepts any Reader). Having created an instance of the class StreamTokenizer we can use the nextToken method to read tokens from the input stream. Note also that we do not need to know how big the file is in advance: we simply test the current token's type against the class integer constant TT_EOF (this has a value of -1). The counter starts at -1 so that the final TT_EOF token is not included in the count. There are four predefined types of token: TT_EOF, TT_EOL, TT_NUMBER and TT_WORD. Any other character, such as a bracket or a semicolon, is returned as a token whose type is the character's code.
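The four token types can be seen on a small scale with the following sketch. It is not part of the original examples: the class name TokenTypeSketch, the method name tokenTypes and the sample input are all invented for illustration. It reads from a StringReader rather than a file, and it calls eolIsSignificant(true), because with the default settings line ends are treated as white space and TT_EOL is never returned.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenTypeSketch
    {

    /* TOKEN TYPES: Return a short label for each token read from the
    given reader (hypothetical helper, for illustration only) */

    static List<String> tokenTypes(Reader reader) throws IOException
        {
        StreamTokenizer tokens = new StreamTokenizer(reader);
        tokens.eolIsSignificant(true);   // Off by default; needed to see TT_EOL
        List<String> labels = new ArrayList<String>();

        // Read tokens until the end of the input is reached

        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokens.ttype) {
                case StreamTokenizer.TT_EOL:
                    labels.add("EOL");
                    break;
                case StreamTokenizer.TT_NUMBER:
                    labels.add("NUMBER " + tokens.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    labels.add("WORD " + tokens.sval);
                    break;
                default:
                    // Any other character is its own token type
                    labels.add("CHAR " + (char) tokens.ttype);
                    break;
                }
            }
        return labels;
        }

    /* Main method  */

    public static void main(String[] args) throws IOException
        {
        System.out.println(tokenTypes(new StringReader("width 42\nheight 7.5\n")));
        }
    }
```

Run as written, this should print [WORD width, NUMBER 42.0, EOL, WORD height, NUMBER 7.5, EOL]. Note that numbers are always delivered as double values in nval, which is why 42 appears as 42.0.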

The input file used by the code presented in Table 1, HelloWorld2.java, is given in Table 2.

// HELLO WORLD PROGRAM 2
// Frans Coenen, Monday 15 January 1999
// Department of Computer Science, The University of Liverpool, UK

import java.io.*; 

class HelloWorld2
    {
    // Create BufferedReader class instance

    static InputStreamReader input         = new InputStreamReader(System.in);
    static BufferedReader    keyboardInput = new BufferedReader(input);

    /* Main method  */

    public static void main(String[] args) throws IOException
        {
        String name;

        System.out.print("What is your name? ");
        name = keyboardInput.readLine();

        System.out.print("\nHello " + name );
        System.out.println(" - Congratulations on writing your first" +
                " Java program which features some input!\n\n");
        }
    }      

Table 2: Test file

If we run the code presented in Table 1 the output will be:

Number of tokens = 70  

In Table 3 we have some code that identifies tokens and outputs the associated value. Note that if a token is of type TT_NUMBER the value is stored in the instance variable nval, and if it is of type TT_WORD in the instance variable sval. The result from running the code is presented in Table 4.

// Stream TOKENIZER EXAMPLE
// Frans Coenen, Saturday 22 January 1999
// Department of Computer Science, The University of Liverpool, UK

import java.io.*; 
import java.util.*;

class TokenizerExample5
    {
    
    /* Main method  */

    public static void main(String[] args) throws IOException
        {
	FileReader file = new FileReader("HelloWorld2.java");
	StreamTokenizer inputStream = new StreamTokenizer(file);
	int tokenType = 0;
	int numberOfTokens = -1;
	
	// Process the file and output the number of tokens in the file
	
	do {
	    tokenType = inputStream.nextToken();
	    outputTtype(tokenType,inputStream);
	    numberOfTokens++;
	    } while (tokenType != StreamTokenizer.TT_EOF);
	    
	// Output result and close file
	    
	System.out.println("Number of tokens = " + numberOfTokens);
	file.close();
	}
   
   /* OUTPUT TTYPE:  Method to output the ttype of a stream token and 
   its value. */
   
   private static void outputTtype(int ttype, StreamTokenizer inStream) {
       switch (ttype) {
           case StreamTokenizer.TT_EOF:
	   	System.out.println("TT_EOF");
		break;
	   case StreamTokenizer.TT_EOL:
	   	System.out.println("TT_EOL");
		break; 	
	   case StreamTokenizer.TT_NUMBER:
	   	System.out.println("TT_NUMBER: nval = " + inStream.nval);
		break;
	   case StreamTokenizer.TT_WORD:
	   	System.out.println("TT_WORD: sval = " + inStream.sval);
		break;	
           default:
	   	System.out.println("Unknown: ttype = " + ttype + ", nval = " +
				inStream.nval + ", sval = " + inStream.sval);
		break;
	   }
       }
   }       

Table 3: Stream tokenizing example 2

$ java TokenizerExample5
TT_WORD: sval = import
TT_WORD: sval = java.io.
Unknown: ttype = 42, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = class
TT_WORD: sval = HelloWorld2
Unknown: ttype = 123, nval = 0.0, sval = null
TT_WORD: sval = static
TT_WORD: sval = InputStreamReader
TT_WORD: sval = input
Unknown: ttype = 61, nval = 0.0, sval = null
TT_WORD: sval = new
TT_WORD: sval = InputStreamReader
Unknown: ttype = 40, nval = 0.0, sval = null
TT_WORD: sval = System.in
Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = static
TT_WORD: sval = BufferedReader
TT_WORD: sval = keyboardInput
Unknown: ttype = 61, nval = 0.0, sval = null
TT_WORD: sval = new
TT_WORD: sval = BufferedReader
Unknown: ttype = 40, nval = 0.0, sval = null
TT_WORD: sval = input
Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = public
TT_WORD: sval = static
TT_WORD: sval = void
TT_WORD: sval = main
Unknown: ttype = 40, nval = 0.0, sval = null
TT_WORD: sval = String
Unknown: ttype = 91, nval = 0.0, sval = null
Unknown: ttype = 93, nval = 0.0, sval = null
TT_WORD: sval = args
Unknown: ttype = 41, nval = 0.0, sval = null
TT_WORD: sval = throws
TT_WORD: sval = IOException
Unknown: ttype = 123, nval = 0.0, sval = null
TT_WORD: sval = String
TT_WORD: sval = name
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = System.out.print
Unknown: ttype = 40, nval = 0.0, sval = null
Unknown: ttype = 34, nval = 0.0, sval = What is your name?
Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = name
Unknown: ttype = 61, nval = 0.0, sval = null
TT_WORD: sval = keyboardInput.readLine
Unknown: ttype = 40, nval = 0.0, sval = null
Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null      
TT_WORD: sval = System.out.print
Unknown: ttype = 40, nval = 0.0, sval = null
Unknown: ttype = 34, nval = 0.0, sval =
Hello
Unknown: ttype = 43, nval = 0.0, sval = null
TT_WORD: sval = name
Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
TT_WORD: sval = System.out.println
Unknown: ttype = 40, nval = 0.0, sval = null
Unknown: ttype = 34, nval = 0.0, sval =  - Congratulations on writing your first
Unknown: ttype = 43, nval = 0.0, sval = null
Unknown: ttype = 34, nval = 0.0, sval =  Java program which features some input!


Unknown: ttype = 41, nval = 0.0, sval = null
Unknown: ttype = 59, nval = 0.0, sval = null
Unknown: ttype = 125, nval = 0.0, sval = null
Unknown: ttype = 125, nval = 0.0, sval = null
TT_EOF
Number of tokens = 70      

Table 4: Sample output generated from tokenizing code presented in Table 3

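Comparison of the above output with respect to the input given in Table 2 indicates that:

  1. Comments are skipped: with the default settings the '/' character starts a comment that runs to the end of the line, so none of the comment lines in Table 2 produce tokens.
  2. White space and line ends only separate tokens; no TT_EOL tokens are reported.
  3. The '.' character is treated as part of a word, so that java.io. and System.out.print are each returned as a single TT_WORD token.
  4. Punctuation characters such as '(', ';' and '{' are returned as individual tokens whose ttype is the character code (for example 40 for '(').
  5. A string literal is returned as a single token with ttype 34 (the code for the double quote character), with the contents of the string placed in sval and escape sequences such as \n already interpreted.
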
Interesting!
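The behaviour seen in Table 4 can be reproduced on a single line of input with the sketch below. It is an illustration only: the class name DefaultSyntaxSketch, the method name describe and the sample line are invented here, and the tokenizer is left with its default settings. The sketch distinguishes words, numbers, quoted strings (whose ttype is the quote character) and ordinary characters (whose ttype is the character code).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class DefaultSyntaxSketch
    {

    /* DESCRIBE: Tokenize some Java-like text using the default settings
    and return one description per token, one per line (hypothetical
    helper, for illustration only) */

    static String describe(String text) throws IOException
        {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(text));
        StringBuilder out = new StringBuilder();

        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokens.ttype == StreamTokenizer.TT_WORD)
                out.append("word ").append(tokens.sval);
            else if (tokens.ttype == StreamTokenizer.TT_NUMBER)
                out.append("number ").append(tokens.nval);
            else if (tokens.ttype == '"')
                // Quoted string: ttype is the quote character (code 34)
                out.append("string ").append(tokens.sval);
            else
                // Ordinary character: ttype is the character code
                out.append("char ").append(tokens.ttype);
            out.append('\n');
            }
        return out.toString();
        }

    /* Main method  */

    public static void main(String[] args) throws IOException
        {
        System.out.print(describe("System.out.print(\"Hi\"); // greet the user"));
        }
    }
```

For the sample line the result should be five tokens: the word System.out.print, the character 40, the string Hi, and the characters 41 and 59. The trailing comment produces no tokens at all, because '/' is a comment character by default.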



3. FURTHER STREAM TOKENIZING

Thus, using the stream tokenizer we could produce some code to "parse" a Java source file and identify (say) the class and method names within it. Some appropriate code is given in Table 5 and some sample output (using the code given in Table 5 as input) in Table 6.

// Stream TOKENIZER EXAMPLE
// Frans Coenen, Saturday 22 January 1999
// Department of Computer Science, The University of Liverpool, UK

import java.io.*; 
import java.util.*;

class TokenizerExample6 {
    
    public static int tokenType = 0;
    
    /* MAIN: Main method  */

    public static void main(String[] args) throws IOException {
	if (checkInput(args)) parseFile(args[0]);
	}
   
   /* PARSE FILE: Top level method to start the parsing  */

    public static void parseFile(String fileName) throws IOException {
	FileReader file = new FileReader(fileName);
	StreamTokenizer inputStream = new StreamTokenizer(file);
	// int tokenType = 0;
	
	// Read tokens until the "class" keyword has been found
	
	do {
	    tokenType = inputStream.nextToken();
	    } while (testForClassKeyWord(tokenType,inputStream));
	    
	// Output class name
	
	tokenType = inputStream.nextToken();    
	System.out.println("Class name " + inputStream.sval);
	
	findMethodNames(inputStream);
	}
	
    /* FIND METHOD NAMES: Look for method names in input stream: */
   
    private static void findMethodNames(StreamTokenizer inStream) throws IOException {   
        // keep processing until TT_EOF, handling each initialiser found
       
	do {
	    tokenType = inStream.nextToken();
	    if (testForInitialiser(tokenType,inStream)) findMethodName(inStream);    
	    } while (tokenType != StreamTokenizer.TT_EOF); 
	}
	
    /* FIND METHOD NAME: Look for method name in input stream: */
   
    private static void findMethodName(StreamTokenizer inStream) throws IOException {
        String name;
	
	// Continue while more initialisers are found
	
	do {
	    tokenType = inStream.nextToken();
	    } while (testForInitialiser(tokenType,inStream));    
       
	// Current token is a type or class name, next token is therefore a
	// data item or method name.
	
	inStream.nextToken();
	name = inStream.sval;
	
	// If next token is a '(' character (ASCII code 40) we have a method name.
	
	tokenType = inStream.nextToken();
	if (tokenType == 40)  System.out.println("Method name " + name);
	}
       
    /* TEST FOR CLASS KEY WORD: Return false if "class" keyword found, and
    true otherwise */
   
    private static boolean testForClassKeyWord(int ttype, StreamTokenizer inStream) {
        if ((ttype == StreamTokenizer.TT_WORD) && (inStream.sval.equals("class"))) 
       		return(false); 
        return(true);
        }
       
    /* TEST FOR Initialiser: Return true if "static", "public" or "private"
    keyword found, and false otherwise */
   
    private static boolean testForInitialiser(int ttype,StreamTokenizer inStream) {
        if (ttype == StreamTokenizer.TT_WORD) {
            if ((inStream.sval.equals("static")) || (inStream.sval.equals("private")) 
	   		|| (inStream.sval.equals("public")))
       		return(true); 
	    }
        return(false);
        }
    
    /* CHECK INPUT: Check that a file name (command line argument) has been
    passed. If so return true, false otherwise. */
    
    private static boolean checkInput(String[] args) {
        if (args.length == 0) {
	    System.out.println("ERROR: No filename supplied");
	    return(false);
	    }
	else return(true);
	}
    
    }    

Table 5: Java parser

$ java TokenizerExample6 TokenizerExample6.java
Class name TokenizerExample6
Method name main
Method name parseFile
Method name findMethodNames
Method name findMethodName
Method name testForClassKeyWord
Method name testForInitialiser
Method name checkInput     

Table 6: Sample output from code presented in Table 5.
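The parser in Table 5 decides whether a name is a method name by reading one token further and testing for '('. When the extra token is not a '(' it has already been consumed. A possible alternative, sketched below, is to return the token to the stream with the pushBack method of StreamTokenizer so that it is read again on the next call to nextToken. This is only an illustration and not the approach used in Table 5: the class name LookaheadSketch and the method name namesBeforeParenthesis are invented, and the sketch reports every word that is immediately followed by '(', so method calls are reported as well as method declarations.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LookaheadSketch
    {

    /* NAMES BEFORE PARENTHESIS: Return every word in the text that is
    immediately followed by a '(' character (hypothetical helper, for
    illustration only) */

    static List<String> namesBeforeParenthesis(String text) throws IOException
        {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(text));
        List<String> names = new ArrayList<String>();

        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokens.ttype != StreamTokenizer.TT_WORD) continue;
            String candidate = tokens.sval;

            // Look one token ahead; if it is not '(' put it back so that
            // the main loop reads it again

            if (tokens.nextToken() == '(') names.add(candidate);
            else tokens.pushBack();
            }
        return names;
        }

    /* Main method  */

    public static void main(String[] args) throws IOException
        {
        String source = "int count = 0; void reset() { count = 0; }";
        System.out.println(namesBeforeParenthesis(source));
        }
    }
```

For the sample text only reset is followed by '(', so the sketch should print [reset]. Because the token after int is pushed back rather than discarded, count is still examined as a candidate in its own right.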




Created and maintained by Frans Coenen. Last updated 12 May 2000