1. Introduction
2. Stream tokenizing and file handling
3. Further Stream tokenizing
We have seen that when reading an input string supplied by a user we like to be able to analyse it token by token, as demonstrated previously. To isolate such tokens we used the StringTokenizer class, which is found in the package java.util. We can also use the string tokenizer to process input from a file line by line, as also demonstrated previously. However, there are a number of problems with this approach.
To address the above we can make use of the StreamTokenizer class.
A stream tokenizer takes an input stream and parses it into tokens, allowing the tokens to be read one at a time. A partial class diagram for the StreamTokenizer class is given in Figure 1. Some sample code that makes use of the stream tokenizer is given in Table 1.
Figure 1: Class diagram showing some of the StreamTokenizer fields and methods
    // Stream TOKENIZER EXAMPLE
    // Frans Coenen, Saturday 22 January 1999
    // Department of Computer Science, The University of Liverpool, UK

    import java.io.*;
    import java.util.*;

    class TokenizerExample4 {

        /* Main method */

        public static void main(String[] args) throws IOException {
            FileReader file = new FileReader("HelloWorld2.java");
            StreamTokenizer inputStream = new StreamTokenizer(file);
            int tokenType = 0;
            // Start at -1 so that the final TT_EOF token is not counted
            int numberOfTokens = -1;

            // Process the file and count the number of tokens in the file
            do {
                tokenType = inputStream.nextToken();
                numberOfTokens++;
            } while (tokenType != StreamTokenizer.TT_EOF);

            // Output result and close file
            System.out.println("Number of tokens = " + numberOfTokens);
            file.close();
        }
    }
Table 1: Stream tokenizing example 1
The code calls the StreamTokenizer constructor, which requires a Reader argument (here a FileReader). Having created an instance of the class StreamTokenizer we can use the nextToken method to read tokens from the input stream. Note also that we do not need to know how big the file is in advance; we simply test the current token's type against the class integer constant TT_EOF (this has a value of -1). There are four predefined types of token: TT_EOF, TT_EOL, TT_NUMBER and TT_WORD.
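The end-of-file test described above can also be tried without a file. The following is a minimal sketch, assuming we read from a StringReader rather than a FileReader; the class name TokenCountSketch and the method countTokens are invented for illustration and are not part of the original examples. It also prints the values of the four predefined token type constants.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TokenCountSketch {

    /* COUNT TOKENS: Hypothetical helper that counts the tokens in a string,
    excluding the final TT_EOF token. */

    static int countTokens(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        int count = 0;
        try {
            // Keep reading until the end-of-input marker is returned
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
        } catch (IOException e) {
            // Not expected when reading from a string held in memory
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Values of the four predefined token type constants
        System.out.println("TT_EOF    = " + StreamTokenizer.TT_EOF);
        System.out.println("TT_EOL    = " + StreamTokenizer.TT_EOL);
        System.out.println("TT_NUMBER = " + StreamTokenizer.TT_NUMBER);
        System.out.println("TT_WORD   = " + StreamTokenizer.TT_WORD);

        // Two words, an '=' character, a number and a ';' character
        System.out.println("Number of tokens = " + countTokens("int total = 42;"));
    }
}
```

Here the loop tests the value returned by nextToken directly, so the counter can start at 0 instead of -1.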
The input file used by the code presented in Table 1, HelloWorld2.java, is given in Table 2.
    // HELLO WORLD PROGRAM 2
    // Frans Coenen, Monday 15 January 1999
    // Department of Computer Science, The University of Liverpool, UK

    import java.io.*;

    class HelloWorld2 {

        // Create BufferedReader class instance

        static InputStreamReader input = new InputStreamReader(System.in);
        static BufferedReader keyboardInput = new BufferedReader(input);

        /* Main method */

        public static void main(String[] args) throws IOException {
            String name;

            System.out.print("What is your name? ");
            name = keyboardInput.readLine();
            System.out.print("\nHello " + name );
            System.out.println(" - Congratulations on writing your first" +
                        " Java program which features some input!\n\n");
        }
    }
Table 2: Test file
If we run the code presented in Table 1 the output will be:
Number of tokens = 70
In Table 3 we have some code that identifies tokens and outputs the associated value. Note that if a token is of type TT_NUMBER its value is stored in the field nval, and if it is of type TT_WORD its value is stored in the field sval. The result from running the code is presented in Table 4.
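To see the nval and sval fields in isolation, we can tokenize a short string that mixes words and numbers. The following is a minimal sketch; the class name TokenValueSketch, the methods sumNumbers and collectWords, and the sample text are all invented for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenValueSketch {

    /* SUM NUMBERS: Hypothetical helper that adds up the nval field of every
    TT_NUMBER token found in the text. */

    static double sumNumbers(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        double sum = 0.0;
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) sum += tokenizer.nval;
            }
        } catch (IOException e) {
            // Not expected when reading from a string held in memory
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    /* COLLECT WORDS: Hypothetical helper that gathers the sval field of every
    TT_WORD token found in the text. */

    static List<String> collectWords(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> words = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) words.add(tokenizer.sval);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "apples 3 pears 4.5";
        System.out.println("Sum of numbers = " + sumNumbers(text));
        System.out.println("Words = " + collectWords(text));
    }
}
```

Note that nval is always a double, so whole numbers such as 3 are also delivered as floating point values.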
    // Stream TOKENIZER EXAMPLE
    // Frans Coenen, Saturday 22 January 1999
    // Department of Computer Science, The University of Liverpool, UK

    import java.io.*;
    import java.util.*;

    class TokenizerExample5 {

        /* Main method */

        public static void main(String[] args) throws IOException {
            FileReader file = new FileReader("HelloWorld2.java");
            StreamTokenizer inputStream = new StreamTokenizer(file);
            int tokenType = 0;
            // Start at -1 so that the final TT_EOF token is not counted
            int numberOfTokens = -1;

            // Process the file, output each token and count the tokens
            do {
                tokenType = inputStream.nextToken();
                outputTtype(tokenType,inputStream);
                numberOfTokens++;
            } while (tokenType != StreamTokenizer.TT_EOF);

            // Output result and close file
            System.out.println("Number of tokens = " + numberOfTokens);
            file.close();
        }

        /* OUTPUT TTYPE: Method to output the ttype of a stream token and its
        value. */

        private static void outputTtype(int ttype, StreamTokenizer inStream) {
            switch (ttype) {
                case StreamTokenizer.TT_EOF:
                    System.out.println("TT_EOF");
                    break;
                case StreamTokenizer.TT_EOL:
                    System.out.println("TT_EOL");
                    break;
                case StreamTokenizer.TT_NUMBER:
                    System.out.println("TT_NUMBER: nval = " + inStream.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    System.out.println("TT_WORD: sval = " + inStream.sval);
                    break;
                default:
                    // Any other token: ttype holds the character code
                    System.out.println("Unknown: ttype = " + ttype +
                            ", nval = " + inStream.nval +
                            ", sval = " + inStream.sval);
                    break;
            }
        }
    }
Table 3: Stream tokenizing example 2
    $ java TokenizerExample5
    TT_WORD: sval = import
    TT_WORD: sval = java.io.
    Unknown: ttype = 42, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = class
    TT_WORD: sval = HelloWorld2
    Unknown: ttype = 123, nval = 0.0, sval = null
    TT_WORD: sval = static
    TT_WORD: sval = InputStreamReader
    TT_WORD: sval = input
    Unknown: ttype = 61, nval = 0.0, sval = null
    TT_WORD: sval = new
    TT_WORD: sval = InputStreamReader
    Unknown: ttype = 40, nval = 0.0, sval = null
    TT_WORD: sval = System.in
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = static
    TT_WORD: sval = BufferedReader
    TT_WORD: sval = keyboardInput
    Unknown: ttype = 61, nval = 0.0, sval = null
    TT_WORD: sval = new
    TT_WORD: sval = BufferedReader
    Unknown: ttype = 40, nval = 0.0, sval = null
    TT_WORD: sval = input
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = public
    TT_WORD: sval = static
    TT_WORD: sval = void
    TT_WORD: sval = main
    Unknown: ttype = 40, nval = 0.0, sval = null
    TT_WORD: sval = String
    Unknown: ttype = 91, nval = 0.0, sval = null
    Unknown: ttype = 93, nval = 0.0, sval = null
    TT_WORD: sval = args
    Unknown: ttype = 41, nval = 0.0, sval = null
    TT_WORD: sval = throws
    TT_WORD: sval = IOException
    Unknown: ttype = 123, nval = 0.0, sval = null
    TT_WORD: sval = String
    TT_WORD: sval = name
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = System.out.print
    Unknown: ttype = 40, nval = 0.0, sval = null
    Unknown: ttype = 34, nval = 0.0, sval = What is your name?
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = name
    Unknown: ttype = 61, nval = 0.0, sval = null
    TT_WORD: sval = keyboardInput.readLine
    Unknown: ttype = 40, nval = 0.0, sval = null
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = System.out.print
    Unknown: ttype = 40, nval = 0.0, sval = null
    Unknown: ttype = 34, nval = 0.0, sval = Hello
    Unknown: ttype = 43, nval = 0.0, sval = null
    TT_WORD: sval = name
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    TT_WORD: sval = System.out.println
    Unknown: ttype = 40, nval = 0.0, sval = null
    Unknown: ttype = 34, nval = 0.0, sval = - Congratulations on writing your first
    Unknown: ttype = 43, nval = 0.0, sval = null
    Unknown: ttype = 34, nval = 0.0, sval = Java program which features some input!
    Unknown: ttype = 41, nval = 0.0, sval = null
    Unknown: ttype = 59, nval = 0.0, sval = null
    Unknown: ttype = 125, nval = 0.0, sval = null
    Unknown: ttype = 125, nval = 0.0, sval = null
    TT_EOF
    Number of tokens = 70
Table 4: Sample output generated by the tokenizing code presented in Table 3
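Comparison of the above output with the input given in Table 2 indicates that:
1. Comments are ignored; none of the comment text in Table 2 appears in the output.
2. A sequence of letters, digits and dots, such as System.out.print or java.io., is returned as a single TT_WORD token.
3. Any other character, such as a bracket or a semicolon, is returned as a token on its own, with ttype set to its character code (for example 40 for '(' and 59 for ';').
4. A quoted string is returned as a single token whose ttype is the code of the quote character (34) and whose contents are stored in sval.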
Interesting!
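The behaviour visible in Table 4 can be reproduced on a single line of input. The following is a minimal sketch using the tokenizer's default settings; the class name DefaultSyntaxSketch, the method describe and the labels it produces are invented for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxSketch {

    /* DESCRIBE: Hypothetical helper that returns one description per token,
    using the default StreamTokenizer settings. */

    static List<String> describe(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tokenizer.ttype) {
                    case StreamTokenizer.TT_WORD:
                        result.add("word " + tokenizer.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        result.add("number " + tokenizer.nval);
                        break;
                    case '"':
                    case '\'':
                        // Quoted string: ttype is the quote character itself
                        result.add("quoted " + tokenizer.sval);
                        break;
                    default:
                        // Ordinary character: ttype is its character code
                        result.add("char " + (char) tokenizer.ttype +
                                " (" + tokenizer.ttype + ")");
                        break;
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a string held in memory
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The trailing comment produces no tokens at all
        for (String line : describe("System.out.print(\"Hi\"); // done")) {
            System.out.println(line);
        }
    }
}
```

For the sample line this yields five tokens: the word System.out.print, the character '(' (40), the quoted string Hi, the character ')' (41) and the character ';' (59).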
Thus, using the stream tokenizer we could produce some code to "parse" a Java source file and identify (say) the class and method names within it. Some appropriate code is given in Table 5 and some sample output (using the code given in Table 5 as input) in Table 6.
    // Stream TOKENIZER EXAMPLE
    // Frans Coenen, Saturday 22 January 1999
    // Department of Computer Science, The University of Liverpool, UK

    import java.io.*;
    import java.util.*;

    class TokenizerExample6 {

        public static int tokenType = 0;

        /* MAIN: Main method */

        public static void main(String[] args) throws IOException {
            if (checkInput(args)) parseFile(args[0]);
        }

        /* PARSE FILE: Top level method to start the parsing */

        public static void parseFile(String fileName) throws IOException {
            FileReader file = new FileReader(fileName);
            StreamTokenizer inputStream = new StreamTokenizer(file);

            // Skip tokens until the "class" keyword (or end of file) is found
            do {
                tokenType = inputStream.nextToken();
            } while (tokenType != StreamTokenizer.TT_EOF &&
                        testForClassKeyWord(tokenType,inputStream));

            // Stop if the end of the file was reached without finding a class
            if (tokenType == StreamTokenizer.TT_EOF) {
                System.out.println("ERROR: No class keyword found");
                file.close();
                return;
            }

            // Output class name
            tokenType = inputStream.nextToken();
            System.out.println("Class name " + inputStream.sval);
            findMethodNames(inputStream);
            file.close();
        }

        /* FIND METHOD NAMES: Look for method names in input stream: */

        private static void findMethodNames(StreamTokenizer inStream)
                                    throws IOException {
            // Keep processing until TT_EOF, checking for initialisers
            do {
                tokenType = inStream.nextToken();
                if (testForInitialiser(tokenType,inStream))
                                    findMethodName(inStream);
            } while (tokenType != StreamTokenizer.TT_EOF);
        }

        /* FIND METHOD NAME: Look for method name in input stream: */

        private static void findMethodName(StreamTokenizer inStream)
                                    throws IOException {
            String name;

            // Continue while more initialisers are found
            do {
                tokenType = inStream.nextToken();
            } while (testForInitialiser(tokenType,inStream));

            // Current token is a type or class name, next token is therefore a
            // data item or method name.
            inStream.nextToken();
            name = inStream.sval;

            // If next token is a '(' character (ASCII code 40) we have a method name.
            tokenType = inStream.nextToken();
            if (tokenType == 40) System.out.println("Method name " + name);
        }

        /* TEST FOR CLASS KEY WORD: Return false if "class" keyword found, and true
        otherwise */

        private static boolean testForClassKeyWord(int ttype,
                                    StreamTokenizer inStream) {
            if ((ttype == StreamTokenizer.TT_WORD) &&
                        (inStream.sval.equals("class"))) return(false);
            return(true);
        }

        /* TEST FOR INITIALISER: Return true if "static", "public" or "private"
        keyword found, and false otherwise */

        private static boolean testForInitialiser(int ttype,StreamTokenizer inStream) {
            if (ttype == StreamTokenizer.TT_WORD) {
                if ((inStream.sval.equals("static")) ||
                            (inStream.sval.equals("private")) ||
                            (inStream.sval.equals("public"))) return(true);
            }
            return(false);
        }

        /* CHECK INPUT: Check that a file name (command line argument) has been
        passed. If so return true, false otherwise. */

        private static boolean checkInput(String[] args) {
            if (args.length == 0) {
                System.out.println("ERROR: No filename supplied");
                return(false);
            }
            else return(true);
        }
    }
Table 5: Java parser
    $ java TokenizerExample6 TokenizerExample6.java
    Class name TokenizerExample6
    Method name main
    Method name parseFile
    Method name findMethodNames
    Method name findMethodName
    Method name testForClassKeyWord
    Method name testForInitialiser
    Method name checkInput
Table 6: Sample output from code presented in Table 5.
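The same idea can be tried on source text held in a string, which makes it easier to experiment with. The following is a minimal sketch of a variant that collects the method names in a list instead of printing them, and compares the token type with the character literal '(' rather than the code 40. The class name MethodNameSketch, the methods isModifier and methodNames, and the sample source text are invented for illustration. Like Table 5 it only recognises declarations that start with "static", "private" or "public".

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MethodNameSketch {

    /* IS MODIFIER: Return true if the current token is the word "static",
    "private" or "public". */

    static boolean isModifier(StreamTokenizer tokenizer) {
        return tokenizer.ttype == StreamTokenizer.TT_WORD &&
                (tokenizer.sval.equals("static") ||
                 tokenizer.sval.equals("private") ||
                 tokenizer.sval.equals("public"));
    }

    /* METHOD NAMES: Hypothetical helper that returns the names that follow a
    modifier and a type and are themselves followed by a '(' character. */

    static List<String> methodNames(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        List<String> names = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (!isModifier(tokenizer)) continue;

                // Skip any further modifiers; the loop stops on the type name
                do {
                    tokenizer.nextToken();
                } while (isModifier(tokenizer));

                // The token after the type is a data item or method name
                tokenizer.nextToken();
                String name = (tokenizer.ttype == StreamTokenizer.TT_WORD)
                        ? tokenizer.sval : null;

                // A following '(' character marks a method declaration
                if (tokenizer.nextToken() == '(' && name != null) names.add(name);
            }
        } catch (IOException e) {
            // Not expected when reading from a string held in memory
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "class A { public static void main(String[] args) { } " +
                "private int x = 0; private static int f() { return 1; } }";
        System.out.println("Method names = " + methodNames(source));
    }
}
```

Returning a list keeps the scanning separate from the output, so the same method could be used for a file by passing its contents in as a string.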
Created and maintained by Frans Coenen. Last updated 12 May 2000