THE LUCS-KDD GROUP

(LIVERPOOL UNIVERSITY COMPUTER SCIENCE - KNOWLEDGE DISCOVERY IN DATA)

APRIORI-T DEMONSTRATOR



Department of Computer Science

The University of Liverpool

27 June 2003



CONTENTS

1. The Apriori-T algorithm.
2. Downloadable Apriori-T demonstrator.
 
3. Source code for demonstrator.



1. THE APRIORI-T ALGORITHM

The Apriori-T ARM algorithm uses an Apriori style mechanism to process an input set but using the T-tree data structure developed by Coenen, Leng and Goulbourne (see [1] and [2]).

The T-tree (Total Support Tree) is a compressed set enumeration tree structure. The T-tree differs from more standard set enumeration trees in that the nodes at the same level in any sub-branch are organised into 1-D arrays such that the array indexes represent column numbers. For this purpose it is more convenient to build a "reverse" version of the tree (Figure 1(a)), which permits direct indexing with attribute/column numbers. The T-tree offers two initial advantages over standard set enumeration trees:

  1. Fast traversal of the tree using indexing mechanisms, and
  2. Reduced storage, in that itemset labels are not required to be explicitly stored, and thus no sibling reference variables (pointers) are required.

The Apriori-T algorithm combines the classic Apriori ARM algorithm of Agrawal and Srikant (1994) with the T-tree data structure. As each level is processed, candidates are added as a new level of the T-tree, their support is counted, and those that do not reach the required threshold of support are subsequently pruned. When the algorithm terminates, the T-tree contains only the large itemsets. At each level, new candidate K itemsets are generated from identified large K-1 itemsets, using the downward closure property of itemsets, which in turn may necessitate the inspection of neighbouring branches in the T-tree to determine if a particular K-1 subset is supported. We refer to this process as X-checking. Note that X-checking adds a computational overhead; offset against the additional effort required to establish whether a candidate K itemset, all of whose K-1 itemsets may not necessarily be supported, is or is not a large itemset. In some cases it is more expedient to assume that those subsets of a candidate K itemset, that are contained in neighbouring branches of the T-tree, are supported than to carry out X-checking.

REVERSE SET ENUMERASTION TREE --- T-TREE STRUCTURE

Figure 1: The T-tree (Total support tree) for the data set {{1,3,4},{2,4,5},{2,4,6}}. Note that, for ease of processing, items/attributes are enumerated commencing with 1




2. THE DOWNLOADABLE APRIORI-T DEMONSTRATOR

We have developed a small Apriori-T demonstrator GUI that can be downloaded from this site. This comprises two class files:

  1. AprioriTgui.class
  2. AprioriTgui$TtreeNode.class

The input data files used must comprise a set of records, one per line, such that each record consists of an ordered sequence of "space separated" integers (not "comma separated"). Note that the presence of an integer in a record indicates the presence of a boolean attribute in that column number for the record in question. It is assumed that columns are numbered from 1 to N inclusive. Example input file:

1 2 3
4 5 6
1 3 6
2 4 6
1 2 3

Table 1: Test data file (5 records, 6 columns)

Figure 2 shows an example session with the Apriori-T demonstrator using the data file given in Table 1 and a support threshold of 5 (5%). Note that three levels were generated in the T-tree (maximum value for K is 3)

APRIORI T GUI EXAMPLE SESSION

Figure 2: Apriori T GUI example session using the data set presented in Table 1 and a support threshold of 5%




3. SOURCE CODE FOR DEMONSTRATOR

For those that may be interested the Java source code for the demomstrator is avialable here:

This should be compiled in the usual manner:

javac AprioriTgui.java



REFERENCES

  1. Goulbourne, G., Coenen, F. and Leng, P. (2000). Algorithms for Computing Association Rules Using a Partial-Support Tree. Journal of Knowledge-Based Systems, Vol (13), pp141-149.
  2. Coenen, F., Goulbourne, G. and Leng, P., (2003). Tree Structures for Mining association Rules. Journal of Data Mining and Knowledge Discovery, Vol 8, No 1, pp25-51.



Created and maintained by Frans Coenen. Last updated 21 May 2007