THE LUCS-KDD APRIORI-T ASSOCIATION RULE MINING ALGORITHM



Liverpool University

Frans Coenen

Department of Computer Science

The University of Liverpool

13 February 2004

Updated: 26 February 2008


CONTENTS

1. Introduction.
2. Downloading the software.
2.1. Compiling.
2.2. Documentation.
 
3. Running the software.



1. INTRODUCTION

Apriori-T (Apriori Total) is an Association Rule Mining (ARM) algorithm, developed by the LUCS-KDD research team. The code obtainable from this page is a GUI version that inludes (for comparison purpopses) implementations of Brin's DIC algorithm (Brin et al. 1997) and Toivonon's negative boarder ARM approach (Toivonen 1996).




2. DOWNLOADING THE SOFTWARE


The Apriori-T ARM software comprises four source files. These are provided from this WWW page together with three application classes. The source files are as follows:

  1. AprioriT_GUI_App: Application class.
  2. AprioriTcontrol: Class containing the GUI control methods.
  3. AssocRuleMining: Set of general ARM utility methods to allow: (i) data input and input error checking, (ii) data preprocessing, (iii) manipulation of records (e.g. operations such as subset, member, union etc.) and (iv) data and parameter output.
  4. DIC_Ttree: Methods to implement the LUCS-KKD version of the DIC algorithm.
  5. DIC_TtreeNode: Set of methofd concerned with the structure of T-tree nodes adapted for the DIC algorithm.
  6. RuleNode: Set of methods that allow the creation and manipulation (e.g. ordering, etc.) of a list of ARs.
  7. TotalSupportTree.java: Methods to implement the "Apriori-T" algorithm using the "Total support" tree data structure (T-tree).
  8. TotalSupportTreeNegBorder: Methods to implement a vataition of "Apriori-T" algorithm incoporating the negative boarder concept
  9. TotalSupportTreeNoXcheck: Version of Apriori-T algorith that does not include cross checking. Cross checking is the process where by the neighbouring branches of the T-tree are checked to ensure that all the K-1 subsets of a cndidate set are supported (the downwared closure property of itemsets is used in the candidate generation process). Cross checking carried a time penalty with it and so Apriori-T without cross checking is a faster algorithm, however it usually rerquires more storage space as unecessary candidate sets are often generated.
  10. TtreeCanvas: Methods to create a canvas onto which the T-tree may be painted if rquired.
  11. TtreeNode: Methods concerned with the structure of Ttree nodes. Arrays of these structures are used to store nodes at the same level in any sub-branch of the T-tree. Note this is a separate class to the other T-tree classes which are arranged in a class hierarchy as illustrated below.
  12. TtreeWindow: Methods to create a GUI window to contain a T-tree canvas (if requested).

The relationship between some of the principal classes is as shown in the hierarchies in Tables 1 and 2 below.

                      AssocRuleMining
                            |
                     TotalSupportTree
                            |
     +----------------------+----------------------+
     |                      |                      |
DIC_Ttree     TotalSupportTreeNegBorder TotalSupportTreeNoXcheck

Table 1: Class hierarchy


  TtreeNode
      |
DIC_TtreeNode

Table 2: Class hierarchy for DIC algorithm

There is also a "tar ball" aprioriT.tgz that can be downloaded that includes all of the above source and application class files. It can be unpacked using tar -zxf aprioriT.tgz.


2.1. Compiling

The software has been implemented in Java which should therefore make it highly portable. The code does not require any special packages and thus can be compiled using the standard Java compiler:

javac *.java

2.2. Documentation

The code can be documented using Java Doc. First create a directory Documentation in which to place the resulting HTML pages and then type:

javadoc -d Documentation/ *.java

This will produce a hierarchy of WWW pages contained in the Document directory.




3. RUNNING THE SOFTWARE


When compiled the software can be invoked in the normal manner using the Java interpreter:

java AprioriT_GUI_App

The resulting GUI window is shown in Figure 1. If you are planning to process a very large data set it is a good idea to grab some extra memory. For example:

java -Xms600m -Xmx600m AprioriT_GUI_App
Apriori-T GUI Window

Figure 1: Apriori-T GUI Window

The input to the software, in all cases is a (space separated) binary valued data set D. The set R comprises a set of N records such that each record (r), in turn, comprises a set of attributes (A). We say that a particular data set has M columns and N rows. A small example data sets might be as follows:

1 2 3 6
1 4 5 7
1 3 4 6
1 2 6
1 2 3 4 5 7

where, in this case, A = {1, 2, 3, 4, 5, 6, 7}. Note that attribute numbers are ordered sequentially commencing with the number 1 (the value 0 has a special meaning).




REFERENCES


Brin, S., Motwani, R., Ullman, J.D. and Tsur S. (1997).
Dynamic itemset counting and implication rules for market basket data. SIGMOD Record, Vol 6, No 2, ACM Press, pp 255-264.
Toivonen, H. (1996)
Sampling Large Databases for Association Rules. Proc. 1996 Int. Conf. Very Large Data Bases, pp. 134-145.



Created and maintained by Frans Coenen. Last updated 13 March 2007