THE LUCS-KDD IMPLEMENTATIONS OF CPAR (CLASSIFICATION BASED ON PREDICTIVE ASSOCIATION RULES)



Frans Coenen

Department of Computer Science

The University of Liverpool

16 February 2004

DISCLAIMER! The following description of the CPAR algorithm (Yin and Han 2003) is that implemented by the author of this WWW page; i.e. it may not be identical to that first produced by Xiaoxin Yin and Jiawei Han, but it certainly displays all the "salient" features of the CPAR algorithm.

CONTENTS

1. Overview of CPAR algorithm.
2. Rule pruning.
3. Testing.
4. Worked example.
5. Conclusions.

NOTE: The description given here should be read in the context of: (a) the FOIL (First Order Inductive Learner) algorithm developed by Quinlan and Cameron-Jones, and (b) the PRM (Predictive Rule Mining) algorithm also developed by Xiaoxin Yin and Jiawei Han.




1. OVERVIEW OF CPAR ALGORITHM

The CPAR (Classification based on Predictive Association Rules) algorithm (Yin and Han 2003) is an extension of PRM, which in turn is an extension of FOIL (Quinlan and Cameron-Jones 1993). The distinction between CPAR and PRM is that instead of choosing only the attribute that displays the best gain on each iteration (as in FOIL and PRM), CPAR may choose a number of attributes if those attributes have gains similar to the best gain. This is done by first calculating the best gain and applying a GAIN_SIMILARITY_RATIO to it (Yin and Han suggest 0.99). All attributes with gain greater than:

bestGain*GAIN_SIMILARITY_RATIO

are then selected for further processing.
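
By way of illustration, this selection step might be coded as follows (a minimal sketch in Java; the class, method and variable names are assumptions, not the identifiers used in the LUCS-KDD source):

import java.util.ArrayList;
import java.util.List;

class GainSimilaritySelection {

    static final double GAIN_SIMILARITY_RATIO = 0.99; // value suggested by Yin and Han

    // Return the labels of all attributes whose gain exceeds
    // bestGain*GAIN_SIMILARITY_RATIO; gain[i] holds the gain for
    // attribute label i+1.
    static List<Integer> selectSimilarGainAttributes(double[] gain,
                                                     double minBestGain) {
        double bestGain = 0.0;
        for (double g : gain) bestGain = Math.max(bestGain, g);
        List<Integer> selected = new ArrayList<>();
        if (bestGain <= minBestGain) return selected; // nothing worth expanding
        double threshold = bestGain * GAIN_SIMILARITY_RATIO;
        for (int i = 0; i < gain.length; i++)
            if (gain[i] > threshold) selected.add(i + 1);
        return selected;
    }
}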

CPAR (like FOIL and PRM) takes as input a (space separated) binary valued data set R and produces a set of Classification Association Rules (CARs). The classifier produced by the LUCS-KDD CPAR implementation comprises a linked list of rules ordered according to the Laplace accuracy (Clark and Boswell 1991) of the individual rules.

The CPAR algorithm, as implemented by the LUCS-KDD research team, is presented in Tables 1 and 2.

method: startCPAR
Parameters: none
Global access to: R, C
----------------------------------------------

generate an empty global attributes array A

For each c in C
    generate global P and N example arrays
    generate global PN array
    determine minimum total weight threshold

    while (total weight of P > minimum total weight threshold)
        A' <-- A, N' <-- N, P' <-- P, PN' <-- PN
        if no attributes exist with weightings that can
                produce a gain above the minimum, break
        cparGeneration({},c)
    end loop
end loop

Table 1: The startCPAR method.

method: cparGeneration/2
Parameters: ante (antecedent so far) and
	cons (consequent) for current rule.
Global access to: A, A', N, N', P, P', PN, PN'
----------------------------------------------

for each a in A
    if (a not in antecedent [1]) calculate gain using
        information in PN' array and store in attributes array
end loop
identify attribute with best gain (bestGain)
if (bestGain <= MIN_BEST_GAIN)
    add antecedent->c to rule list
    for all records in P reduce weighting by decay factor
        and adjust PN array accordingly
    return

gainThreshold = bestGain*GAIN_SIMILARITY_RATIO
for each a in A
    if a available [1] and a.gain > gainThreshold
        tempP' <-- P', tempN' <-- N', tempA' <-- A', tempPN' <-- PN'
        add a to antecedent
        remove examples from N' and P' that do not contain
            antecedent and adjust PN' array accordingly
        if (N' == {})
            add antecedent->c to rule list
            for all records in P reduce weighting by decay factor
                and adjust PN array accordingly
        else cparGeneration(antecedent,c)
        P' <-- tempP', N' <-- tempN', A' <-- tempA', PN' <-- tempPN'
end loop

Table 2: The cparGeneration method.

[1] An attribute is not in the antecedent if the appropriate cell in the attributes array has the value 0.

In their experiments Xiaoxin Yin and Jiawei Han used a minimum gain constant (MIN_BEST_GAIN) of 0.7, and a decay factor of 1/3. New weightings are thus calculated as follows:

newWeighting = oldWeighting*decayFactor

and gain is calculated as follows:

WP  = totalWeight(P)
WN  = totalWeight(N)
WP' = totalWeight(P')
WN' = totalWeight(N')
gain(a) = WP' * (log(WP'/(WP'+WN')) - log(WP/(WP+WN)))

The minimum total weight threshold for P is calculated by multiplying the start weight of P by TOTAL_WEIGHT_THRESHOLD, which was set to 0.05 during experimentation. The GAIN_SIMILARITY_RATIO was set to 0.99.
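
These calculations can be summarised in the following sketch (Java; the names are illustrative assumptions, not the identifiers used in the LUCS-KDD source):

class GainCalculation {

    static final double DECAY_FACTOR = 1.0 / 3.0;       // Yin and Han
    static final double TOTAL_WEIGHT_THRESHOLD = 0.05;  // Yin and Han
    static final double MIN_BEST_GAIN = 0.7;            // Yin and Han

    // gain(a) = WP'*(log(WP'/(WP'+WN')) - log(WP/(WP+WN))), natural logs,
    // where WP/WN are the total weights of P/N and WP'/WN' the total
    // weights of the examples in P/N that contain attribute a.
    static double gain(double wP, double wN, double wPdash, double wNdash) {
        if (wPdash == 0.0) return 0.0; // attribute covers no positive weight
        return wPdash * (Math.log(wPdash / (wPdash + wNdash))
                       - Math.log(wP / (wP + wN)));
    }
}

For example, with WP = 6.0, WN = 9.0, WP' = 5.0 and WN' = 0.0 (attribute 8 in the worked example below) the gain is 5.0*(log(1.0) - log(0.4)) = 4.58.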




2. RULE PRUNING


It is possible, using CPAR, to generate two or more rules with identical antecedents and consequents. This is because records are not removed from the P and N lists once a rule that satisfies them has been found; instead these records simply have their weighting reduced. The reduced weighting feeds into the gain calculation, but it is still possible for the same attributes to be selected to form the antecedent of a rule because these attributes (despite the reduced weighting) still produce the best gain. Eventually the weighting for the affected records is reduced so far that the attributes no longer produce the best gain and are therefore not selected. Where duplicate rules occur, those with the lower accuracy are removed from the rule list.
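
This pruning step might be sketched as follows (Java; the Rule record and method name are illustrative assumptions, and antecedents are assumed to be stored with their attribute labels in ascending order):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative rule representation: antecedent attribute labels (ascending),
// consequent class label, and the rule's Laplace accuracy.
record Rule(List<Integer> antecedent, int consequent, double laplaceAccuracy) {}

class RulePruning {

    // Keep only the most accurate instance of each identical
    // antecedent/consequent pair; less accurate duplicates are discarded.
    static List<Rule> pruneDuplicates(List<Rule> rules) {
        Map<String, Rule> best = new LinkedHashMap<>();
        for (Rule r : rules) {
            String key = r.antecedent() + "->" + r.consequent();
            Rule kept = best.get(key);
            if (kept == null || r.laplaceAccuracy() > kept.laplaceAccuracy())
                best.put(key, r);
        }
        return new ArrayList<>(best.values());
    }
}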




3. TESTING


During testing, each record in the test set is classified using the following procedure (as recommended in Yin and Han 2003; a sketch is given below):

  1. Obtain all rules whose antecedent is a subset of the given record.
  2. Select the best K rules for each class according to their Laplace accuracy (in experiments K was set to 5).
  3. Determine the average expected accuracy over the selected rules for each class.
  4. Select the class with the best average expected accuracy.
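
A sketch of this procedure, reusing the illustrative Rule record from Section 2 (K = 5 in the reported experiments):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class Classification {

    // Classify a test record (given as its set of attribute labels) by
    // averaging the Laplace accuracies of the best k rules per class.
    static int classify(Set<Integer> record, List<Rule> rules,
                        int[] classes, int k) {
        int bestClass = classes[0];
        double bestAvg = -1.0;
        for (int c : classes) {
            List<Double> accuracies = new ArrayList<>();
            for (Rule r : rules)
                if (r.consequent() == c && record.containsAll(r.antecedent()))
                    accuracies.add(r.laplaceAccuracy());
            if (accuracies.isEmpty()) continue; // no rule fires for this class
            accuracies.sort(Collections.reverseOrder());
            int n = Math.min(k, accuracies.size());
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += accuracies.get(i);
            double avg = sum / n;
            if (avg > bestAvg) { bestAvg = avg; bestClass = c; }
        }
        return bestClass;
    }
}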



4. WORKED EXAMPLE

Note that for illustrative purposes, in the following example, the GAIN_SIMILARITY_RATIO has been set to 0.6 instead of the more usual value of about 0.99.

Given the following data set (the first 15 lines from the Pima Indians data set):

2 9 13 17 21 28 32 38 42 
1 8 13 17 21 27 31 36 41 
3 10 13 16 21 27 32 36 42 
1 8 13 17 21 28 31 36 41 
1 9 12 17 21 29 35 37 42 
2 8 14 16 21 27 31 36 41 
1 7 13 17 21 28 31 36 42 
3 8 11 16 21 28 31 36 41 
1 10 13 18 24 28 31 38 42 
3 9 14 16 21 26 31 38 42 
2 8 14 16 21 28 31 36 41 
3 10 14 16 21 28 31 37 42 
3 9 14 16 21 28 33 39 41 
1 10 13 17 25 28 31 39 42 
2 10 13 16 22 27 32 38 42 

We commence with class 41 and create the following P and N example sets and PN array. The first column in the PN array gives the attribute label (1 to 40); the following two columns give the total weightings for that attribute in the P and N example sets respectively. On "start up" each record has a weighting of 1.0 associated with it, so attribute 1, which occurs twice in P and four times in N, has weightings of 2.0 and 4.0.

positiveExamples:
 {1 8 13 17 21 27 31 36 41}  (1.0)
 {1 8 13 17 21 28 31 36 41}  (1.0)
 {2 8 14 16 21 27 31 36 41}  (1.0)
 {3 8 11 16 21 28 31 36 41}  (1.0)
 {2 8 14 16 21 28 31 36 41}  (1.0)
 {3 9 14 16 21 28 33 39 41}  (1.0)

negativeExamples:
 {2 9 13 17 21 28 32 38 42}  (1.0)
 {3 10 13 16 21 27 32 36 42}  (1.0)
 {1 9 12 17 21 29 35 37 42}  (1.0)
 {1 7 13 17 21 28 31 36 42}  (1.0)
 {1 10 13 18 24 28 31 38 42}  (1.0)
 {3 9 14 16 21 26 31 38 42}  (1.0)
 {3 10 14 16 21 28 31 37 42}  (1.0)
 {1 10 13 17 25 28 31 39 42}  (1.0)
 {2 10 13 16 22 27 32 38 42}  (1.0)

PN array:
Att 1: 	2.0 	4.0
Att 2: 	2.0 	2.0
Att 3: 	2.0 	3.0
Att 4: 	0.0 	0.0
Att 5: 	0.0 	0.0
Att 6: 	0.0 	0.0
Att 7: 	0.0 	1.0
Att 8: 	5.0 	0.0
Att 9: 	1.0 	3.0
Att 10: 0.0 	5.0
Att 11: 1.0 	0.0
Att 12: 0.0 	1.0
Att 13: 2.0 	6.0
Att 14: 3.0 	2.0
Att 15: 0.0 	0.0
Att 16: 4.0 	4.0
Att 17: 2.0 	4.0
Att 18: 0.0 	1.0
Att 19: 0.0 	0.0
Att 20: 0.0 	0.0
Att 21: 6.0 	6.0
Att 22: 0.0 	1.0
Att 23: 0.0 	0.0
Att 24: 0.0 	1.0
Att 25: 0.0 	1.0
Att 26: 0.0 	1.0
Att 27: 2.0 	2.0
Att 28: 4.0 	5.0
Att 29: 0.0 	1.0
Att 30: 0.0 	0.0
Att 31: 5.0 	5.0
Att 32: 0.0 	3.0
Att 33: 1.0 	0.0
Att 34: 0.0 	0.0
Att 35: 0.0 	1.0
Att 36: 5.0 	2.0
Att 37: 0.0 	2.0
Att 38: 0.0 	4.0
Att 39: 1.0 	1.0
Att 40: 0.0 	0.0
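
A PN array of this form can be built from the weighted example sets with a loop of the following form (a sketch; the Example record and buildPN name are assumptions, not the LUCS-KDD identifiers):

import java.util.List;

// Illustrative record: a training example's attribute labels and its weight.
record Example(int[] attrs, double weight) {}

class PNConstruction {

    // Row a holds the total positive (column 0) and negative (column 1)
    // weightings for attribute label a (labels run from 1 to numAttrs).
    static double[][] buildPN(List<Example> pos, List<Example> neg,
                              int numAttrs) {
        double[][] pn = new double[numAttrs + 1][2];
        for (Example e : pos) for (int a : e.attrs()) pn[a][0] += e.weight();
        for (Example e : neg) for (int a : e.attrs()) pn[a][1] += e.weight();
        return pn;
    }
}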

This means that the total weight of P is 6.0 and the Total Weight Threshold (TWT) is:

TWT = totalWeight*TOTAL_WEIGHT_THRESHOLD
    = 6.0*0.05 = 0.3

The set P is then processed, for the given class, until its total weight falls below this threshold (0.3). First we make copies of P, N, A and PN: P', N', A' and PN'. We then calculate the gain (using the weightings in the PN' array) for each attribute were it to be added to the rule antecedent (which currently is empty) and place the resulting gains in the attributes array. The result is as follows:

Attributes array:
Col 1: 0.0 -0.36
Col 2: 0.0 0.45 
Col 3: 0.0 0.0 
Col 4: 0.0 0.0 
Col 5: 0.0 0.0 
Col 6: 0.0 0.0 
Col 7: 0.0 0.0 
Col 8: 0.0 4.58 
Col 9: 0.0 -0.47 
Col 10: 0.0 0.0 
Col 11: 0.0 0.91 
Col 12: 0.0 0.0 
Col 13: 0.0 -0.94 
Col 14: 0.0 1.21 
Col 15: 0.0 0.0 
Col 16: 0.0 0.89 
Col 17: 0.0 -0.36 
Col 18: 0.0 0.0 
Col 19: 0.0 0.0 
Col 20: 0.0 0.0 
Col 21: 0.0 1.33 
Col 22: 0.0 0.0 
Col 23: 0.0 0.0 
Col 24: 0.0 0.0 
Col 25: 0.0 0.0 
Col 26: 0.0 0.0 
Col 27: 0.0 0.45 
Col 28: 0.0 0.42 
Col 29: 0.0 0.0 
Col 30: 0.0 0.0 
Col 31: 0.0 1.12 
Col 32: 0.0 0.0 
Col 33: 0.0 0.92 
Col 34: 0.0 0.0 
Col 35: 0.0 0.0 
Col 36: 0.0 2.90 
Col 37: 0.0 0.0 
Col 38: 0.0 0.0 
Col 39: 0.0 0.22
Col 40: 0.0 0.0 

As in the LUCS-KDD implementations of FOIL and PRM, the first element of each entry in the attributes 2-D array is set to 0.0 to indicate that the attribute is available to be included in a rule (i.e. not already in the current rule antecedent), and the second element holds the gain value when calculated. When an attribute is no longer available the first element is set to 1.0. The attributes array is used purely to enhance the computational efficiency of the algorithm.
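
In code this convention amounts to no more than the following (illustrative):

// attributes[a][0]: availability flag (0.0 = available, 1.0 = already in
// the current rule antecedent); attributes[a][1]: gain, when calculated.
double[][] attributes = new double[41][2]; // attribute labels 1 to 40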

If there were no attributes with a gain above the user specified minimum we would add the rule so far to the rule list. However, there are attributes (see above) with a gain above the minimum. The attribute with the best gain is attribute 8 (gain = 4.58), so the Local Gain Threshold (LGT) will be:

LGT = 4.58*GAIN_SIMILARITY_RATIO
    = 4.58*0.6
    = 2.75

There are actually two attributes above this local threshold: 8 and 36.

We process the attributes array attribute by attribute looking for gains above the local threshold, and arrive at attribute 8 first. We make copies of P', N', A' and PN' and proceed, with these copies, as follows:

  1. Add attribute 8 to the rule antecedent, giving the rule 8 -> 41.
  2. Adjust the P' and N' copies so that all examples which do not contain attribute 8 are removed, and adjust the copy of the PN array accordingly.

Thus:

positiveExamples2 (copy):
 {1 8 13 17 21 27 31 36 41} 
 {1 8 13 17 21 28 31 36 41} 
 {2 8 14 16 21 27 31 36 41} 
 {3 8 11 16 21 28 31 36 41} 
 {2 8 14 16 21 28 31 36 41} 

negativeExamples2 (copy): Null

PN array 2 (copy):
Att 1: 	2.0 	0.0
Att 2: 	2.0 	0.0
Att 3: 	1.0 	0.0
Att 4: 	0.0 	0.0
Att 5: 	0.0 	0.0
Att 6: 	0.0 	0.0
Att 7: 	0.0 	0.0
Att 8: 	5.0 	0.0
Att 9: 	0.0 	0.0
Att 10: 0.0 	0.0
Att 11: 1.0 	0.0
Att 12: 0.0 	0.0
Att 13: 2.0 	0.0
Att 14: 2.0 	0.0
Att 15: 0.0 	0.0
Att 16: 3.0 	0.0
Att 17: 2.0 	0.0
Att 18: 0.0 	0.0
Att 19: 0.0 	0.0
Att 20: 0.0 	0.0
Att 21: 5.0 	0.0
Att 22: 0.0 	0.0
Att 23: 0.0 	0.0
Att 24: 0.0 	0.0
Att 25: 0.0 	0.0
Att 26: 0.0 	0.0
Att 27: 2.0 	0.0
Att 28: 3.0 	0.0
Att 29: 0.0 	0.0
Att 30: 0.0 	0.0
Att 31: 5.0 	0.0
Att 32: 0.0 	0.0
Att 33: 0.0 	0.0
Att 34: 0.0 	0.0
Att 35: 0.0 	0.0
Att 36: 5.0 	0.0
Att 37: 0.0 	0.0
Att 38: 0.0 	0.0
Att 39: 0.0 	0.0
Att 40: 0.0 	0.0

N' is now empty, thus the rule 8 -> 41 is inserted into the rule list. We revise the weightings in P by the decay factor and adjust the PN array accordingly. This gives a P example set of:

positiveExamples:
 {1 8 13 17 21 27 31 36 41}  (0.33)
 {1 8 13 17 21 28 31 36 41}  (0.33)
 {2 8 14 16 21 27 31 36 41}  (0.33)
 {3 8 11 16 21 28 31 36 41}  (0.33)
 {2 8 14 16 21 28 31 36 41}  (0.33)
 {3 9 14 16 21 28 33 39 41}  (1.0) 

We return to the attributes array and continue processing, looking for attributes with a gain above the local threshold of 2.75, and find attribute 36 (gain = 2.899092476264711). We add this attribute to the antecedent so far to give the rule 36 -> 41 and continue processing with copies of N', P', A' and PN'.

The copies of N' and P' are reduced and the copy of PN' adjusted to give:

positiveExamples2 (copy):
 {1 8 13 17 21 27 31 36 41}  (1.0)
 {1 8 13 17 21 28 31 36 41}  (1.0)
 {2 8 14 16 21 27 31 36 41}  (1.0)
 {3 8 11 16 21 28 31 36 41}  (1.0)
 {2 8 14 16 21 28 31 36 41}  (1.0)

negativeExamples2 (copy):
Null

PN array 2 (copy):
Att 1: 	2.0 	0.0
Att 2: 	2.0 	0.0
Att 3: 	1.0 	0.0
Att 4: 	0.0 	0.0
Att 5: 	0.0 	0.0
Att 6: 	0.0 	0.0
Att 7: 	0.0 	0.0
Att 8: 	5.0 	0.0
Att 9: 	0.0 	0.0
Att 10: 0.0 	0.0
Att 11: 1.0 	0.0
Att 12: 0.0 	0.0
Att 13: 2.0 	0.0
Att 14: 2.0 	0.0
Att 15: 0.0 	0.0
Att 16: 3.0 	0.0
Att 17: 2.0 	0.0
Att 18: 0.0 	0.0
Att 19: 0.0 	0.0
Att 20: 0.0 	0.0
Att 21: 5.0 	0.0
Att 22: 0.0 	0.0
Att 23: 0.0 	0.0
Att 24: 0.0 	0.0
Att 25: 0.0 	0.0
Att 26: 0.0 	0.0
Att 27: 2.0 	0.0
Att 28: 3.0 	0.0
Att 29: 0.0 	0.0
Att 30: 0.0 	0.0
Att 31: 5.0 	0.0
Att 32: 0.0 	0.0
Att 33: 0.0 	0.0
Att 34: 0.0 	0.0
Att 35: 0.0 	0.0
Att 36: 5.0 	0.0
Att 37: 0.0 	0.0
Att 38: 0.0 	0.0
Att 39: 0.0 	0.0
Att 40: 0.0 	0.0

The copy of N' is now empty so the rule 36 -> 41 is inserted into the rule list and P adjusted accordingly:

positiveExamples:
{1 8 13 17 21 27 31 36 41}  (0.11)
{1 8 13 17 21 28 31 36 41}  (0.11)
{2 8 14 16 21 27 31 36 41}  (0.11)
{3 8 11 16 21 28 31 36 41}  (0.11)
{2 8 14 16 21 28 31 36 41}  (0.11)
{3 9 14 16 21 28 33 39 41}  (1.0)

P has now had the weightings of all records that satisfy the rules 8 -> 41 and 36 -> 41 reduced (twice, 1.0 -> 0.33 -> 0.11) by the given decay factor of 1/3. The PN array is now:

Att 1: 	0.22	4.0
Att 2: 	0.22 	2.0
Att 3: 	1.11	3.0
Att 4: 	0.0 	0.0
Att 5: 	0.0 	0.0
Att 6: 	0.0 	0.0
Att 7: 	0.0 	1.0
Att 8: 	0.55 	0.0
Att 9: 	1.0 	3.0
Att 10: 0.0 	5.0
Att 11: 0.11 	0.0
Att 12: 0.0 	1.0
Att 13: 0.22	6.0
Att 14: 1.22	2.0
Att 15: 0.0 	0.0
Att 16: 1.33 	4.0
Att 17: 0.22 	4.0
Att 18: 0.0 	1.0
Att 19: 0.0 	0.0
Att 20: 0.0 	0.0
Att 21: 1.55 	6.0
Att 22: 0.0 	1.0
Att 23: 0.0 	0.0
Att 24: 0.0 	1.0
Att 25: 0.0 	1.0
Att 26: 0.0 	1.0
Att 27: 0.22 	2.0
Att 28: 1.33 	5.0
Att 29: 0.0 	1.0
Att 30: 0.0 	0.0
Att 31: 0.55 	5.0
Att 32: 0.0 	3.0
Att 33: 1.0 	0.0
Att 34: 0.0 	0.0
Att 35: 0.0 	1.0
Att 36: 0.55 	2.0
Att 37: 0.0 	2.0
Att 38: 0.0 	4.0
Att 39: 1.0 	1.0
Att 40: 0.0 	0.0

The total weight of P is now 1.55, above the Total Weight Threshold of 0.3, so we repeat the process with the revised example sets P' (copied from the revised version of P) and N' (copied from the original N):

positiveExamples2:
 {1 8 13 17 21 27 31 36 41}  (0.11)
 {1 8 13 17 21 28 31 36 41}  (0.11)
 {2 8 14 16 21 27 31 36 41}  (0.11)
 {3 8 11 16 21 28 31 36 41}  (0.11)
 {2 8 14 16 21 28 31 36 41}  (0.11)
 {3 9 14 16 21 28 33 39 41}  (1.0)

negativeExamples2:
 {2 9 13 17 21 28 32 38 42}  (1.0)
 {3 10 13 16 21 27 32 36 42}  (1.0)
 {1 9 12 17 21 29 35 37 42}  (1.0)
 {1 7 13 17 21 28 31 36 42}  (1.0)
 {1 10 13 18 24 28 31 38 42}  (1.0)
 {3 9 14 16 21 26 31 38 42}  (1.0)
 {3 10 14 16 21 28 31 37 42}  (1.0)
 {1 10 13 17 25 28 31 39 42}  (1.0)
 {2 10 13 16 22 27 32 38 42}  (1.0)

We also produce a new PN' made up of the positive weightings from the previous PN' and the negative weightings from PN. The result is as follows:

Att 1: 	0.22 	4.0
Att 2: 	0.22 	2.0
Att 3: 	1.11 	3.0
Att 4: 	0.0 	0.0
Att 5: 	0.0 	0.0
Att 6: 	0.0 	0.0
Att 7: 	0.0 	1.0
Att 8: 	0.56 	0.0
Att 9: 	1.0 	3.0
Att 10: 0.0 	5.0
Att 11: 0.11 	0.0
Att 12: 0.0 	1.0
Att 13: 0.22 	6.0
Att 14: 1.22 	2.0
Att 15: 0.0 	0.0
Att 16: 1.33 	4.0
Att 17: 0.22 	4.0
Att 18: 0.0 	1.0
Att 19: 0.0 	0.0
Att 20: 0.0 	0.0
Att 21: 1.56 	6.0
Att 22: 0.0 	1.0
Att 23: 0.0 	0.0
Att 24: 0.0 	1.0
Att 25: 0.0 	1.0
Att 26: 0.0 	1.0
Att 27: 0.22 	2.0
Att 28: 1.33 	5.0
Att 29: 0.0 	1.0
Att 30: 0.0 	0.0
Att 31: 0.56 	5.0
Att 32: 0.0 	3.0
Att 33: 1.0 	0.0
Att 34: 0.0 	0.0
Att 35: 0.0 	1.0
Att 36: 0.56 	2.0
Att 37: 0.0 	2.0
Att 38: 0.0 	4.0
Att 39: 1.0 	1.0
Att 40: 0.0 	0.0

We also create an empty attributes array A' (by copying A). As before, we then calculate the gain for each attribute (and store it in the attributes array). The attribute with the best gain is now attribute 33 (gain = 0.92), which means the Local Gain Threshold (LGT) will be calculated as:

LGT = 0.92*GAIN_SIMILARITY_RATIO
    = 0.92*0.6
    = 0.55

This is below the specified global minimum of 0.7, so 0.7 is used as the local threshold. There is only one attribute with a gain above this threshold (33). This is inserted into the rule antecedent to give the rule 33 -> 41, and the copies of the P' and N' example sets and the copy of the PN' array are adjusted accordingly. N' is now empty so the rule 33 -> 41 is inserted into the rule list.

We now repeat the process. The total weighting for P is now 0.89, still above the total weight threshold of 0.3. However, on checking whether it is possible to produce a gain above the global minimum we discover that this cannot be done, so we break out of the loop and repeat the entire process for class 42.

At the end of the process we will have the following rule list. Note that rules are inserted into the rule list, as they are derived, ordered according to their Laplace accuracy (each rule's Laplace expected accuracy estimate):

(1)  {8}   ->  {41}  0.86
(2)  {10}  ->  {42}  0.86
(3)  {38}  ->  {42}  0.83
(4)  {32}  ->  {42}  0.8
(5)  {36}  ->  {41}  0.67
(6)  {33}  ->  {41}  0.67
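
The values shown are the rules' Laplace accuracies, calculated as (Nc + 1)/(Ntot + k), where Ntot is the number of records satisfying the rule's antecedent, Nc the number of those records belonging to the rule's class, and k the number of classes. For rule (1), for example, five records in the given data set contain attribute 8 and all five have class 41, giving (5+1)/(5+2) = 0.86.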



5. CONCLUSIONS


The LUCS-KDD implementation of CPAR described here has been used by the LUCS-KDD research group to contrast and compare a variety of CARM algorithms and techniques. The software is available free of charge; however, the author would appreciate appropriate acknowledgement. The following format for referring to the software is suggested:

Coenen, F. (2004), LUCS-KDD implementations of CPAR (Classification based on Predictive Association Rules), http://www.cxc.liv.ac.uk/~frans/KDD/Software/FOIL_PRM_CPAR/cpar.html, Department of Computer Science, The University of Liverpool, UK.

Should you discover any "bugs" or other problems within the software (or this documentation), do not hesitate to contact the author.




REFERENCES

  1. Clark, P. and Boswell, R. (1991). Rule Induction With CN2: Some Recent Improvements. Proc. European Working Session on Learning (EWSL'91), Porto, Portugal, pp151-163.
  2. Quinlan, J. R. and Cameron-Jones, R. M. (1993). FOIL: A Midterm Report. Proc. ECML, Vienna, Austria, pp3-20.
  3. Yin, X. and Han, J. (2003). CPAR: Classification based on Predictive Association Rules. Proc. SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, pp331-335.



Created and maintained by Frans Coenen. Last updated 23 June 2005