1. INTRODUCTION
This page makes available 27 data sets, mostly taken from the UCI machine learning repository [1] and discretised using the LUCS-KDD DN software [2]. Where a data set has been obtained from elsewhere, this is indicated. Discretisation has been carried out assuming a maximum of 5 "divisions". Note that all the discretised files have been "zipped" using gzip. The files are intended for use with Association Rule Mining (ARM) and Classification Association Rule Mining (CARM) software, which requires binary-valued input data, although they may well have further uses.
In each case, the file name describes the key characteristics of the data set in the form in which it was discretised. For example, the label adult.D131.N48842.C2 denotes the "adult" data set, which includes 48842 records in 2 classes, with attributes that, for the experiments described here, have been discretised into 131 binary categories. In other words, D gives the number of binary attributes, N the number of records and C the number of classes. Details of the discretisation in each case are available.
If you make use of these discretised data sets the author would appreciate appropriate acknowledgement. The following reference format for referring to this page is suggested:
Coenen, F. (2003), The LUCS-KDD Discretised/normalised ARM and CARM Data Library, http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS_KDD_DN/, Department of Computer Science, The University of Liverpool, UK.
2. DATA SETS
The available datasets are as follows.
A "tarball", dataSets.tgz (1.2 MBytes), containing all the above data sets (except the lymphography set) is also available. To unpack the "tarball", use the Linux command:
Then use gunzip to unzip individual files. Please contact me if any problems are encountered.
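As an alternative to the command-line route, both steps (unpacking the tarball and decompressing individual files) can be done with Python's standard `tarfile` and `gzip` modules. The sketch below is illustrative only: the helper names `unpack_tarball` and `read_records` are hypothetical, and `read_records` assumes that each line of a data file is one record given as whitespace-separated integer attribute numbers, a format this page does not itself describe.

```python
from __future__ import annotations

import gzip
import tarfile


def unpack_tarball(tarball: str, dest: str = ".") -> list[str]:
    """Extract every member of a .tgz archive into dest; return member names."""
    with tarfile.open(tarball, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects
            # absolute paths and links that point outside dest.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


def read_records(gz_path: str) -> list[list[int]]:
    """Read a gzipped data file into a list of records.

    Assumes (not confirmed by the page) that each non-blank line holds the
    integer numbers of the binary attributes present in that record.
    """
    records = []
    with gzip.open(gz_path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if fields:  # skip blank lines
                records.append([int(f) for f in fields])
    return records
```

A typical use would be `unpack_tarball("dataSets.tgz", "data")` followed by `read_records(...)` on one of the extracted `.gz` files; reading the compressed files directly like this avoids having to gunzip each one first.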
Created and maintained by Frans Coenen. Last updated 03 March 2008