SO YOU WANT TO MINE YOUR DATA!

Some guidance notes for people who wish to mine their data but know nothing about programming.




Frans Coenen

Department of Computer Science

The University of Liverpool


CONTENTS

1. Overview.
2. Note on compiling Java software.
3. Note on running Java software.
 
4. Data preparation.
5. Available data mining applications.
6. Problems.






1. OVERVIEW

On this page we present some guidance notes for people who have data they wish to mine but who know nothing about programming (and do not want to know anything about programming).

Data mining is the process of finding interesting, but hidden, patterns in data. As such it is part of a larger process known as Knowledge Discovery in Data, or KDD. For the purposes of this WWW page, KDD can be considered to be a two-stage process:

  1. Data preparation
  2. Data mining

(Purists will tell you that there is more to it than this but we will ignore them.)




2. NOTE ON COMPILING JAVA SOFTWARE

The software we are going to use is written in the Java programming language. Each application consists of at least one, but usually more, source code files (indicated by the extension .java). To run (execute) a Java application you need to take the software (source code) for the particular application in which you are interested, available from this WWW site, and load it on to your own computer. To do this, decide which software application you want to download, create an appropriately named directory and copy the Java source code into this directory. Once you have the source code you will need to compile it, so that it is translated into a form (Java Byte Code) that Java can run on your computer. There is nothing scary about this operation; if you already have Java on your computer do the following (assuming that you are using Microsoft Windows):

  1. Open a "command prompt" window (you may already have this on your desktop, in which case simply double click on the "command prompt" icon).
  2. Go to the hard drive on which the directory you have created is located (you may already be in it). Hard drives are normally indicated by an upper case letter. To move to the appropriate hard drive type the relevant letter followed by a colon in the command prompt window, for example: D:
  3. Go to your directory by using the cd (change directory) command followed by the name of the directory. You can list the contents of the directory in which you are located by typing dir or dir /w.
  4. Once you are in the directory in which the Java source code you wish to compile is located type the command: javac *.java (note the lower case j). This will produce a set of Java Byte Code files, indicated by the extension .class, which are the files that are actually run.

Your source code will now be compiled.

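If Java is not already loaded on to your computer you will first have to download it from the Sun Microsystems WWW site. This is straightforward: go to http://www.java.com/en/download/index.jsp and follow the instructions. Note that the javac command used above is part of the Java Development Kit (JDK); a Java runtime on its own will run .class files but will not compile source code.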




3. NOTE ON RUNNING JAVA SOFTWARE

Computer programs are run (executed) by stepping through a sequence of instructions. Groups of instructions are gathered together to form what are known as procedures. In Java such collections are called methods. Each method has a name, and the execution process starts with a method called main. An application is started from a single main method, no matter how many source code files it comprises. To run an application type: java FILE_NAME, where FILE_NAME is the name of the .class file that contains the main method (but do not include the extension .class). The convention on these WWW pages is to indicate the file that contains the main method by including the word App somewhere in the file name. Thus if the main method for an application is contained in the file myFileApp.class then the application can be run using the command: java myFileApp (note, no .class extension).
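
As an illustration, the following is a minimal sketch of a complete Java application made up of a single source code file. It is not part of the software available from these pages; the class name HelloApp and the method name greeting are invented for this example. It shows one ordinary method together with the main method where execution starts, and follows the convention of including the word App in the file name.

```java
// HelloApp.java: a hypothetical example, not part of the LUCS-KDD software.
class HelloApp {

    // A method is a named group of instructions; this one builds a message.
    static String greeting(String name) {
        return "Hello, " + name + "!";
    }

    // Execution starts here when you type: java HelloApp
    public static void main(String[] args) {
        // Use the first command line argument if one was given.
        String name = (args.length > 0) ? args[0] : "data miner";
        System.out.println(greeting(name));
    }
}
```

If this is saved in a file called HelloApp.java it can be compiled using javac HelloApp.java and then run using java HelloApp, which prints: Hello, data miner!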




4. DATA PREPARATION

The data mining programs presented here operate using what is known as binary valued data. You therefore first need to convert your data into this format. The easiest way to do this is to make use of the LUCS-KDD-DN software available from these pages. Details can be found by following the link:

http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS-KDD-DN_ARM/lucs-kdd_DN.html

If the data mining you wish to undertake is to generate a classifier then you are better off going to:

http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS-KDD-DN/lucs-kdd_DN.html
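
To give a feel for what binary valued data looks like, the following is a small sketch in Java. It is a toy illustration only and is not the method used by the LUCS-KDD-DN software, whose output may well differ; the class name BinaryEncodeApp, the method name encode and the example attributes (colour and size) are all invented. The idea shown is that every distinct attribute/value pair is given its own column number, and each record is then described by listing the columns that are "true" for it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// BinaryEncodeApp.java: a hypothetical toy example, NOT the LUCS-KDD-DN software.
class BinaryEncodeApp {

    // Converts each record into the list of column numbers that are "true" for it.
    // Every distinct attribute=value pair gets its own column, numbered from 1
    // in order of first appearance.
    static List<List<Integer>> encode(String[] names, String[][] records) {
        Map<String, Integer> columnOf = new LinkedHashMap<>();
        List<List<Integer>> encoded = new ArrayList<>();
        for (String[] record : records) {
            List<Integer> row = new ArrayList<>();
            for (int i = 0; i < names.length; i++) {
                String key = names[i] + "=" + record[i];
                Integer column = columnOf.get(key);
                if (column == null) {
                    // First time this attribute/value pair is seen: new column.
                    column = columnOf.size() + 1;
                    columnOf.put(key, column);
                }
                row.add(column);
            }
            encoded.add(row);
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] names = {"colour", "size"};   // made-up attributes
        String[][] records = {
            {"red", "small"},
            {"blue", "small"},
            {"red", "large"}
        };
        for (List<Integer> row : encode(names, records)) {
            System.out.println(row);
        }
    }
}
```

Here colour=red becomes column 1, size=small column 2, colour=blue column 3 and size=large column 4, so the three records are printed as [1, 2], [3, 2] and [1, 4].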




5. AVAILABLE DATA MINING APPLICATIONS

Currently we only have software available to do Association Rule Mining (ARM) and Classification (we do not have Clustering software available). A full list of the software available can be found at:

http://www.csc.liv.ac.uk/~frans/KDD/Software/

But if you wish to do ARM we recommend the Apriori-T applications available at:

http://www.csc.liv.ac.uk/~frans/KDD/Software/Apriori-T_GUI/aprioriT_GUI.html




6. PROBLEMS

If you have any problems feel free to contact me (email: frans@csc.liv.ac.uk) and I will get back to you as soon as I can.




Created and maintained by Frans Coenen. Last updated 13 March 2007.