SOME NOTES AND EXAMPLE SCHEMA FOR THE LUCS-KDD DN (DISCRETISATION/NORMALISATION) SOFTWARE VERSION 2



Liverpool University

Frans Coenen

Department of Computer Science

The University of Liverpool

Friday 7 January 2005

Revisions and additions: 14 December 2006, 21 February 2008

This page contains some further notes on using the LUCS-KDD (Liverpool University Computer Science - Knowledge Discovery in Data) DN (discretisation/normalisation) software, Version 2. More specifically, this page includes:

  1. Notes on processing a number of data sets (available within the UCI data repository [1]) as used by the LUCS-KDD research team for a variety of experiments,
  2. Suggested schema files for these data sets, and
  3. Statistical information on the processed data sets. Where measurements differ from the data produced using version 1 of the normalisation/discretisation software, the version 1 data is given in parentheses. (In most cases version 2 produces fewer attribute columns than version 1.)

A number of example discretised/normalised data sets, taken from the UCI library, are available at:

http://csc.liv.ac.uk/~frans/KDD/Software/LUCS-KDD-DN/DataSets/dataSets.html

CONTENTS

1. Adult.
2. Annealing.
3. Auto.
4. Breast.
5. Car evaluation.
6. Chess (king and rook v. king).
7. Congressional voting.
8. Connect-4.
9. Cylinder Bands.
10. Dermatology.
11. Ecoli.
 
12. Flare.
13. Glass.
14. Heart.
15. Hepatitis.
16. Horse colic.
17. Ionosphere.
18. Iris.
19. Labour.
20. Led 7.
21. Letter recognition.
22. Lymphography (Restricted Access).
 
23. Mushroom.
24. Nursery.
25. Page blocks.
26. Pen digits.
27. Pima Indians.
28. Soybean Large.
29. Tic-tac-toe.
30. Waveform.
31. Wine.
32. Zoo.



1. ADULT

SCHEMA FILE
int nominal int nominal int nominal nominal nominal nominal nominal int int int nominal nominal
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country class
none Private/Self-emp-not-inc/Self-emp-inc/Federal-gov/Local-gov/State-gov/ Without-pay/Never-worked none Bachelors/Some-college/11th/HS-grad/Prof-school/Assoc-acdm/Assoc-voc/ 9th/7th-8th/12th/Masters/1st-4th/10th/Doctorate/5th-6th/Preschool none Married-civ-spouse/Divorced/Never-married/Separated/Widowed/ Married-spouse-absent/Married-AF-spouse Tech-support/Craft-repair/Other-service/Sales/Exec-managerial/Prof-specialty/ Handlers-cleaners/Machine-op-inspct/Adm-clerical/Farming-fishing/ Transport-moving/Priv-house-serv/Protective-serv/Armed-Forces Wife/Own-child/Husband/Not-in-family/Other-relative/Unmarried White/Asian-Pac-Islander/Amer-Indian-Eskimo/Other/Black Female/Male none none none United-States/Cambodia/England/Puerto-Rico/Canada/Germany/ Outlying-US(Guam-USVI-etc)/India/Japan/Greece/South/China/Cuba/Iran/ Honduras/Philippines/Italy/Poland/Jamaica/Vietnam/Mexico/Portugal/Ireland/ France/Dominican-Republic/Laos/Ecuador/Taiwan/Haiti/Columbia/Hungary/Guatemala/ Nicaragua/Scotland/Thailand/Yugoslavia/El-Salvador/Trinadad&Tobago/Peru/Hong/ Holand-Netherlands >50K/<=50K

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 6465
Number of records 48842
Num. input columns 15
Num. out cols. (Attributes) (Ver 1) 97 (131)
Density % (Ver 1) 15.46 (11.45)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
96     11687      23.93
97     37155      76.07
File name adult.D97.N48842.C2.num
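The output file names on this page appear to encode the statistics listed above them: the number after D matches the number of output attribute columns, N the number of records, and C the number of classes. The following is a minimal sketch of reading those fields back out of a name. The convention is inferred from the names on this page rather than from the DN documentation, and the function name `parse_dn_file_name` is hypothetical.

```python
import re

# Assumed convention (inferred from this page): <name>.D<attributes>.N<records>.C<classes>.num
FILE_NAME_PATTERN = re.compile(
    r"^(?P<name>.+)\.D(?P<attributes>\d+)\.N(?P<records>\d+)\.C(?P<classes>\d+)\.num$"
)


def parse_dn_file_name(file_name):
    """Split a DN output file name into its data set name and three counts."""
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"not in the assumed D/N/C form: {file_name!r}")
    return {
        "name": match["name"],
        "attributes": int(match["attributes"]),
        "records": int(match["records"]),
        "classes": int(match["classes"]),
    }


adult_info = parse_dn_file_name("adult.D97.N48842.C2.num")
print(adult_info)
```

For the Adult data set this gives 97 attributes, 48842 records and 2 classes, which agrees with the DN statistics listed for that set.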



2. ANNEALING

SCHEMA FILE
nominal nominal nominal double double nominal nominal int double nominal nominal nominal int nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal double double double nominal nominal int nominal
family product-type steel carbon hardness temper_rolling condition formability strength non-ageing surface-finish surface-quality enamelability bc bf bt bw/me bl m chrom phos cbond marvi exptl ferro corr blue/bright/varn/clean lustre jurofm s p shape thick width len oil bore packing classes
GB/GK/GS/TN/ZA/ZF/ZH/ZM/ZS C/H/G R/A/U/K/M/S/W/V null null T S/A/X null null N P/M D/E/F/G null Y Y Y B/M Y Y C P Y Y Y Y Y B/R/V/C Y Y Y Y COIL/SHEET null null null Y/N 0000/0500/0600/0760 null 1/2/3/4/5/U

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 22175
Number of records (Ver 1) 898 (798)
Num. input columns 39
Num. out cols. (Attributes) (Ver 1) 73 (106)
Density % (Ver 1) 53.42 (41.05)
Number of classes 6
Num. records per class:
Class  Num. Rec. (Ver 1)  % (Ver 1)
71     0 (0)              0.00 (0.00)
68     8 (8)              0.89 (1.00)
73     40 (34)            4.45 (4.26)
72     67 (60)            7.56 (7.52)
69     99 (88)            11.02 (11.03)
70     684 (608)          76.17 (76.19)
File name anneal.D73.N898.C6.num



3. AUTO

SCHEMA FILE
nominal double nominal nominal nominal nominal nominal nominal nominal double double double double double nominal nominal double nominal double double double double double double double double
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base length width height curb-weight engine-type num-of-cylinders engine-size fuel-system bore str
-3/-2/-1/0/1/2/3 null alfa-romero/audi/bmw/chevrolet/dodge/honda/isuzu/ jaguar/mazda/mercedes-benz/mercury/mitsubishi/nissan/peugot/plymouth/ porsche/renault/saab/subaru/toyota/volkswagen/volvo diesel/gas std/turbo four/two hardtop/wagon/sedan/hatchback/convertible 4wd/fwd/rwd front/rear null null null null null dohc/dohcv/l/ohc/ohcf/ohcv/rotor eight/five/four/six/three/twelve/two null 1bbl/2bbl/4bbl/idi/mfi/mpfi/spdi/spfi null null null null null null null null

DN STATISTICS
Number of divisions 5
Distributed/Randomised Yes
Missing values 59
Number of records 205
Num. input columns 26
Num. out cols. (Attributes) (Ver 1) 137 (142)
Density % (Ver 1) 18.98 (18.31)
Number of classes 7
Num. records per class:
Class  Num. Rec.  %
131    0          0.00
132    3          1.46
133    22         10.73
147    27         13.17
146    32         15.61
145    54         26.34
134    67         32.68
File name auto.D137.N205.C7.num



4. BREAST (Wisconsin)

SCHEMA FILE
int int int int int int int int int int nominal
number ClumpThickness UniformityOfCellSize UniformityOfCellShape MarginalAdhesion SingleEpithelialCellSize BareNuclei BlandChromatin NormalNucleoli Mitoses Class
null null null null null null null null null null 2/4

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 16
Number of records 699
Num. input columns 11 (Remove 1)
Num. out cols. (Attributes) (Ver 1) 20 (47)
Density % (Ver 1) 50 (21.28)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
20     241        34.48
19     458        65.52
File name breast.D20.N699.C2.num
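The density figures quoted on this page appear to equal the number of input columns actually used (input columns minus any removed or unused ones) divided by the number of output attribute columns, expressed as a percentage. This relationship is inferred from the published numbers, not taken from the DN documentation, so treat the following as a sketch under that assumption; `density_percent` is a hypothetical helper.

```python
def density_percent(input_columns, removed_columns, output_columns):
    """Percentage of output attribute columns set per record.

    Assumes each record contributes one attribute per used input column,
    which is an inference from the figures on this page.
    """
    used_columns = input_columns - removed_columns
    return round(100.0 * used_columns / output_columns, 2)


# Breast (Wisconsin): 11 input columns, 1 removed, 20 output columns.
breast_density = density_percent(11, 1, 20)
# Adult: 15 input columns, none removed, 97 output columns (131 for version 1).
adult_density = density_percent(15, 0, 97)
adult_density_v1 = density_percent(15, 0, 131)
print(breast_density, adult_density, adult_density_v1)
```

Under this assumption the three values come out as 50.0, 15.46 and 11.45, matching the densities listed for the Breast and Adult data sets.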



5. CAR EVALUATION

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal
buying maint doors persons lug_boot safety class
vhigh/high/med/low vhigh/high/med/low 2/3/4/5more 2/4/more small/med/big low/med/high unacc/acc/good/vgood

DN STATISTICS
Number of divisions 5
Distributed/Randomised Yes
Missing values 0
Number of records 1728
Num. input columns 7
Num. out cols. (Attributes) 25
Density % (Ver 1) 28.0%
Number of classes 4
Num. records per class:
Class  Num. Rec.  %
25     65         3.76
24     69         3.99
23     384        22.22
22     1210       70.02
File name car.D25.N1728.C4.num
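The percentage column in each class table is the class count divided by the total number of records. As a quick consistency check, the sketch below recomputes the Car Evaluation percentages from the counts for classes 25, 24, 23 and 22; the helper name `class_percentages` is illustrative only.

```python
def class_percentages(counts):
    """Map each class label to its share of all records, as a percentage."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


# Car Evaluation: class attribute number -> number of records.
car_counts = {25: 65, 24: 69, 23: 384, 22: 1210}
car_total = sum(car_counts.values())
car_percentages = class_percentages(car_counts)
print(car_total, car_percentages)
```

The counts sum to the 1728 records reported for this data set, and the recomputed percentages agree with the listed ones.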



6. CHESS (KING AND ROOK v. KING)

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal
White_King_file White_King_rank White_Rook_file White_Rook_rank Black_King_file Black_King_rank depth-of-win
a/b/c/d/e/f/g/h 1/2/3/4/5/6/7/8 a/b/c/d/e/f/g/h 1/2/3/4/5/6/7/8 a/b/c/d/e/f/g/h 1/2/3/4/5/6/7/8 draw/zero/one/two/three/four/five/six/seven/ eight/nine/ten/eleven/twelve/thirteen/fourteen/fifteen/sixteen

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 28056
Num. input columns 7
Num. out cols. (Attributes) (Ver 1) 58 (66)
Density % (Ver 1) 12.07 (10.61)
Number of classes 18
Num. records per class:
Class  Num. Rec.  %
42     27         0.1
43     78         0.28
45     81         0.29
46     198        0.71
44     246        0.88
58     390        1.39
47     471        1.68
48     592        2.11
49     683        2.43
50     1433       5.11
51     1712       6.1
52     1985       7.08
57     2166       7.72
41     2796       9.97
53     2854       10.17
54     3597       12.82
55     4194       14.95
56     4553       16.23
File name chessKRvK.D58.N28056.C18.num



7. CONGRESSIONAL VOTING


SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
ClassName handicapped-infant water-project-cost-sharing adoption-of-the-budget-resolution physician-fee-freeze el-salvador-aid religious-groups-in-schools anti-satellite-test-ban aid-to-nicaraguan-contras mx-missile immigration synfuels-corporation-cutback education-spending superfund-right-to-sue crime duty-free-exports export-administration-act-south-africa
democrat/republican y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n y/n

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 392
Number of records 435
Num. input columns 17
Num. out cols. (Attributes) 34
Density % 50.0
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
34     168        38.62
33     267        61.38
File name congres.D34.N435.C2.num
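In most of the class tables on this page, the class labels are the last C of the D output attribute numbers, even when the class is not the last column of the input schema (as here, where ClassName comes first). The sketch below computes that range; it describes a pattern observed in these tables rather than documented DN behaviour, and `class_attribute_numbers` is a hypothetical name.

```python
def class_attribute_numbers(num_attributes, num_classes):
    """Return the attribute numbers assumed to be reserved for the classes.

    Assumes the classes occupy the final num_classes output columns.
    """
    first_class = num_attributes - num_classes + 1
    return list(range(first_class, num_attributes + 1))


# Congressional Voting: 34 output attributes, 2 classes.
congress_classes = class_attribute_numbers(34, 2)
# Chess (KRvK): 58 output attributes, 18 classes.
chess_classes = class_attribute_numbers(58, 18)
print(congress_classes, chess_classes[0], chess_classes[-1])
```

For Congressional Voting this yields 33 and 34, and for Chess the range 41 to 58, both matching the labels in the corresponding tables.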



8. CONNECT-4


SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
a1 a2 a3 a4 a5 a6 b1 b2 b3 b4 b5 b6 c1 c2 c3 c4 c5 c6 d1 d2 d3 d4 d5 d6 e1 e2 e3 e4 e5 e6 f1 f2 f3 f4 f5 f6 g1 g2 g3 g4 g5 g6 Class
x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b win/loss/draw

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 67557
Num. input columns 43
Num. out cols. (Attributes) (Ver 1) 129 (129)
Density % (Ver 1) 33.33 (33.33)
Number of classes 3
Num. records per class:
Class  Num. Rec.  %
129    6449       9.55
128    16635      24.62
127    44473      65.83
File name connect4.D129.N67557.C3.num



9. CYLINDER BANDS

SCHEMA FILE
unused unused unused unused nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal int int double int int double int double int double double int int double int int int int int nominal
timestamp cylinderNumber customer jobNumber grainScreened color proofOnCtdInk bladeMFG cylinderDivision paperType inkType directSteam solventType typeOnCylinder pressType press unitNumber cylinderSize paperMillLocation platingTank proofCut viscosity caliper inkTemperature humidity roughness bladePressure varnishPCT pressSpeed inkPCT solventPCT ESAvoltage ESAamperage wax hardener rollerDurometer currentDensity anodeSpaceEatio chromeContent bandType
none none none none yes/no key/type yes/no benton/daetwyler/uddeholm gallatin/warsaw/mattoon uncoated/coated/super uncoated/coated/cover yes/no xylol/lactol/naptha/line/other yes/no WoodHoe70/Motter70/Albert70/Motter94 821/802/813/824/815/816/827/828 1/2/3/4/5/6/7/8/9/10 catalog/spiegel/tabloid NorthUS/SouthUS/Canadian/Scandanavian/MidEuropean 1910/1911/other none none none none none none none none none none none none none none none none none none none band/noband

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 999
Number of records 540
Num. input columns 40 (remove 4)
Num. out cols. (Attributes) 124
Density % 29.03
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
123    228        42.22
124    312        57.78
File name cylBands.D124.N540.C2.num



10. DERMATOLOGY

SCHEMA FILE
int nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
age erythema scaling definite_borders itching kobner_phenomenon polygonal_papules follicular_papules oral_mucosal_inv knee_and_elbow_inv scalp_inv family_hist diagnosis
null 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 0/1 LichenPlanus/Psoriasis/Seboretic/CronicDermatitis/ PityriasisRosea/PityriasisRubra

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes/No
Missing values 8
Number of records 366
Num. input columns 13
Num. out cols. (Attributes) 49
Density % 26.53
Number of classes 6
Num. records per class:
Class  Num. Rec.  %
49     20         5.46
48     49         13.39
47     52         14.21
46     61         16.67
44     72         19.67
45     112        30.60
File name dematology.D49.N366.C6.num



11. ECOLI

SCHEMA FILE
unused double double double double double double double nominal
SequenceName mcg gvh lip chg aac alm1 alm2 Class
none none none none none none none none cp/im/pp/imU/om/omL/imL/imS
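Each schema file shown on this page has three lines: the column types, the column names, and the permitted values for each column (separated by "/", with "none" or "null" where no value list applies). The following is a minimal sketch of reading such a schema, based only on the layout of the examples here; `parse_schema` is a hypothetical helper and not part of the DN software.

```python
def parse_schema(text):
    """Parse a three-line schema (types, names, values) into column records."""
    rows = [line.split() for line in text.strip().splitlines()]
    if len(rows) != 3:
        raise ValueError("expected three lines: types, names, values")
    types, names, values = rows
    if not (len(types) == len(names) == len(values)):
        raise ValueError("the three lines must have the same number of tokens")
    columns = []
    for col_type, name, value in zip(types, names, values):
        # "none" and "null" are both used on this page as the placeholder.
        allowed = [] if value in ("none", "null") else value.split("/")
        columns.append({"type": col_type, "name": name, "values": allowed})
    return columns


ECOLI_SCHEMA = """\
unused double double double double double double double nominal
SequenceName mcg gvh lip chg aac alm1 alm2 Class
none none none none none none none none cp/im/pp/imU/om/omL/imL/imS
"""

ecoli_columns = parse_schema(ECOLI_SCHEMA)
print(len(ecoli_columns), ecoli_columns[-1]["values"])
```

Note that some value lists on this page (for example in the Adult schema) are wrapped with spaces after a "/", so they would need to be rejoined before whitespace splitting like this could be applied.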

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 336
Num. input columns 9 (remove 1)
Num. out cols. (Attributes) 34
Density % 23.53
Number of classes 8
Num. records per class:
Class  Num. Rec.  %
33     2          0.60
34     2          0.60
32     5          1.49
31     20         5.95
30     35         10.42
29     52         15.48
28     77         22.92
27     143        45.56
File name ecoli.D34.N336.C8.num

12. FLARE

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
modifiedZurichClass largestSpotSize spotDistribution activity evolution prev24hourFlareActivity historically-complex regionBecameHistComplex area areaLargestSpot C-class M-class X-class
A/B/C/D/E/F/H X/R/S/A/H/K X/O/I/C 1/2 1/2/3 1/2/3 1/2 1/2 1/2 1/2 0/1/2/3/4/5/6/7/8 0/1/2/3/4/5 0/1/2

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 1389
Num. input columns 13 (remove 2)
Num. out cols. (Attributes) 39
Density % 28.21
Number of classes 9
Num. records per class:
Class  Num. Rec.  %
38     0          0.00
39     1          0.07
37     3          0.22
36     4          0.29
35     9          0.65
34     20         1.44
33     40         2.88
32     141        10.15
31     1171       84.31
File name flare.D39.N1389.C9.num

If using column 13 as the class attribute the class distribution is:

Class  Num. Rec.  %
22     1          0.07
21     11         0.79
20     1377       99.14

And if using column 12 as the class attribute the distribution is:

Class  Num. Rec.  %
30     1          0.07
28     2          0.14
29     3          0.22
27     9          0.65
26     53         3.82
25     1321       95.10



13. GLASS

SCHEMA FILE
int double double double double double double double double double nominal
number RI:refractiveIndex Na:Sodium Mg:Magnesium Al:Aluminum Si:Silicon K:Potassium Ca:Calcium Ba:Barium Fe:Iron TypeOfGlass
none none none none none none none none none none 1/2/3/4/5/6/7

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 214
Num. input columns 11 (remove 1)
Num. out cols. (Attributes) (Ver 1) 48 (52)
Density % (Ver 1) 20.83 (19.23)
Number of classes 7
Num. records per class:
Class  Num. Rec.  %
45     0          0.00
47     9          4.21
46     13         6.07
44     17         7.94
48     29         13.55
42     70         32.71
43     76         35.51
File name glass.D48.N214.C7.num



14. HEART

SCHEMA FILE
double nominal nominal double double nominal nominal double nominal double nominal nominal nominal nominal
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal num
null 0.0/1.0 1.0/2.0/3.0/4.0 null null 0.0/1.0 0.0/1.0/2.0 null 0.0/1.0 null 1.0/2.0/3.0 0.0/1.0/2.0/3.0 3.0/6.0/7.0 0/1/2/3/4

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 6
Number of records 303
Num. input columns 14
Num. out cols. (Attributes) (Ver 1) 52 (53)
Density % (Ver 1) 26.92 (26.42)
Number of classes 5
Num. records per class:
Class  Num. Rec.  %
52     13         4.29
51     35         11.55
50     36         11.88
49     55         18.15
48     164        54.13
File name heart.D52.N303.C5.num



15. HEPATITIS

SCHEMA FILE
nominal int nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal double int int double int nominal
Class AGE SEX STEROID ANTIVIRALS FATIGUE MALAISE ANOREXIA LIVER_BIG LIVER_FIRM SPLEEN_PALPABLE SPIDERS ASCITES VARICES BILIRUBIN ALK_PHOSPHATE SGOT ALBUMIN PROTIME HISTOLOGY
1/2 null 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 null null null null null 1/2

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 167
Number of records 155
Num. input columns 20
Num. out cols. (Attributes) (Ver 1) 56 (58)
Density % (Ver 1) 35.71 (34.48)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
55     32         20.65
56     123        79.35
File name hepatitis.D56.N155.C2.num



16. HORSE COLIC

SCHEMA FILE
nominal nominal int double int int nominal nominal nominal nominal nominal nominal nominal nominal nominal int nominal nominal double double nominal int nominal nominal int int int nominal
surgery? Age HospitalNumber rectalTemperature pulse respiratoryRate temperatureOfExtremities peripheralPulse mucousMembranes capillaryRefillTime pain peristalsis abdominalDistension nasogastricTube nasogastricReflux nasogastricRefluxPH ectalExamination abdomen packedCellVolume totalProtein abdominocentesisAppearanc abdomcentesisTotalProtein outcome surgicalLesion? typeOfLesion1 typeOfLesion2 typeOfLesion3 cp_data
1/2 1/9 null null null null 1/2/3/4 1/2/3/4 1/2/3/4/5/6 1/2/3 1/2/3/4/5 1/2/3/4 1/2/3/4 1/2/3 1/2/3 null 1/2/3/4 1/2/3/4/5 null null 1/2/3 null 1/2/3 1/2 null null null 1/2

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 1927
Number of records 368
Num. input columns 28
Num. out cols. (Attributes) (Ver 1) 85 (94)
Density % (Ver 1) 27.06 (24.47)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
85     136        36.96
84     232        63.04
File name horseColic.D85.N368.C2.num



17. IONOSPHERE


SCHEMA FILE
double double double double double double double double double double double double double double double double double double double double double double double double double double double double double double double double double double nominal
att1 att2 att3 att4 att5 att6 att7 att8 att9 att10 att11 att12 att13 att14 att15 att16 att17 att18 att19 att20 att21 att22 att23 att24 att25 att26 att27 att28 att29 att30 att31 att32 att33 att34 class
null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null g/b

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 351
Num. input columns 35
Num. out cols. (Attributes) (Ver 1) 157 (172)
Density % (Ver 1) 22.29 (?)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
157    126        35.90
156    225        64.10
File name ionosphere.D157.N351.C2.num



18. IRIS

SCHEMA FILE
double double double double nominal
sepalLength sepalWidth petalLength petalWidth class
null null null null Iris-setosa/Iris-versicolour/Iris-virginica

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 150
Num. input columns 5
Num. out cols. (Attributes) (Ver 1) 19 (23)
Density % (Ver 1) 26.32 (21.74)
Number of classes 3
Num. records per class:
Class  Num. Rec.  %
17     50         33.33
18     50         33.33
19     50         33.33
File name iris.D19.N150.C3.num



19. LABOUR




20. LED 7

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal
light1 light2 light3 light4 light5 light6 light7 class
0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1/2/3/4/5/6/7/8/9

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 3200
Num. input columns 8
Num. out cols. (Attributes) (Ver 1) 24 (24)
Density % (Ver 1) 33.33 (33.33)
Number of classes 10
Num. records per class:
Class  Num. Rec.  %
21     301        9.41
18     307        9.59
19     312        9.75
17     313        9.78
20     313        9.78
22     314        9.81
23     327        10.22
15     329        10.28
24     334        10.44
16     350        10.94
File name led7.D24.N3200.C10.num



21. LETTER RECOGNITION

SCHEMA FILE
nominal int int int int int int int int int int int int int int int int
lettr x-box y-box width high onpix x-bar y-bar x2bar y2bar xybar x2ybr xy2br x-ege xegvy y-ege yegvx
A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R/S/T/U/V/W/X/Y/Z none none none none none none none none none none none none none none none none

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 20000
Num. input columns 17
Num. out cols. (Attributes) (Ver 1) 106 (106)
Density % (Ver 1) 16.04 (16.04)
Number of classes 26
Num. records per class:
Class  Num. Rec.  %
88     734        3.67
106    734        3.67
83     736        3.68
91     739        3.70
90     747        3.74
99     748        3.74
103    752        3.76
95     753        3.77
89     755        3.78
98     758        3.79
92     761        3.81
102    764        3.82
82     766        3.83
85     768        3.84
87     773        3.87
86     775        3.88
94     783        3.92
97     783        3.92
105    786        3.93
104    787        3.94
81     789        3.95
93     792        3.96
100    796        3.98
96     803        4.01
84     805        4.03
101    813        4.07
File name letRecog.D106.N20000.C26.num



22. LYMPHOGRAPHY

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal int int nominal nominal nominal nominal nominal nominal nominal nominal
class lymphatics blockOfAffere blOfLymph_c blOfLymph_s byPass extravasates regeneration earlyUptakeIn lym.nodesDimin lym.nodesEnlar changesInLym defectInNode changesInNode changesInStru specialForms dislocationOf exclusionOfNo noOfNodesIn
1/2/3/4 1/2/3/4 1/2 1/2 1/2 1/2 1/2 1/2 1/2 none none 1/2/3 1/2/3/4 1/2/3/4 1/2/3/4/5/6/7/8 1/2/3 1/2 1/2 1/2/3/4/5/6/7/8

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 148
Num. input columns 19
Num. out cols. (Attributes) 59
Density % (Ver 1) 32.2
Number of classes 4
Num. records per class:
Class  Num. Rec.  %
56     2          1.35
59     4          2.70
58     61         41.22
57     81         54.73
File name lymphography.D59.N148.C4.num



23. MUSHROOM

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
class cap-shape cap-surface cap-color bruises? odor gill-attachment gill-spacing gill-size gill-color stalk-shape stalk-root stalk-surface-above-ring stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
e/p b/c/x/f/k/s f/g/y/s n/b/c/g/r/p/u/e/w/y t/f a/l/c/y/f/m/n/p/s a/d/f/n c/w/d b/n k/n/b/h/g/r/o/p/u/e/w/y e/t b/c/u/e/z/r f/y/k/s f/y/k/s n/b/c/g/o/p/e/w/y n/b/c/g/o/p/e/w/y p/u n/o/w/y n/o/t c/e/f/l/n/p/s/z k/n/b/h/r/o/u/w/y a/c/n/s/v/y g/l/m/p/u/w/d

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 2480
Number of records 8124
Num. input columns 23
Num. out cols. (Attributes) (Ver 1) 90 (127)
Density % (Ver 1) 25.56 (18.11)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
90     3916       48.20
89     4208       51.80
File name mushroom.D90.N8124.C2.num



24. NURSERY

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal
parents has_nurs form children housing finance social health class
usual/pretentious/great_pret proper/less_proper/improper/critical/very_crit complete/completed/incomplete/foster 1/2/3/more convenient/less_conv/critical convenient/inconv nonprob/slightly_prob/problematic recommended/priority/not_recom not_recom/recommend/very_recom/priority/spec_prior

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 12960
Num. input columns 9
Num. out cols. (Attributes) (Ver 1) 32 (32)
Density % (Ver 1) 28.13 (28.13)
Number of classes 5
Num. records per class:
Class  Num. Rec.  %
29     2          0.02
30     328        2.53
32     4044       31.20
31     4266       32.92
28     4320       33.33
File name nursery.D32.N12960.C5.num



25. PAGE BLOCKS

SCHEMA FILE
int int int double double double double int int int nominal
height lenght area eccen p_black p_and mean_tr blackpix blackand wb_trans class
null null null null null null null null null null 1/2/3/4/5

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 5473
Num. input columns 11
Num. out cols. (Attributes) (Ver 1) 46 (55)
Density % (Ver 1) 23.91 (20.00)
Number of classes 5
Num. records per class:
Class  Num. Rec.  %
44     28         0.51
45     88         1.61
46     115        2.10
43     329        6.01
42     4913       89.77
File name pageBlocks.D46.N5473.C5.num



26. PEN DIGITS

SCHEMA FILE
int int int int int int int int int int int int int int int int nominal
att1 att2 att3 att4 att5 att6 att7 att8 att9 att10 att11 att12 att13 att14 att15 att16 class
null null null null null null null null null null null null null null null null 0/1/2/3/4/5/6/7/8/9

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 10992
Num. input columns 17
Num. out cols. (Attributes) (Ver 1) 89 (90)
Density % (Ver 1) 19.10 (18.89)
Number of classes 10
Num. records per class:
Class  Num. Rec.  %
83     1055       9.60
85     1055       9.60
88     1055       9.60
89     1055       9.60
86     1056       9.61
87     1142       10.39
80     1143       10.40
81     1143       10.40
82     1144       10.41
84     1144       10.41
File name penDigits.D89.N10992.C10.num



27. PIMA INDIANS


SCHEMA FILE
int int int int int double double int nominal
NumberPregnacies PlasmaGluConcent DiastolicBldPress TricepsSkinFold 2-HourSerumIns BodyMassIndex DiabPedFunc Age Class
none none none none none none none none 0/1

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 768
Num. input columns 9
Num. out cols. (Attributes) (Ver 1) 38 (42)
Density % (Ver 1) 23.68 (21.43)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
38     268        34.90
37     500        65.10
File name pima.D38.N768.C2.num



28. SOYBEAN LARGE

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
class date plant-stand precip temp hail crop-hist area-damaged severity seed-tmt germination plant-growth leaves leafspots-halo leafspots-marg leafspot-size leaf-shread leaf-malf leaf-mild stem lodging stem-cankers canker-lesion fruiting-bodies external-decay mycelium int-discolor sclerotia fruit-pods fruit seed mold-growth seed-discolor seed-size shriveling roots
diaporthe-stem-canker/charcoal-rot/rhizoctonia-root-rot/phytophthora-rot/ brown-stem-rot/powdery-mildew/downy-mildew/brown-spot/bacterial-blight/ bacterial-pustule/purple-seed-stain/anthracnose/phyllosticta-leaf-spot/ alternarialeaf-spot/frog-eye-leaf-spot/diaporthe-pod-&-stem-blight/ cyst-nematode/2-4-d-injury/herbicide-injury 0/1/2/3/4/5/6 0/1 0/1/2 0/1/2 0/1 0/1/2/3 0/1/2/3 0/1/2 0/1/2 0/1/2 0/1 0/1 0/1/2 0/1/2 0/1/2 0/1 0/1 0/1/2 0/1/2 0/1 0/1/2/3 0/1/2/3 0/1 0/1/2 0/1 0/1/2 0/1 0/1/2/3 0/1/2/3/4 0/1 0/1 0/1 0/1 0/1 0/1/2

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 2337
Number of records 683
Num. input columns 36
Num. out cols. (Attributes) 118
Density % 30.51
Number of classes 19
Num. records per class:
Class  Num. Rec.  %
118    8          1.17
116    14         2.05
115    15         2.2
117    16         2.34
100    20         2.93
101    20         2.93
102    20         2.93
105    20         2.93
106    20         2.93
108    20         2.93
109    20         2.93
111    20         2.93
112    20         2.93
104    44         6.44
111    44         6.44
103    88         12.88
113    91         13.32
114    91         13.32
107    92         13.47
File name soybean-large.D118.N683.C19.num



29. TIC-TAC-TOE

SCHEMA FILE
nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
top-left-square top-middle-square top-right-square middle-left-square middle-middle-square middle-right-square bottom-left-square bottom-middle-square bottom-right-square Class
x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b x/o/b positive/negative

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 958
Num. input columns 10
Num. out cols. (Attributes) (Ver 1) 29 (29)
Density % (Ver 1) 34.48 (34.48)
Number of classes 2
Num. records per class:
Class  Num. Rec.  %
29     332        34.66
28     626        65.34
File name ticTacToe.D29.N958.C2.num



30. WAVEFORM


SCHEMA FILE
double double double double double double double double double double double double double double double double double double double double double nominal
att1 att2 att3 att4 att5 att6 att7 att8 att9 att10 att11 att12 att13 att14 att15 att16 att17 att18 att19 att20 att21 class
none none none none none none none none none none none none none none none none none none none none none 0/1/2

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 5000
Num. input columns 22
Num. out cols. (Attributes) (Ver 1) 101 (108)
Density % (Ver 1) 21.78 (20.37)
Number of classes 3
Num. records per class:
Class  Num. Rec.  %
100    1647       32.94
99     1657       33.14
101    1696       33.92
File name waveform.D101.N5000.C3.num



31. WINE

SCHEMA FILE
nominal double double double double int double double double double double double double int
Class Alcohol MalicAcid Ash AlcalinityOfAsh Magnesium TotalPhenols Flavanoids NonflavanoidPhenols Proanthocyanins ColorIntensity Hue OD280/OD315ofDilutedWines Proline
1/2/3 null null null null null null null null null null null null null

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 178
Num. input columns 14
Num. out cols. (Attributes) (Ver 1) 68 (68)
Density % (Ver 1) 20.59 (20.59)
Number of classes 3
Num. records per class:
Class  Num. Rec.  %
68     48         26.97
66     59         33.15
67     71         39.89
File name wine.D68.N178.C3.num



32. ZOO

SCHEMA FILE
unused nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal nominal
name hair feathers eggs milk airborne aquatic predator toothed backbone breathes venomous fins legs tail domestic catsize type
null 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/2/4/5/6/8 0/1 0/1 0/1 1/2/3/4/5/6/7

DN STATISTICS
Num. divs. setting 5
Distributed/Randomised Yes
Missing values 0
Number of records 101
Num. input columns 18
Num. out cols. (Attributes) (Ver 1) 42 (43)
Density % (Ver 1) 40.48 (39.53)
Number of classes 7
Num. records per class:
Class  Num. Rec.  %
40     4          3.96
38     5          4.95
41     8          7.92
42     10         9.90
39     13         12.87
37     20         19.80
36     41         40.59
File name zoo.D42.N101.C7.num



Created and maintained by Frans Coenen. Last updated 03 March 2008