All Data Sets


Name: KDD Cup 1999 Data
Description: This is the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
Download (Internal): http://www.cs.purdue.edu/commugrate/data/kddcup/1999/
Download (External): http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Name: KDD Cup 1998 Data
Description: This is the data set used for the Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98, the Fourth International Conference on Knowledge Discovery and Data Mining. The competition task is a regression problem where the goal is to estimate the return from a direct mailing in order to maximize donation profits.
Date: 01/01/1998
Download (Internal): http://www.cs.purdue.edu/commugrate/data/kddcup/1998/
Download (External): http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html

Name: Reuters-21578 Text Categorization Collection
Description: This is a collection of documents that appeared on the Reuters newswire in 1987. The documents were assembled and indexed with categories.
Date: 09/26/1997
Download (Internal): http://www.cs.purdue.edu/commugrate/data/reuters21578/reuters21578.tar.gz
Download (External): http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz

Name: UNIX User Data
Description: This file contains 9 sets of sanitized user data drawn from the command histories of 8 UNIX computer users at Purdue over the course of up to 2 years.
Download (Internal): http://www.cs.purdue.edu/commugrate/data/UNIX_user_data/UNIX_user_data.tar.gz
Download (External): http://kdd.ics.uci.edu/databases/UNIX_user_data/UNIX_user_data.tar.gz

Name: Enron Email Dataset
Description: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems.
Date: 08/21/2009
Download (Internal): http://www.cs.purdue.edu/commugrate/data/enron/enron_mail_082109.tar.gz
Download (External): http://www.cs.cmu.edu/~enron/enron_mail_082109.tar.gz

Name: Internet Advertisements
Description: This dataset represents a set of possible advertisements on Internet pages.
Date: 07/01/1998
Download (Internal): http://www.cs.purdue.edu/commugrate/data/internet_ads/
Download (External): http://archive.ics.uci.edu/ml/machine-learning-databases/internet_ads/

Name: Microsoft Anonymous Web Data
Description: This dataset records which areas (Vroots) of www.microsoft.com each user visited in a one-week timeframe in February 1998.
Date: 11/30/1998
Download (Internal): http://www.cs.purdue.edu/commugrate/data/msweb
Download (External): http://kdd.ics.uci.edu/databases/msweb/msweb.html

Name: Japanese Vowels
Description: This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers.
Date: 06/13/2000
Download (Internal): http://www.cs.purdue.edu/commugrate/data/ipums/
Download (External): http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html

Name: Pioneer-1 Mobile Robot Data
Description: This dataset contains time series sensor readings of the Pioneer-1 mobile robot. The data is broken into "experiences" in which the robot takes action for some period of time and experiences a controlled interaction with its environment (e.g., bumping into a garbage can).
Date: 01/28/1999
Download (Internal): http://www.cs.purdue.edu/commugrate/data/pioneer
Download (External): http://kdd.ics.uci.edu/databases/pioneer/pioneer.html

Name: Terrorists
Description: This dataset contains information about terrorists and their relationships. Unlike the previous datasets, this dataset was designed for classification experiments aimed at classifying the relationships among terrorists. The dataset contains 851 relationships, each described by a 0/1-valued vector of attributes where each entry indicates the absence/presence of a feature. There are a total of 1224 distinct features. Each relationship can be assigned one or more labels out of a maximum of four labels, making this dataset suitable for multi-label classification tasks. The README file provides more details.
Download (Internal): http://www.cs.purdue.edu/commugrate/data/terrorist/TerroristRel.tgz
Download (External): http://www.cs.umd.edu/~sen/lbc-proj/data/TerroristRel.tgz
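
The Terrorists entry describes a multi-label layout: 851 relationships, each a 0/1 vector over 1224 features, with one to four labels per relationship. As a rough illustration of that shape only, the Python sketch below builds synthetic data with the same dimensions. It does not read the TerroristRel archive, whose file format is documented in its README; the label names, the helper functions, and the random values are all hypothetical placeholders.

```python
import random

# Dimensions taken from the dataset description above.
N_RELATIONSHIPS = 851
N_FEATURES = 1224

# Hypothetical placeholder names; the real label names are in the README.
LABELS = ["label_0", "label_1", "label_2", "label_3"]


def make_synthetic_example(rng):
    """Return one (features, labels) pair of random, synthetic values."""
    # Each entry marks the absence (0) or presence (1) of a feature.
    features = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    # Each relationship carries between one and four labels.
    k = rng.randint(1, len(LABELS))
    labels = set(rng.sample(LABELS, k))
    return features, labels


def to_indicator(labels):
    """Encode a set of label names as a 0/1 vector ordered like LABELS."""
    return [1 if name in labels else 0 for name in LABELS]


rng = random.Random(0)  # fixed seed so the synthetic data is repeatable
dataset = [make_synthetic_example(rng) for _ in range(N_RELATIONSHIPS)]

X = [features for features, _ in dataset]           # 851 x 1224 feature matrix
Y = [to_indicator(labels) for _, labels in dataset]  # 851 x 4 label matrix
```

Encoding the labels as a second 0/1 matrix is one common way to feed multi-label data to a classifier, since each label column can then be treated as its own binary target.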


This research is supported by NSF Grant Number IIS 0916614 and by the Purdue Cyber Center.