Datasets for Data Mining, Machine Learning and Exploration
Introduction
Reference datasets for tests, benchmarks, etc.
Datasets
Rdatasets is a collection of 758 datasets that were originally distributed
alongside the statistical software environment R and some of its add-on packages.
The goal is to make these data more broadly accessible for teaching and statistical
software development. Rdatasets.
This page contains a list of datasets that were selected for the Data Mining and Exploration projects.
Students can choose one of these datasets to work on, or can propose data of their own choice.
At the bottom of this page, you will find some examples of datasets that we judged inappropriate for the projects.
Particle physics data set
Physiological data set
Brain-Computer Interface data set
Prediction of Gene/Protein Localization data set
Prediction of Molecular Bioactivity for Drug Design: Binding to Thrombin data set
This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company.
The data consists of 86 variables and includes product usage data and socio-demographic data.
Insurance Company Benchmark (COIL 2000) Data Set
Finance and economic data in the form you want; instant download, API or direct to your app:
Quandl.
Quandl unifies over 20 million financial and economic datasets from over 500 publishers on a single user-friendly platform.
Datasets from the Deep learning website:
Datasets.
These datasets can be used for benchmarking deep learning algorithms.
Several classic datasets have been used extensively in the statistical literature:
Classic datasets.