10.3 KEEL-Dataset
In this section we present the KEEL-dataset repository, which can be accessed through
the main KEEL webpage.¹⁶ The repository is devoted to data sets in KEEL format that
can be used with the software, and it provides:
• A detailed categorization of the data sets considered and a description of their
characteristics. Tables have also been created for the data sets in each category.
• A description of the papers that have used the partitions of data sets available
in the KEEL-dataset repository. These descriptions include result tables, the
algorithms used and additional material.
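Because the repository distributes its data sets in KEEL format, a small reader is handy when loading them outside the KEEL software. The sketch below assumes the ARFF-like header of KEEL `.dat` files (`@relation`, `@attribute`, `@inputs`, `@outputs`, `@data`, followed by one comma-separated instance per line). `KeelDataset` and `parse_keel` are hypothetical names; this is not the official KEEL parser, and it ignores edge cases such as quoted names or values containing commas.

```python
from dataclasses import dataclass, field


@dataclass
class KeelDataset:
    relation: str = ""
    attributes: list = field(default_factory=list)  # (name, type spec) pairs
    inputs: list = field(default_factory=list)      # names listed in @inputs
    outputs: list = field(default_factory=list)     # names listed in @outputs
    rows: list = field(default_factory=list)        # each row: list of raw strings


def parse_keel(text: str) -> KeelDataset:
    """Minimal reader for a KEEL-style header plus comma-separated data section."""
    ds = KeelDataset()
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines; treating '%' as a comment marker is an assumption
        # borrowed from ARFF.
        if not line or line.startswith("%"):
            continue
        if in_data:
            ds.rows.append([v.strip() for v in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword, rest = keyword.lower(), rest.strip()
        if keyword == "@relation":
            ds.relation = rest
        elif keyword == "@attribute":
            name, _, spec = rest.partition(" ")
            ds.attributes.append((name, spec.strip()))
        elif keyword in ("@inputs", "@input"):
            ds.inputs = [n.strip() for n in rest.split(",")]
        elif keyword in ("@outputs", "@output"):
            ds.outputs = [n.strip() for n in rest.split(",")]
        elif keyword == "@data":
            in_data = True  # every following line is an instance
    return ds


# Hypothetical toy file, not a repository data set.
SAMPLE = """@relation toy
@attribute x1 real [0.0, 1.0]
@attribute x2 integer [0, 10]
@attribute Class {positive, negative}
@inputs x1, x2
@outputs Class
@data
0.5, 3, positive
0.1, 7, negative
"""

ds = parse_keel(SAMPLE)
print(ds.relation, ds.outputs, len(ds.rows))  # prints: toy ['Class'] 2
```

Values are kept as strings; converting them according to each attribute's type spec is left to the caller.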
The KEEL-dataset repository contains two main sections, corresponding to the two points
above. The first part presents the data sets of the repository, organized into
categories and subcategories and arranged in tables. Each data set has a dedicated
webpage presenting its characteristics. These webpages also provide the complete
data set and its partitions, ready to download.
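The downloadable partitions are typically train/test pairs for k-fold cross-validation. As a rough illustration of how such partitions can be produced, the sketch below assumes stratified k-fold cross-validation, which keeps each class's proportion roughly constant across folds. `stratified_kfold` is a hypothetical helper, not the procedure KEEL itself uses to build its files.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs in which every test fold
    holds roughly the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal this class's instances round-robin over the folds, starting
        # where the previous class stopped so fold sizes stay balanced.
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset = (offset + len(idxs)) % k

    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]


# Hypothetical labels: 4 minority and 16 majority instances.
labels = ["pos"] * 4 + ["neg"] * 16
parts = stratified_kfold(labels, k=4)
for train, test in parts:
    # prints "15 5 1" for each of the 4 folds
    print(len(train), len(test), sum(labels[i] == "pos" for i in test))
```

Fixing the random seed makes the partitions reproducible, which is the property that lets different studies report results on identical splits.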
The experimental studies section, on the other hand, is a novel feature in repositories
of this type. It provides a series of webpages for each experimental study, listing the
data sets used and their results in different formats, ready for direct comparison.
Direct access to the PDF of the paper is also provided for every experimental study
included in this webpage.
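Since each experimental study publishes per-data-set results, a new method can be compared against them directly. A minimal sketch, assuming the results are available as a mapping from data set name to a score where higher is better (e.g. test accuracy). The function name and the numbers in the example are hypothetical and are not taken from any KEEL study.

```python
def compare(mine, published, tol=1e-9):
    """Count the data sets on which `mine` wins, ties or loses against
    `published`. Both map data set name -> score (higher is better);
    only data sets present in both are compared."""
    common = sorted(set(mine) & set(published))
    wins = ties = losses = 0
    for name in common:
        diff = mine[name] - published[name]
        if abs(diff) <= tol:  # treat near-equal scores as a tie
            ties += 1
        elif diff > 0:
            wins += 1
        else:
            losses += 1
    return {"wins": wins, "ties": ties, "losses": losses, "compared": len(common)}


# Hypothetical scores for illustration only.
mine = {"iris": 0.95, "wine": 0.93, "glass": 0.70}
published = {"iris": 0.94, "wine": 0.93, "glass": 0.72, "ecoli": 0.80}

result = compare(mine, published)
print(result)  # {'wins': 1, 'ties': 1, 'losses': 1, 'compared': 3}
```

Restricting the count to the shared data sets avoids crediting or penalizing a method for problems it was never run on; the tolerance keeps floating-point noise from turning ties into wins or losses.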
Fig. 10.5 depicts the main webpage, in which these two sections appear.
In the rest of this section we describe the two main sections of the KEEL-dataset
repository webpage.
10.3.1 Data Sets Web Pages
The categories of the data sets have been derived from the topics addressed in the
experimental studies. Some of them are commonly found in the literature, such as
supervised (classification), unsupervised and regression problems. Other categories,
which have not been tackled or treated separately before, are also present. The data
sets are divided into the following categories:
• Classification problems. This category includes all the supervised data sets. All
these data sets contain one or more attributes that label the instances, mapping
them to different classes. We distinguish three subcategories of classification
data sets:
- Standard data sets.
- Imbalanced data sets [29-31]. Imbalanced data sets are standard classification
data sets in which the class distribution is highly skewed among the classes.
¹⁶ http://keel.es/datasets.php
 