Cross-validation is a technique that splits up the whole data set into a certain number of subsets. These subsets
are called folds. (Usually, five or ten folds are used.)
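SKLL takes care of the fold assignment itself, but as a rough illustration of the idea, the following sketch splits the cleaned data set round-robin into five fold files (the folds directory and file names are made up for this example, and SKLL's own assignment also shuffles the rows):
# assign each data row (skipping the header) to one of five fold files
$ mkdir -p folds
$ < wine-white-clean.csv tail -n +2 | awk '{ print > ("folds/fold" (NR % 5) ".csv") }'
$ wc -l folds/fold*.csv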
We need to add an identifier to each row so that we can easily identify the data points
later (the predictions are not in the same order as the original data set):
$ mkdir train
$ < wine-white-clean.csv nl -s, -w1 -v0 | sed '1s/0,/id,/' > train/features.csv
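A quick sanity check (purely illustrative) confirms that the header now starts with the new id column and shows how many rows we have:
$ < train/features.csv head -n 1 | cut -d, -f1   # should print: id
$ < train/features.csv wc -l                     # number of data rows plus one header line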
Running the Experiment
Create a configuration file called predict-quality.cfg:
[General]
experiment_name = Wine
task = cross_validate
[Input]
train_location = train
featuresets = [["features.csv"]]
learners = ["LinearRegression","GradientBoostingRegressor","RandomForestRegressor"]
label_col = quality
[Tuning]
grid_search = false
feature_scaling = both
objective = r2
[Output]
log = output
results = output
predictions = output
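With grid_search set to false, each learner runs with its default hyperparameters. If you are willing to wait longer, you can change the [Tuning] section as sketched below so that SKLL tunes the hyperparameters against the same objective:
[Tuning]
grid_search = true
feature_scaling = both
objective = r2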
We run the experiment using the run_experiment command-line tool (Educational
Testing Service, 2014):
$ run_experiment -l predict-quality.cfg
The -l option specifies that the experiment should be run in local mode. SKLL also offers the possibility of running
experiments on a cluster. The time it takes to run the experiment depends on the
complexity of the chosen algorithms.
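If you're curious how long the experiment takes on your machine, you can prefix the command with time:
$ time run_experiment -l predict-quality.cfg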
Parsing the Results
Once all algorithms are done, the results can be found in the output directory:
$ cd output
$ ls -1
Wine_features.csv_GradientBoostingRegressor.log
Wine_features.csv_GradientBoostingRegressor.predictions
Wine_features.csv_GradientBoostingRegressor.results
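To get a first impression of what each learner produced, you can inspect these files directly (the exact layout of the .predictions and .results files depends on your SKLL version):
$ head -n 3 Wine_features.csv_GradientBoostingRegressor.predictions
$ less Wine_features.csv_GradientBoostingRegressor.results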