Wine_features.csv_GradientBoostingRegressor.results.json
Wine_features.csv_LinearRegression.log
Wine_features.csv_LinearRegression.predictions
Wine_features.csv_LinearRegression.results
Wine_features.csv_LinearRegression.results.json
Wine_features.csv_RandomForestRegressor.log
Wine_features.csv_RandomForestRegressor.predictions
Wine_features.csv_RandomForestRegressor.results
Wine_features.csv_RandomForestRegressor.results.json
Wine_summary.tsv
SKLL generates four files for each learner: one log, two with results, and one with
predictions. Moreover, SKLL generates a summary file, which contains a lot of
information about each individual fold (too much to show here). We can extract the
relevant metrics using the following SQL query:
$ < Wine_summary.tsv csvsql --query "SELECT learner_name, pearson FROM stdin "\
> "WHERE fold = 'average' ORDER BY pearson DESC" | csvlook
|----------------------------+----------------|
| learner_name               | pearson        |
|----------------------------+----------------|
| RandomForestRegressor      | 0.741860521533 |
| GradientBoostingRegressor  | 0.661957860603 |
| LinearRegression           | 0.524144785555 |
|----------------------------+----------------|
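If csvsql is not at hand, the same selection can be sketched with awk and sort. This is a minimal sketch, not part of SKLL or csvkit: the function name average_pearson is hypothetical, and it assumes the summary file is tab-separated with a header row containing the columns learner_name, fold, and pearson used in the query above.

```shell
# Hypothetical helper (not part of SKLL or csvkit): print learner_name and
# pearson for rows whose fold is "average", sorted by pearson, highest first.
# Assumes tab-separated input with a header row on stdin.
average_pearson() {
  awk -F'\t' '
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["fold"]) == "average" {
      print $(col["learner_name"]) "\t" $(col["pearson"])
    }
  ' | sort -t "$(printf '\t')" -k2,2nr
}
```

It would be called as `< Wine_summary.tsv average_pearson`. Looking columns up by header name rather than by position keeps the sketch independent of the column order in the summary file.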
The relevant column here is pearson, which holds Pearson's correlation coefficient.
This is a value between -1 and 1 that indicates the strength of the linear
relationship between the true quality scores and the predicted scores. Let's paste
all the predictions back to the data set:
$ parallel "csvjoin -c id train/features.csv <(< output/Wine_features.csv_{}"\
> ".predictions | tr '\t' ',') | csvcut -c id,quality,prediction > {}" ::: \
> RandomForestRegressor GradientBoostingRegressor LinearRegression
$ csvstack *Regres* -n learner --filenames > predictions.csv
And create a plot using Rio (see Figure 9-8):
$ < predictions.csv Rio -ge 'g+geom_point(aes(quality, round(prediction), '\
> 'color=learner), position="jitter", alpha=0.1) + facet_wrap(~ learner) + '\
> 'theme(aspect.ratio=1) + xlim(3,9) + ylim(3,9) + guides(colour=FALSE) + '\
> 'geom_smooth(aes(quality, prediction), method="lm", color="black") + '\
> 'ylab("prediction")' | display
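To get a feel for what the pearson column measures, the coefficient can be computed by hand from the quality and prediction columns. The following is a minimal sketch, not SKLL's implementation: the function name pearson_r is hypothetical, and it assumes a simple CSV on stdin (a header row, no quoted fields containing commas) in which both named columns exist.

```shell
# Hypothetical helper: Pearson's r between two named columns of a simple CSV
# read from stdin. Usage: pearson_r XCOLUMN YCOLUMN
pearson_r() {
  awk -F',' -v xname="$1" -v yname="$2" '
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      x = $(col[xname]); y = $(col[yname])
      # Accumulate the sums needed for the single-pass formula.
      n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
    }
    END {
      num = n * sxy - sx * sy
      den = sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy)
      # r is undefined when either column is constant (or there is no data).
      if (den == 0) { print "undefined"; exit 1 }
      printf "%.6f\n", num / den
    }
  '
}
```

Because predictions.csv stacks all three learners, one learner would be selected first, for example `< predictions.csv csvgrep -c learner -m LinearRegression | pearson_r quality prediction`, assuming the learner column holds the file names produced in the previous step. The result may differ from the value in Wine_summary.tsv, which is reported for the average fold.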