7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
==> wine-white.csv <==
"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6
8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
$ wc -l wine-{red,white}.csv
1600 wine-red.csv
4899 wine-white.csv
6499 total
At first sight, this data appears to be very clean already. Still, let's scrub it a little
so that it conforms more closely to what most command-line tools expect.
Specifically, we'll:
• Convert the header to lowercase
• Convert the semicolons to commas
• Convert spaces to underscores
• Remove unnecessary quotes
These things can all be taken care of by tr. Let's use a for loop this time, for old
times' sake, to process both data sets:
$ for T in red white; do
> < wine-$T.csv tr '[A-Z]; ' '[a-z],_' | tr -d \" > wine-${T}-clean.csv
> done
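To see what the two tr calls do, here is a minimal sketch that runs the same pipeline on a single header line. The sample string is a shortened, hypothetical header rather than the full one from the files, and the variable names sample and cleaned are made up for this illustration.

```shell
# Hypothetical sample: a shortened header in the same style as the wine files
sample='"fixed acidity";"pH";"quality"'

# First tr: uppercase -> lowercase, ';' -> ',', space -> '_'
# Second tr: delete the double quotes
cleaned=$(printf '%s\n' "$sample" | tr '[A-Z]; ' '[a-z],_' | tr -d '"')

echo "$cleaned"
# prints: fixed_acidity,ph,quality
```

Because tr translates character by character, the same call is applied to every line of the file, not only to the header. That is harmless here, since the data rows contain no uppercase letters, spaces, or quotes.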
Let's combine the two data sets. We'll use csvstack to add a column named type,
which will be red for rows of the first file and white for rows of the second file:
$ HEADER="$(head -n 1 wine-red-clean.csv),type"
$ csvstack -g red,white -n type wine-{red,white}-clean.csv |
> csvcut -c $HEADER > wine-both-clean.csv
The new column type is added to the beginning of the table. Because some of the
command-line tools that we'll use in this chapter assume that the class label is the last
column, we'll rearrange the columns by using csvcut. Instead of typing all 13 columns,
we temporarily store the desired header into a variable HEADER before we call
csvstack.
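The effect of stacking the files and moving the label to the last column can be sketched with plain coreutils on two toy files. This is only a stand-in to show the resulting layout, not how csvstack or csvcut work internally. The file contents, the temporary directory, and the variable names tmp, header, and stacked are assumptions made for this example.

```shell
# Toy stand-ins for the cleaned files (hypothetical, shortened to three columns)
tmp=$(mktemp -d)
printf 'fixed_acidity,ph,quality\n7.4,3.51,5\n' > "$tmp/red.csv"
printf 'fixed_acidity,ph,quality\n7,3,6\n' > "$tmp/white.csv"

# Same idea as HEADER in the text: the original header plus type at the end
header="$(head -n 1 "$tmp/red.csv"),type"

# Print the header once, then each file's data rows with its label appended
stacked=$(
  echo "$header"
  for t in red white; do
    tail -n +2 "$tmp/$t.csv" | sed "s/\$/,$t/"
  done
)

echo "$stacked"
rm -r "$tmp"
```

Every row ends with its label, which is the layout that the tools assuming a last-column class label expect. On real data we would still prefer csvstack and csvcut, because they parse CSV properly, including quoted fields that contain commas.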
It's good to check whether there are any missing values in this data set: