$ < names-comma.csv awk -F, 'BEGIN{OFS=","; print "id,full_name,born"}'\
> '{if(NR > 1) {print $1,$3" "$2,$4}}' | tail -n 1
6, van" "Beethoven,Ludwig
$ < names-comma.csv cols -c first_name,last_name tr \",\" \" \" |
> header -r full_name,id,born | csvcut -c id,full_name,born | tail -n 1
6,"Ludwig ""Beethoven van""",1770
$ < names-comma.csv csvsql --query "SELECT id, first_name || ' ' || last_name "\
> "AS full_name, born FROM stdin" | tail -n 1
6,"Ludwig Beethoven, van",1770
$ < names-comma.csv Rio -e 'df$full_name <- paste(df$first_name,df$last_name);'\
> 'df[c("id","full_name","born")]' | tail -n 1
6,"Ludwig Beethoven, van",1770
Wait a minute! What's that last command? Is that R? Well, as a matter of fact, it is. It's
R code evaluated through a command-line tool called Rio (Janssens, 2014). All that
we can say at this moment is that this approach also succeeds at merging the two
columns. We'll discuss this nifty command-line tool later.
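To see why the awk attempt above produces a malformed row, it helps to count fields. The following is a minimal sketch that assumes names-comma.csv contains a quoted last name with an embedded comma; the two-line sample file is a stand-in written for illustration, not the original data file.

```shell
# Hypothetical stand-in for names-comma.csv: a header plus one row whose
# last_name field is quoted because it contains a comma.
tmp=$(mktemp -d)
printf '%s\n' \
  'id,last_name,first_name,born' \
  '6,"Beethoven, van",Ludwig,1770' > "$tmp/names-comma.csv"

# awk -F, splits on every comma, including the one inside the quotes,
# so the data row has one more field than the header.
header_fields=$(head -n 1 "$tmp/names-comma.csv" | awk -F, '{ print NF }')
row_fields=$(tail -n 1 "$tmp/names-comma.csv" | awk -F, '{ print NF }')
echo "header: $header_fields fields, row: $row_fields fields"

# The same column arithmetic as the awk command above, applied to the row:
# $2 and $3 are now the two halves of the quoted last name.
merged=$(tail -n 1 "$tmp/names-comma.csv" |
  awk -F, 'BEGIN { OFS = "," } { print $1, $3 " " $2, $4 }')
echo "$merged"
```

Tools that parse CSV quoting, such as csvsql and Rio in the commands above, treat the quoted value as a single field, which is why they merge the columns correctly.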
Combining Multiple CSV Files
Concatenate vertically
Vertical concatenation may be necessary in cases where you have, for example, a data
set that is generated on a daily basis, or where each data set represents a different
market or product. Let's simulate the latter by splitting up our beloved Iris data set
into three CSV files, so that we have something to combine again. We'll use
fieldsplit (Hinds et al., 2010), which is part of the CRUSH suite of command-line tools:
$ < iris.csv fieldsplit -d, -k -F species -p . -s .csv
Here, the options specify: the delimiter (-d), that we want to keep the header in each
file (-k), the column whose values dictate the possible output files (-F), the relative
output path (-p), and the filename suffix (-s), respectively. Because the species
column in the Iris data set contains three different values, we end up with three CSV
files, each with 50 lines and a header:
$ wc -l Iris-*.csv
51 Iris-setosa.csv
51 Iris-versicolor.csv
51 Iris-virginica.csv
153 total
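If the CRUSH tools are not installed, a similar split can be sketched with plain awk. This is not how fieldsplit is implemented; it is a minimal alternative that assumes species is the fifth column and that its values are safe to use as filenames. The small iris.csv written here is a hypothetical stand-in for the full data set.

```shell
# Tiny stand-in for iris.csv (hypothetical values); species is assumed
# to be column 5, matching the output filenames shown above.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
cat > iris.csv <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
EOF

# Remember the header, then route each row to <species>.csv, writing the
# header first whenever a species is seen for the first time.
awk -F, '
  NR == 1 { header = $0; next }
  {
    file = $5 ".csv"
    if (!(file in seen)) { print header > file; seen[file] = 1 }
    print > file
  }
' iris.csv

wc -l Iris-*.csv
```

Within a single awk run, `print > file` truncates the file only when it is first opened, so later rows for the same species are appended after the header.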
You could just concatenate the files back together using cat, removing the headers of
all but the first file with header -d, as follows:
$ cat Iris-setosa.csv <(< Iris-versicolor.csv header -d) \
> <(< Iris-virginica.csv header -d) | sed -n '1p;49,54p' | csvlook
|---------------+-------------+--------------+-------------+------------------|