Concatenate horizontally
Let's say you have three CSV files that you want to put side by side. To create them from tips.csv, we use tee
(Parker, Stallman, & MacKenzie, 2012) to save the result of csvcut in the middle of each
pipeline:
$ < tips.csv csvcut -c bill,tip | tee bills.csv | head -n 3 | csvlook
|--------+-------|
| bill | tip |
|--------+-------|
| 16.99 | 1.01 |
| 10.34 | 1.66 |
|--------+-------|
$ < tips.csv csvcut -c day,time | tee datetime.csv |
> head -n 3 | csvlook
|------+---------|
| day | time |
|------+---------|
| Sun | Dinner |
| Sun | Dinner |
|------+---------|
$ < tips.csv csvcut -c sex,smoker,size | tee customers.csv |
> head -n 3 | csvlook
|---------+--------+-------|
| sex | smoker | size |
|---------+--------+-------|
| Female | No | 2 |
| Male | No | 3 |
|---------+--------+-------|
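To see what tee contributes without csvkit installed, here is a minimal sketch that uses only standard POSIX tools. The input is an inline stand-in for the csvcut output, built from the header and the two rows shown above. The file name bills_sample.csv is hypothetical. tee writes every line it reads to the file and also passes the same lines on to the next command, which here just counts them.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Inline stand-in for `csvcut -c bill,tip` (header plus the two rows shown above).
# tee copies the stream into bills_sample.csv and also forwards it to wc,
# so the file is saved while the pipeline keeps going.
printf 'bill,tip\n16.99,1.01\n10.34,1.66\n' |
  tee bills_sample.csv |
  wc -l

# The saved file holds the same lines that flowed through the pipeline.
cat bills_sample.csv
```

Counting with wc rather than truncating with head keeps the sketch deterministic. The point is the same as in the commands above: the intermediate result is captured on disk without interrupting the pipeline.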
Assuming that the rows line up, you can simply paste (Ihnat & MacKenzie, 2012) the
files together:
$ paste -d, {bills,customers,datetime}.csv | head -n 3 | csvlook
|--------+------+--------+--------+------+-----+---------|
| bill | tip | sex | smoker | size | day | time |
|--------+------+--------+--------+------+-----+---------|
| 16.99 | 1.01 | Female | No | 2 | Sun | Dinner |
| 10.34 | 1.66 | Male | No | 3 | Sun | Dinner |
|--------+------+--------+--------+------+-----+---------|
The -d option instructs paste to use a comma as the delimiter instead of its default, a tab.
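paste matches lines purely by position and never inspects their contents, so you have to confirm for yourself that "the rows line up". The following is a minimal sketch, using only POSIX tools, that compares the files' line counts before pasting them. The three small input files are hypothetical stand-ins built from the rows shown above. Equal line counts are necessary but not sufficient: they do not prove that line n of each file describes the same record.

```shell
# Work in a scratch directory so the real bills.csv etc. are not overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-ins for the files created with tee above.
printf 'bill,tip\n16.99,1.01\n10.34,1.66\n' > bills.csv
printf 'sex,smoker,size\nFemale,No,2\nMale,No,3\n' > customers.csv
printf 'day,time\nSun,Dinner\nSun,Dinner\n' > datetime.csv

# Count how many distinct line counts there are; paste only makes sense
# when every file has the same number of lines (exactly one distinct count).
distinct=$(for f in bills.csv customers.csv datetime.csv; do
  wc -l < "$f"
done | sort -u | wc -l)

if [ "$distinct" -ne 1 ]; then
  echo "line counts differ; paste would misalign rows" >&2
  exit 1
fi

# -d, makes paste join corresponding lines with commas instead of tabs.
paste -d, bills.csv customers.csv datetime.csv > combined.csv
cat combined.csv
```

The file names are spelled out instead of using {bills,customers,datetime}.csv because brace expansion is a bash/zsh feature and is not available in a plain POSIX sh.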
Joining
Sometimes data cannot simply be combined by vertical or horizontal concatenation.
In some cases, especially in relational databases, the data is spread over multiple
tables (or files) in order to minimize redundancy. Imagine we wanted to extend the
Iris data set with more information about the three types of Iris flowers, namely the
USDA identifier. It so happens that we have a separate CSV file with these identifiers: