Scrubbing Data - Data Science at the Command Line


| 4.6 | 3.1 | 1.5 | 0.2 |

|---------------+-------------+--------------+--------------|

Here, the included columns are kept in the same order. Instead of the column names, you can also specify the indices of the columns, which start at 1. This allows you to, for example, select only the odd columns (should you ever need it!):

$ printf 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9\n' |
> csvcut -c $( seq 1 2 9 | paste -sd, )
a,c,e,g,i
1,3,5,7,9
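To make the command substitution above less opaque, the following sketch runs only the inner pipeline. `seq 1 2 9` counts from 1 to 9 in steps of 2, and `paste -sd,` joins those lines with commas, which is the index list that csvcut receives. The variable name `indices` is only illustrative, and the trailing `-` (read from standard input) is added here for portability across paste implementations.

```shell
# Build the comma-separated list of odd column indices.
# "indices" is an illustrative variable name, not part of the original command.
indices=$(seq 1 2 9 | paste -sd, -)
echo "$indices"   # prints 1,3,5,7,9
```

The resulting string can then be passed as-is to `csvcut -c`.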

If you're certain that there are no commas in any of the values, then you can also use cut to extract columns. Be aware that cut does not reorder columns, as is demonstrated with the following command:

$ printf 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9\n' | cut -d, -f 5,1,3
a,c,e
1,3,5

As you can see, it doesn't matter in which order we specify the columns; with cut they will always appear in the original order. For completeness, let's also take a look at the SQL approach for extracting and reordering the numerical columns of the Iris data set:

$ < iris.csv csvsql --query "SELECT sepal_length, petal_length, "\
> "sepal_width, petal_width FROM stdin" | head -n 5 | csvlook

|---------------+--------------+-------------+--------------|
|  sepal_length | petal_length | sepal_width | petal_width  |
|---------------+--------------+-------------+--------------|
|  5.1          | 1.4          | 3.5         | 0.2          |
|  4.9          | 1.4          | 3.0         | 0.2          |
|  4.7          | 1.3          | 3.2         | 0.2          |
|  4.6          | 1.5          | 3.1         | 0.2          |
|---------------+--------------+-------------+--------------|
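As noted earlier, cut always emits columns in their original order. When the values are known to contain no commas, awk can both select and reorder them. The following is a minimal sketch along those lines, not a command from the text; the variable name `reordered` is only illustrative, and the same no-commas-in-values assumption applies as for cut.

```shell
# Select columns 5, 1 and 3 in exactly that order.
# Assumes no value contains a comma; awk splits naively on the separator.
reordered=$(printf 'a,b,c,d,e,f,g,h,i\n1,2,3,4,5,6,7,8,9\n' |
  awk -F, -v OFS=, '{ print $5, $1, $3 }')
echo "$reordered"
# prints:
# e,a,c
# 5,1,3
```

For data with quoted fields or embedded commas, a CSV-aware tool such as csvcut or csvsql remains the safer choice.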

Filtering Lines

The difference between filtering lines in a CSV file as opposed to a plain-text file is that you may want to base this filtering on values in a certain column only. Filtering on location is essentially the same, but you have to take into account that the first line of a CSV file is usually the header. Remember that you can always use the body command-line tool if you want to keep the header:

$ seq 5 | sed -n '3,5p'
3
4
5

$ seq 5 | header -a count | body sed -n '3,5p'
count
3
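If the header and body tools are not at hand, the same idea can be approximated with shell built-ins: read the first line, print it unchanged, and hand the rest of the stream to the filter. The sketch below assumes a POSIX-like shell; `keep_header` is a hypothetical helper written for this illustration and is not the implementation of body.

```shell
# keep_header: hypothetical helper that passes the first input line through
# untouched and applies the given command to the remaining lines.
keep_header() {
  IFS= read -r header_line        # consume only the header line
  printf '%s\n' "$header_line"    # emit it unchanged
  "$@"                            # run the filter on the rest of stdin
}

# Keep the header, then keep lines 3 through 5 of the data.
filtered=$(printf 'count\n1\n2\n3\n4\n5\n' | keep_header sed -n '3,5p')
echo "$filtered"
# prints:
# count
# 3
# 4
# 5
```

Because the line numbers given to sed are counted after the header has been consumed, `3,5p` refers to the third through fifth data lines.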
