| border | surface |
|---------+----------|
| 3.2 | 0.44 |
| 4.4 | 2 |
| 39 | 61 |
| 76 | 160 |
| 10.2 | 34 |
| 120.3 | 468 |
| 1.2 | 6 |
| 10.2 | 54 |
| 359 | 2586 |
| 466 | 6220 |
|---------+----------|
That concludes the demonstration of converting HTML/XML to JSON to CSV.
Although jq can perform many more operations, and although specialized tools
exist for working with XML data, in our experience converting the data to CSV
as quickly as possible tends to work well. That way you can spend more time
becoming proficient with generic command-line tools rather than very specific ones.
Common Scrub Operations for CSV
Extracting and Reordering Columns
Columns can be extracted and reordered using the command-line tool csvcut
(Groskopf, 2014). For example, to keep only the columns in the Iris data set that contain
numerical values and reorder the middle two columns:
$ < iris.csv csvcut -c sepal_length,petal_length,sepal_width,petal_width |
> head -n 5 | csvlook
|---------------+--------------+-------------+--------------|
| sepal_length | petal_length | sepal_width | petal_width |
|---------------+--------------+-------------+--------------|
| 5.1 | 1.4 | 3.5 | 0.2 |
| 4.9 | 1.4 | 3.0 | 0.2 |
| 4.7 | 1.3 | 3.2 | 0.2 |
| 4.6 | 1.5 | 3.1 | 0.2 |
|---------------+--------------+-------------+--------------|
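To make name-based column selection concrete, here is a minimal sketch of the same idea in plain awk. This is a swapped-in technique for illustration, not a replacement for csvcut: it assumes a simple CSV with no quoted fields and no commas inside values, which csvcut handles properly. The function name cut_columns is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cut_columns NAME1,NAME2,... reads CSV on stdin and
# prints only the named columns, in the order given.
# Assumption: simple CSV (no quoted fields, no commas inside values).
cut_columns() {
  awk -F',' -v OFS=',' -v cols="$1" '
    NR == 1 {
      # Map each header name to its 1-based field position.
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = split(cols, want, ",")
      # Fail loudly on an unknown column name instead of printing $0.
      for (j = 1; j <= n; j++) {
        if (!(want[j] in pos)) {
          print "unknown column: " want[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Rebuild each record (header included) from the requested fields.
      line = ""
      for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(pos[want[j]])
      print line
    }'
}

# Usage: reorder the middle two numeric columns and drop species.
printf '%s\n' \
  'sepal_length,sepal_width,petal_length,petal_width,species' \
  '5.1,3.5,1.4,0.2,Iris-setosa' |
  cut_columns sepal_length,petal_length,sepal_width,petal_width
```

The usage line prints the header `sepal_length,petal_length,sepal_width,petal_width` followed by `5.1,1.4,3.5,0.2`. Looking up positions from the header row is what lets you select by name rather than by index; once real CSV quoting enters the picture, csvcut is the safer choice.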
Alternatively, we can specify the columns we want to leave out with -C, which
stands for complement:
$ < iris.csv csvcut -C species | head -n 5 | csvlook
|---------------+-------------+--------------+--------------|
| sepal_length | sepal_width | petal_length | petal_width |
|---------------+-------------+--------------+--------------|
| 5.1 | 3.5 | 1.4 | 0.2 |
| 4.9 | 3.0 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.3 | 0.2 |