3000000:
820
Row count: 41799
Note that csvstat, just like csvsql, employs heuristics to determine the data type,
and therefore may not always get it right. We encourage you to always do a manual
inspection as discussed in the previous subsection. Moreover, the inferred type may be a
character string or integer, but that says nothing about how the values should be treated.
As a nice extra, csvstat outputs, at the very end, the number of data points. Newlines
and commas inside values are handled correctly. To see only the relevant line of the
output, we can use tail:
$ csvstat data/iris.csv | tail -n 1
Row count: 150
If you only want to see the actual number of data points, you can use, for example, the
following sed expression to extract the number:
$ csvstat data/iris.csv | sed -rne '${s/^([^:]+): ([0-9]+)$/\2/;p}'
150
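As a minimal sketch of what that extraction does, the following wraps a similar expression in a shell function. The name row_count is hypothetical, and the input is simulated with printf so the example runs without csvkit installed. It uses a basic regular expression instead of the -r flag, and, unlike the expression above, it prints nothing when the last line does not match:

```shell
# Hypothetical helper: print the number found on the last line of
# csvstat-style output read from standard input.
row_count() {
  # '$' restricts the substitution to the last line; -n plus the 'p' flag
  # means a line is printed only when the substitution succeeded.
  sed -n '$s/^[^:]*: \([0-9][0-9]*\)$/\1/p'
}

# Simulated tail of csvstat output (made-up first line, for illustration only).
printf 'Some earlier line\nRow count: 150\n' | row_count
```

With csvkit available, the same function could be used as: csvstat data/iris.csv | row_count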
Using R from the Command Line with Rio
In this section, we'd like to introduce you to a command-line tool called Rio, which is
essentially a small, nifty wrapper around the statistical programming environment R.
Before we explain what Rio does and why it exists, let's talk a bit about R itself.
R is a very powerful statistical software package for analyzing data and creating
visualizations. It's an interpreted programming language, has an extensive collection of
packages, and offers its own REPL, which allows you, similar to the command line, to play
with your data. Unfortunately, R is quite separated from the command line. Once you
start it, you're in a separate environment. R doesn't really play well with the command
line: out of the box, it is awkward to pipe data into it, and it doesn't lend itself to
short one-liners.
For example, imagine that you have a CSV file called data/tips.csv, and you would like
to compute the tip percentage and save the result. To accomplish this in R, you would
first start up R:
$ R
And then run the following commands:
> tips <- read.csv('data/tips.csv', header = TRUE, sep = ',', stringsAsFactors = FALSE)
> tips.percent <- tips$tip / tips$bill * 100
> cat(tips.percent, sep = '\n', file = 'data/percent.csv')
> q("no")
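To make the arithmetic in these commands concrete, here is a rough shell sketch of the same calculation (tip divided by bill, times 100) using awk. It is not how Rio works. The function name tip_percent is hypothetical, the sample rows are made up, and the code assumes a header row containing columns named bill and tip and no quoted fields:

```shell
# Hypothetical helper: read a CSV with "bill" and "tip" header columns on
# standard input and print the tip percentage for each data row.
tip_percent() {
  awk -F, '
    # Header row: remember which column index belongs to each name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Data rows: tip / bill * 100, rounded to two decimals.
    { printf "%.2f\n", $(col["tip"]) / $(col["bill"]) * 100 }
  '
}

# Made-up sample rows for illustration (not taken from data/tips.csv).
printf 'bill,tip\n20.00,3.00\n50.00,5.00\n' | tip_percent
```

Because the function reads standard input and writes standard output, it can sit in the middle of a pipeline, which is exactly what the interactive R session above does not allow.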