• Imports required packages
• Loads the CSV file as a data.frame
• Generates a ggplot2 object if needed (more on this in the next section)
• Runs the specified commands
• Prints the result of the last command to standard output
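The steps above can be sketched as a small shell wrapper around Rscript. This is a hypothetical illustration, not Rio's actual source: the function names rio_sketch_script and rio_sketch are made up here, the package-import step is left as a placeholder comment, the ggplot2 step is omitted, and the sketch assumes Rscript is installed and on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the steps above; not Rio's actual source.

# Print an R script that loads CSV_PATH as `df`, runs COMMANDS,
# and prints the value of the last command.
rio_sketch_script() {
  local csv_path="$1" commands="$2"
  cat <<EOF
# (package imports would go here; which ones Rio loads is not shown)
df <- read.csv("${csv_path}", header = TRUE)
result <- {
${commands}
}
print(result)
EOF
}

# Save standard input as a temporary CSV file, generate the R script,
# and run it. Assumes Rscript is installed and on the PATH.
rio_sketch() {
  local csv script
  csv="$(mktemp)"
  script="$(mktemp)"
  cat > "$csv"
  rio_sketch_script "$csv" "$1" > "$script"
  Rscript "$script"
  rm -f "$csv" "$script"
}
```

With these definitions, something like `< data/iris.csv rio_sketch 'mean(df$sepal_length)'` would mimic the one-liners in this section, although the exact output formatting may differ from Rio's.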
So now, if you want to do one or two things to your data set with R, you can specify
them as a one-liner and keep working on the command line. All the knowledge that
you already have about R can now be leveraged from the command line. With Rio,
you can even create sophisticated visualizations, as you'll see later in this chapter.
Rio doesn't have to be used as a filter, meaning the output doesn't have to be in CSV
format per se. You can compute various descriptive statistics:
$ < data/iris.csv Rio -e 'mean(df$sepal_length)'
5.843333
$ < data/iris.csv Rio -e 'sd(df$sepal_length)'
0.8280661
$ < data/iris.csv Rio -e 'sum(df$sepal_length)'
876.5
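If you want to sanity-check such numbers without R, the same three statistics can be approximated with awk. The helper below is a hypothetical sketch (the name col_stats is made up here): it assumes a header row, a purely numeric column, and no quoted fields containing commas. Like R's sd, it uses the sample standard deviation with n - 1 in the denominator.

```shell
# Hypothetical helper: sum, mean, and sample standard deviation of one
# CSV column (1-based index). The header row is skipped.
col_stats() {
  awk -F, -v col="$1" '
    NR > 1 { n++; s += $col; ss += $col * $col }
    END {
      mean = s / n
      sd = sqrt((ss - n * mean * mean) / (n - 1))   # n - 1, as in sd()
      printf "sum=%.4f mean=%.4f sd=%.4f\n", s, mean, sd
    }'
}

# Tiny inline data set so the example is self-contained.
printf 'x\n1\n2\n3\n4\n' | col_stats 1
# prints: sum=10.0000 mean=2.5000 sd=1.2910
```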
And if you want the five-number summary (plus the mean), use summary:
$ < data/iris.csv Rio -e 'summary(df$sepal_length)'
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900
You can also compute the skewness (symmetry of the distribution) and kurtosis
(peakedness of the distribution), but then you need to have the moments package
installed:
$ < data/iris.csv Rio -e 'skewness(df$sepal_length)'
$ < data/iris.csv Rio -e 'kurtosis(df$petal_width)'
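For reference, and assuming the moments package's default definitions, both statistics are ratios of central sample moments. Note that under this definition the kurtosis is not the excess kurtosis, so a normal distribution scores about 3 rather than 0.

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
\text{skewness} = \frac{m_3}{m_2^{3/2}}, \qquad
\text{kurtosis} = \frac{m_4}{m_2^{2}}
```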
You can compute the correlation between two features:
$ < data/tips.csv Rio -e 'cor(df$bill, df$tip)'
0.6757341
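By default, R's cor computes the Pearson correlation coefficient. Writing x for bill and y for tip (symbols introduced here only for illustration), that is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value always lies between -1 and 1, and the correlation of a feature with itself is 1.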
Or even a correlation matrix:
$ < data/tips.csv csvcut -c bill,tip | Rio -f cor | csvlook
|--------------------+--------------------|
|  bill              | tip                |
|--------------------+--------------------|
|  1                 | 0.675734109211365  |
|  0.675734109211365 | 1                  |
|--------------------+--------------------|