Overview
In this chapter, you'll learn how to:
• Inspect the data and its properties
• Compute descriptive statistics
• Create data visualizations inside and outside the command line
Inspecting Data and Its Properties
In this section, we'll demonstrate how to inspect a data set and its properties. Because
the upcoming visualization and modeling techniques expect the data to be in tabular
form, we'll assume that the data is in CSV format. You can use the techniques
described in Chapter 5 to convert your data to CSV if necessary.
For simplicity's sake, we'll also assume that your data has a header. In the first
subsection, we'll determine whether that is the case. Once we know we have a
header, we can continue by answering the following questions:
• How many data points and features does the data set have?
• What does the raw data look like?
• What kind of features does the data set have?
• Can some of these features be treated as categorical or as factors?
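The first of these questions can often be answered with standard Unix tools alone. The following is a minimal sketch, not a definitive method: it assumes a simple comma-separated file with exactly one header line and no quoted fields that contain commas or newlines. The file name sample.csv and its contents are made up for illustration.

```shell
# Create a small, hypothetical CSV file to work with (illustration only)
printf 'name,age,city\nalice,30,Paris\nbob,25,Berlin\n' > sample.csv

# Number of data points: total number of lines minus the header line
n_rows=$(( $(wc -l < sample.csv) - 1 ))

# Number of features: count the comma-separated fields in the first line
n_cols=$(head -n 1 sample.csv | awk -F',' '{ print NF }')

echo "rows: $n_rows, columns: $n_cols"
```

For files with quoted fields or embedded newlines, these counts can be wrong, so a CSV-aware tool is the safer choice.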
Header or Not, Here I Come
You can check whether your file has a header by printing the first few lines:
$ head file.csv | csvlook
It's then up to you to decide whether the first line is indeed a header or already the
first data point. When the data set contains no header or when its header contains
newlines, you're best off going back and correcting that by scrubbing the data (refer
to Chapter 5 for information on how to do that).
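If you'd like a rough hint before deciding by eye, the first line can be split into its fields and checked for numeric values. This is only a heuristic sketch under stated assumptions: it assumes a simple comma-separated file without quoted commas, and it treats any purely numeric field in the first line as a sign that the line is data rather than a header. The file sample.csv and the variable header_guess are hypothetical names used for illustration.

```shell
# Create a small, hypothetical CSV file to work with (illustration only)
printf 'name,age,city\nalice,30,Paris\n' > sample.csv

# Print each field of the first line on its own numbered line
head -n 1 sample.csv | tr ',' '\n' | nl

# Heuristic (an assumption, not a rule): header fields are rarely purely numeric
if head -n 1 sample.csv | tr ',' '\n' | grep -Eq '^-?[0-9]+(\.[0-9]+)?$'; then
  header_guess="probably data"
else
  header_guess="probably a header"
fi
echo "$header_guess"
```

A data set whose columns are all text will pass this check even without a header, so the final call is still yours.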
Inspect All the Data
If you want to inspect the raw data, then it's best not to use the cat command-line
tool, as cat prints all the data to the screen in one go. In order to inspect the raw data
at your own pace, we recommend using less (Nudelman, 2013) with the -S option:
$ less -S file.csv