The -S option ensures that long lines are not wrapped when they don't fit in the
terminal. Instead, less lets you scroll horizontally to see the rest of each line.
The advantage of less is that it does not load the entire file into memory, which
makes it well suited to viewing large files. Once you're in less, you can scroll down
a full screen by pressing <Space>. Scroll horizontally by pressing <Left> and <Right>.
Press g and G to go to the start and the end of the file, respectively. Quit less by
pressing q. Read the man page for more key bindings.
If you want the data set to be nicely formatted, you can add csvlook to the pipeline:
$ < file.csv csvlook | less -S
Unfortunately, csvlook needs to read the entire file into memory
in order to determine the width of the columns. So, when you
want to inspect a very large file, you may want to get a subset
(using sample, for instance) or you may need to be patient.
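If a random sample is not required, one simple way to get such a subset is to pass only the first rows to csvlook. The following is a minimal sketch, assuming the first rows are representative enough for a quick look; the function name peek and its default of 100 rows are hypothetical and not part of any tool mentioned here.

```shell
# Hypothetical helper: pass the header line plus the first N data rows
# from standard input to standard output (N defaults to 100).
peek () {
    head -n "$(( ${1:-100} + 1 ))"
}

# Possible usage (requires csvlook):
#   < file.csv peek 50 | csvlook | less -S
```

Because head stops reading after the requested number of lines, csvlook only has to hold that small subset in memory.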
Feature Names and Data Types
To gain insight into the data set, it is useful to print the feature names and
study them. After all, a feature's name may indicate its meaning. You
can use the following sed expression for this:
$ cd ~/book/ch07
$ < data/iris.csv sed -e 's/,/\n/g;q'
sepal_length
sepal_width
petal_length
petal_width
species
Note that this basic command assumes that the file is delimited by commas. Just as a
reminder, if you intend to use this command often, you could define a function in
your ~/.bashrc file called, say, names:
names () { sed -e 's/,/\n/g;q' ; }
You can then use it like this:
$ < data/investments2.csv names
company_permalink
company_name
company_category_list
company_market
company_country_code
company_state_code
company_region
company_city
investor_permalink
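The names function above is tied to commas, and it relies on sed treating \n in the replacement as a newline, which not every sed implementation may do. As a sketch of one alternative, the hypothetical function below (names_delim is an invented name) uses head and tr and takes the delimiter as an optional argument. Like the original, it assumes that no header field contains the delimiter inside quotes.

```shell
# Hypothetical variant of names: print one feature name per line.
# The first argument is a single-character delimiter (default: comma).
names_delim () {
    head -n 1 | tr "${1:-,}" '\n'
}

# Possible usage:
#   < data/iris.csv names_delim
#   < some_file.tsv names_delim "$(printf '\t')"
```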