Database Reference
In-Depth Information
Luckily, there is a command-line tool called in2csv (Groskopf, 2014), which is able to
convert Microsoft Excel spreadsheets to CSV files. CSV stands for comma-separated
values or character-separated values. Working with CSV can be tricky because it lacks
a formal specification. RFC 4180 defines the CSV format according to the following
three points:
1. Each record is located on a separate line, delimited by a line break (CRLF). For
example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending line break. For exam‐
ple:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There may be an optional header line appearing as the first line of the file with
the same format as normal record lines. This header will contain names corre‐
sponding to the fields in the file and should contain the same number of fields as
the records in the rest of the file (the presence or absence of the header line
should be indicated via the optional header parameter of this MIME type). For
example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
Let's demonstrate in2csv using a spreadsheet that contains the top 250 movies from
the Internet Movie Database (IMDb). The file is named imdb-250.xlsx and can be
obtained from http://bit.ly/analyzing_top250_movies_list . To extract its data, we
invoke in2csv as follows:
$ cd ~/book/ch03
$ in2csv data/imdb-250.xlsx > data/imdb-250.csv
The format of the file is automatically determined by the extension, .xlsx in this case.
If we were to pipe the data into in2csv , we would have to specify the format explic‐
itly. Let's look at the data:
$ in2csv data/imdb-250.xlsx | head | cut -c1-80
Title,title trim,Year,Rank,Rank (desc),Rating,New in 2011 from 2010?,2010 rank,R
Sherlock Jr. (1924),SherlockJr.(1924),1924,221,30,8,y,n/a,n/a,
The Passion of Joan of Arc (1928),ThePassionofJoanofArc(1928),1928,212,39,8,y,n/
His Girl Friday (1940),HisGirlFriday(1940),1940,250,1,8,y,n/a,n/a,
Tokyo Story (1953),TokyoStory(1953),1953,248,3,8,y,n/a,n/a,
The Man Who Shot Liberty Valance (1962),TheManWhoShotLibertyValance(1962),1962,2
Persona (1966),Persona(1966),1966,200,51,8,y,n/a,n/a,
Stalker (1979),Stalker(1979),1979,243,8,8,y,n/a,n/a,
Search WWH ::




Custom Search