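import csv
import sys
# Create a writer that emits CSV rows to standard output
csv_writer = csv.writer(sys.stdout)
sample_row = ('Michael', 1234, 23.46, 'San Francisco, California')
# Write a CSV row; the field containing a comma is quoted automatically
csv_writer.writerow(sample_row)
# Result: Michael,1234,23.46,"San Francisco, California"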
CSV is a great format for “flat” data, that is, data that can be represented in a single line. Log data, such as that coming from Web servers and sensors, is well represented in this format. As text-based formats go, CSV can be fairly compact: it's the data, pure and simple, with little markup or structure to get in the way. CSV is also easy for most people to use, because it can be imported into spreadsheets, ingested into databases, and easily parsed programmatically. For logs or records that don't require data modeling beyond flat rows, CSV can be extremely useful.
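To make the "easily parsed programmatically" point concrete, here is a minimal sketch that uses Python's standard csv module, which the csv_writer example appears to assume. It reads back the result line shown above and compares the output with a naive split on commas. Every value comes back as a string, because CSV itself carries no type information.

```python
import csv
import io

# The line produced by the csv_writer example (csv.writer ends rows with \r\n)
line = 'Michael,1234,23.46,"San Francisco, California"\r\n'

# A naive split on commas breaks the quoted field into two pieces
naive_fields = line.strip().split(',')

# csv.reader understands the quoting convention and yields four fields
reader = csv.reader(io.StringIO(line))
parsed_fields = next(reader)

print(len(naive_fields))  # 5
print(parsed_fields)      # ['Michael', '1234', '23.46', 'San Francisco, California']
```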
Most importantly, CSV is an excellent format for sequential access of data. In other words, it's simple for a computer program to grab one, two, or 1,000 rows at a time from the middle of a file and just start processing. In a distributed processing system, this is helpful for breaking up large programming tasks into many smaller ones. Do you have a huge CSV file that is overwhelming the memory of your single machine? Just split it up and process the fragments.
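Here is a minimal sketch of the "split it up" approach, assuming each row can be processed independently. The helper iter_chunks and the sensor data are hypothetical and rely only on the standard csv and itertools modules. In a real distributed system, each chunk would typically be written to its own file or handed to a separate worker instead of being summed in one loop.

```python
import csv
import io
from itertools import islice

def iter_chunks(csv_file, chunk_size):
    """Yield lists of up to chunk_size parsed rows from an open CSV file."""
    reader = csv.reader(csv_file)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Hypothetical sensor log standing in for a file too large for memory
log_data = io.StringIO(''.join(f'sensor-{i},{i * 10}\n' for i in range(10)))

# Process 4 rows at a time; each chunk is summed independently of the others
chunk_totals = [sum(int(value) for _, value in chunk)
                for chunk in iter_chunks(log_data, 4)]
print(chunk_totals)  # [60, 220, 170]
```

One caveat applies. Splitting a file by raw byte offsets or line counts is safe only when no quoted field contains an embedded newline. Letting csv.reader find the row boundaries, as this sketch does, avoids that problem.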
Although CSV has many positives going for it, there are cases in which it can be a pretty bad format for sharing a large amount of data. First of all, it lacks much in the way of standardization. Certainly there have been attempts at official CSV standards,[3] but in practice, there is little regularity in how developers create CSV output. Unfortunately, this sometimes means that people will add a few header lines, use peculiar delimiters between fields, or escape strings in eccentric ways. CSV also doesn't provide a standard way of referring to information about the file itself; when working with collections of CSVs, any information about the type or date that the data represents is sometimes found in the filename itself. In fact, CSV files basically lack any metadata at all, so anyone using them to share data must provide additional information about the file somewhere else.
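When a file arrives with an unexpected delimiter, Python's csv.Sniffer can guess the dialect from a sample of the text. The guess is heuristic, not standards-based, so it can be wrong. Restricting the candidate delimiters, as this sketch does, makes it more reliable. The semicolon-delimited sample is hypothetical.

```python
import csv
import io

# Hypothetical export that uses semicolons instead of commas
sample = 'name;id;score\nMichael;1234;23.46\nAna;5678;19.02\n'

# Guess the dialect, considering only a few plausible delimiters
dialect = csv.Sniffer().sniff(sample, delimiters=',;\t|')
print(dialect.delimiter)  # ;

# Parse the text with the sniffed dialect
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['Michael', '1234', '23.46']
```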
CSV is very bad at describing data that does not fit well into discrete rows. In practice, real-world data often has many dimensions, and not all of these dimensions necessarily fit into the rigid structure of CSV's rectangular regularity. Take, for example, data about the number of people registered for political parties in the United States, organized state by state. All states have representatives from the two major parties and many of the much smaller ones. However, some states may have specific parties not found in other states. Therefore, the list of parties will be a different size for each state. Expressing this data in CSV format presents a data-modeling challenge. Does one create columns of data for every possible political party? Is the list of parties concatenated into a single string and stored in a field all by itself? Neither representation is a natural fit for a fixed-size-row structure.
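The two modeling options described above can be sketched in Python. The states, parties, and counts are placeholders, not real registration figures.

```python
import csv
import io

# Hypothetical registration counts; each state lists a different set of parties
registrations = {
    'State A': {'Democratic': 500, 'Republican': 450, 'Green': 20},
    'State B': {'Democratic': 300, 'Republican': 350, 'Local Party': 15},
}

# Option 1: one column per party seen anywhere, blank where a party is absent
all_parties = sorted({p for counts in registrations.values() for p in counts})
wide = io.StringIO()
writer = csv.writer(wide, lineterminator='\n')
writer.writerow(['state'] + all_parties)
for state, counts in registrations.items():
    writer.writerow([state] + [counts.get(p, '') for p in all_parties])

# Option 2: pack each state's party list into a single delimited field
packed = io.StringIO()
writer = csv.writer(packed, lineterminator='\n')
writer.writerow(['state', 'parties'])
for state, counts in registrations.items():
    writer.writerow([state, ';'.join(f'{p}={n}' for p, n in counts.items())])

print(wide.getvalue())
print(packed.getvalue())
```

Each option has a cost. The wide form gains a new, mostly empty column whenever any state adds a party. The packed form hides a second, ad hoc format (here, party=count pairs separated by semicolons) inside a field that ordinary CSV tools treat as an opaque string.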