Database Reference
In-Depth Information
• How should NULL or empty values be handled? Are they permitted? Can NULL values
even be detected? (Some systems export NULL values as empty strings, making it
impossible to distinguish them.)
• Do data values require validation or reformatting? If the values are in a format that
matches MySQL's expectations, no further processing is necessary. Otherwise, they
must be checked and possibly rewritten.
For export from MySQL, the issues are somewhat the reverse. You can assume that
values stored in the database are valid, but it's necessary to add column and record
delimiters to form an output stream that has a structure other programs can recognize,
and values may require reformatting for use by other programs.
File Formats
Datafiles come in many formats, two of which appear frequently in this chapter:
Tab-delimited or tab-separated values (TSV) format
This is one of the simplest file structures; lines contain values separated by tab
characters. A short tab-delimited file might look like this, where the whitespace
between column values represents single tab characters:
a b c
a,b,c d e f
Comma-separated values (CSV) format
Files written in CSV format vary somewhat; there is apparently no formal standard
describing the format. However, the general idea is that lines consist of values sep‐
arated by commas, and values containing internal commas are enclosed within
quotes to prevent the commas from being interpreted as value delimiters. It's also
common for values containing spaces to be quoted as well. In this example, each
line contains three values:
a,b,c
"a,b,c","d e",f
It's trickier to process CSV files than tab-delimited files because characters like
quotes and commas have a dual meaning: they may represent file structure or be
included in the content of data values.
Another important datafile characteristic is the line-ending sequence. The most com‐
mon sequences are carriage return, linefeed, and carriage return/linefeed pair, some‐
times referred to here by the abbreviations CR, LF, and CRLF.
Datafiles often begin with a row of column labels. For some import operations, the row
of labels must be discarded to avoid having it be loaded into your table as data. In other
cases, the labels are quite useful:
Search WWH ::




Custom Search