Database Reference
In-Depth Information
The line-ending sequence used in a datafile typically is determined by the system from
which the file originated. Unix files normally have lines terminated by linefeeds, which
you indicate like this:
LINES TERMINATED BY '\n'
Because
\n
happens to be the default line terminator, you need not specify that clause
in this case unless you want to indicate the line-ending sequence explicitly. If files on
your system don't use the Unix default (linefeed), you must specify the line terminator
explicitly. For files that have lines ending in carriage returns or carriage return/linefeed
pairs, respectively, use the appropriate
LINES
TERMINATED
BY
clause:
LINES TERMINATED BY '\r'
LINES TERMINATED BY '\r\n'
For example, to load a Windows file that contains tab-delimited fields and lines ending
with CRLF pairs, use this
LOAD
DATA
statement:
mysql>
LOAD DATA LOCAL INFILE 'mytbl.txt' INTO TABLE mytbl
->
LINES TERMINATED BY '\r\n';
The corresponding
mysqlimport
command is:
%
mysqlimport --local --lines-terminated-by="\r\n" cookbook mytbl.txt
If the file has been transferred from one machine to another, its contents may have been
changed in subtle ways of which you're not aware. For example, an FTP transfer between
machines running different operating systems typically translates line endings to those
that are appropriate for the destination machine if the transfer is performed in text mode
rather than in binary (image) mode.
When in doubt, check the contents of your datafile using a hex dump program or other
utility that displays a visible representation of whitespace characters like tab, carriage
return, and linefeed. Under Unix, programs such as
od
or
hexdump
can display file
contents in a variety of formats. If you don't have these or some comparable utility, the
transfer
directory of the
recipes
distribution contains hex dumpers written in Perl,
Ruby, and Python (
hexdump.pl
,
hexdump.rb
, and
hexdump.py
), as well as programs that
display printable representations of all characters of a file (
see.pl
,
see.rb
, and
see.py
). You
may find them useful for examining files to see what they really contain.
Dealing with quotes and special characters
If your datafile contains quoted values or escaped characters, tell
LOAD
DATA
to be aware
of them so that it doesn't load uninterpreted data values into the database.
The
FIELDS
clause can specify other format options besides
TERMINATED
BY
. By default,
LOAD
DATA
assumes that values are unquoted, and it interprets the backslash (
\
) as an
escape character for special characters. To indicate the value-quoting character explic‐