Database Reference
In-Depth Information
SELECT INTO OUTFILE statement). For situations where the native facilities do not
suffice, we move on to cover techniques for using external supporting utilities (such as
sed and tr ) and for writing your own. There are two broad sets of issues to consider:
• How to manipulate the structure of datafiles. When a file is in a format not suitable
for import, you must convert it to a different format. This may involve issues such
as changing the column delimiters or line-ending sequences, or removing or rear‐
ranging columns in the file. This chapter covers such techniques.
• How to manipulate the content of datafiles. If you don't know whether the values
contained in a file are legal, you may want to preprocess it to check or reformat
them. Numeric values may need verification as lying within a specific range, dates
may need conversion to or from ISO format, and so forth. Chapter 12 covers those
techniques.
Source code for program fragments and scripts discussed in this chapter is located in
the transfer directory of the recipes distribution.
General Import and Export Issues
Incompatible datafile formats and differing rules for interpreting various kinds of values
cause headaches when transferring data between programs. Nevertheless, certain issues
recur frequently. Be aware of them and you can identify more easily what must be done
to solve particular import or export problems.
In its most basic form, an input stream is just a set of bytes with no particular meaning.
Successful import into MySQL requires recognizing which bytes represent structural
information and which represent the data values framed by that structure. Because such
recognition is key to decomposing the input into appropriate units, the most funda‐
mental import issues are these:
• What is the record separator? Knowing this enables you to partition the input
stream into records.
• What is the field delimiter? Knowing this enables you to partition each record into
field values. Identifying the data values also might include stripping quotes from
around the values or recognizing escape sequences within them.
The ability to break the input into records and fields is important for extracting the data
values from it. If the values are still not in a form that can be used directly, you may need
to consider other issues:
• Do the order and number of columns match the structure of the database table?
Mismatches require rearranging or skipping columns.
Search WWH ::




Custom Search