Importing and Exporting Data - MySQL

Database Reference

In-Depth Information

Discussion

cvt_file.pl (see Recipe 11.6 ) serves as a tool that converts entire files from one format to

another. Another common datafile operation is to manipulate columns. This is neces‐

sary, for example, when importing a file into a program that doesn't understand how to

extract or rearrange input columns for itself. To work around this problem, rearrange

the datafile instead.

Recall that this chapter began with a description of a scenario involving a 12-column

CSV file somedata.csv from which only columns 2, 11, 5, and 9 were needed. To convert

the file to tab-delimited format, do this:

% cvt_file.pl --iformat=csv somedata.csv > somedata.txt

But then what? If you just want to knock out a short script to extract those specific four

columns, that's fairly easy: write a loop that reads input lines and writes only the desired

columns, in the proper order. But that would be a special-purpose script, useful only

within a highly limited context. With just a little more effort, it's possible to write a more

general utility yank_col.pl that enables you to extract any set of columns. With such a

tool, you specify the column list on the command line like this:

% yank_col.pl --columns=2,11,5,9 somedata.txt > tmp.txt

Because the script doesn't use a hardcoded column list, it can be used to extract an

arbitrary set of columns in any order. Columns can be specified as a comma-separated

list of column numbers or column ranges. (For example, --columns=1,10,4-7 means

columns 1, 10, 4, 5, 6, and 7.) yank_col.pl looks like this:

#!/usr/bin/perl

# yank_col.pl: Extract columns from input.

# Example: yank_col.pl --columns=2,11,5,9 filename

# Assumes tab-delimited, linefeed-terminated input lines.

# ... process command - line options ( not shown ) ...

# ... to get column list into @col_list array ...

while ( <> ) # read input

{

chomp ;

my @val = split ( /\t/ , $_ , 10000 ); # split, preserving all fields

# extract desired columns, mapping undef to empty string (can

# occur if an index exceeds number of columns present in line)

@val = map { defined ( $_ ) ? $_ : "" } @val [ @col_list ];

print join ( "\t" , @val ) . "\n" ;

}

The input processing loop converts each line to an array of values, then pulls out from

the array the values corresponding to the requested columns. To avoid looping through

MySQL

Search WWH ::

Custom Search

Home