Database Reference
In-Depth Information
Discussion
cvt_file.pl (see Recipe 11.6 ) serves as a tool that converts entire files from one format to
another. Another common datafile operation is to manipulate columns. This is necesā€
sary, for example, when importing a file into a program that doesn't understand how to
extract or rearrange input columns for itself. To work around this problem, rearrange
the datafile instead.
Recall that this chapter began with a description of a scenario involving a 12-column
CSV file somedata.csv from which only columns 2, 11, 5, and 9 were needed. To convert
the file to tab-delimited format, do this:
% cvt_file.pl --iformat=csv somedata.csv > somedata.txt
But then what? If you just want to knock out a short script to extract those specific four
columns, that's fairly easy: write a loop that reads input lines and writes only the desired
columns, in the proper order. But that would be a special-purpose script, useful only
within a highly limited context. With just a little more effort, it's possible to write a more
general utility yank_col.pl that enables you to extract any set of columns. With such a
tool, you specify the column list on the command line like this:
% yank_col.pl --columns=2,11,5,9 somedata.txt > tmp.txt
Because the script doesn't use a hardcoded column list, it can be used to extract an
arbitrary set of columns in any order. Columns can be specified as a comma-separated
list of column numbers or column ranges. (For example, --columns=1,10,4-7 means
columns 1, 10, 4, 5, 6, and 7.) yank_col.pl looks like this:
#!/usr/bin/perl
# yank_col.pl: Extract columns from input.
# Example: yank_col.pl --columns=2,11,5,9 filename
# Assumes tab-delimited, linefeed-terminated input lines.
# ... process command - line options ( not shown ) ...
# ... to get column list into @col_list array ...
while ( <> ) # read input
{
chomp ;
my @val = split ( /\t/ , $_ , 10000 ); # split, preserving all fields
# extract desired columns, mapping undef to empty string (can
# occur if an index exceeds number of columns present in line)
@val = map { defined ( $_ ) ? $_ : "" } @val [ @col_list ];
print join ( "\t" , @val ) . "\n" ;
}
The input processing loop converts each line to an array of values, then pulls out from
the array the values corresponding to the requested columns. To avoid looping through
Search WWH ::




Custom Search