the array, it uses Perl's notation that permits a list of subscripts to be specified all at once
to request multiple array elements. For example, if @col_list contains the values 2, 6, and 3,
these two expressions are equivalent:
($val[2], $val[6], $val[3])
@val[@col_list]
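The equivalence can be checked with a short script. This is a minimal sketch; the contents of @val are made-up sample data, and only the subscripts 2, 6, and 3 come from the text above.

```perl
use strict;
use warnings;

# Hypothetical sample row; only the subscripts are taken from the text.
my @val      = qw(a b c d e f g h);
my @col_list = (2, 6, 3);

# Element-by-element list versus an array slice.
my @by_hand  = ($val[2], $val[6], $val[3]);
my @by_slice = @val[@col_list];

print join("\t", @by_hand),  "\n";
print join("\t", @by_slice), "\n";
```

Both print statements produce the same line (c, g, d separated by tabs), in the order given by @col_list rather than in the order the elements appear in @val.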
What if you want to extract columns from a file that's not in tab-delimited format, or
produce output in another format? In that case, combine yank_col.pl with the cvt_file.pl
script. Suppose that you want to pull out all but the password column from the
colon-delimited /etc/passwd file and write the result in CSV format. Use cvt_file.pl both to
preprocess /etc/passwd into tab-delimited format for yank_col.pl and to postprocess the
extracted columns into CSV format:
% cvt_file.pl --idelim=":" /etc/passwd \
    | yank_col.pl --columns=1,3-7 \
    | cvt_file.pl --oformat=csv > passwd.csv
To avoid typing all of that as one long command, use temporary files for the intermediate
steps:
% cvt_file.pl --idelim=":" /etc/passwd > tmp1.txt
% yank_col.pl --columns=1,3-7 tmp1.txt > tmp2.txt
% cvt_file.pl --oformat=csv tmp2.txt > passwd.csv
% rm tmp1.txt tmp2.txt
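To make the --columns=1,3-7 specifier concrete, here is one way such a value could be expanded into array subscripts and applied with a slice. This is only a sketch under stated assumptions: expand_columns() is a hypothetical helper, not code from yank_col.pl, and the sample line describes a made-up account. It assumes that column numbers on the command line start at 1, which is what the /etc/passwd example implies (column 2 is the password).

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from yank_col.pl: expand a column
# specifier such as "1,3-7" into zero-based array subscripts.
sub expand_columns {
    my ($spec) = @_;
    my @subscripts;
    foreach my $part (split(/,/, $spec)) {
        if ($part =~ /^(\d+)-(\d+)$/) {         # range, e.g. 3-7
            die "bad range: $part\n" if $1 < 1 || $1 > $2;
            push(@subscripts, ($1 - 1) .. ($2 - 1));
        } elsif ($part =~ /^(\d+)$/) {          # single column, e.g. 1
            die "bad column: $part\n" if $1 < 1;
            push(@subscripts, $1 - 1);
        } else {
            die "bad column specifier: $part\n";
        }
    }
    return @subscripts;
}

# Sample /etc/passwd-style line (made-up account).
my $line     = "jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh";
my @val      = split(/:/, $line, 10000);
my @col_list = expand_columns("1,3-7");
my @out      = @val[@col_list];

print join("\t", @out), "\n";
```

The specifier expands to the subscripts 0 and 2 through 6, so the slice keeps every field except the password.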
Forcing split() to Return Every Field
The Perl split() function is extremely useful, but normally omits trailing empty fields.
This means that if you write only as many fields as split() returns, output lines may
not have the same number of fields as input lines. To avoid this problem, pass a third
argument to indicate the maximum number of fields to return. This forces split() to
return as many fields as are actually present on the line or the number requested,
whichever is smaller. If the value of the third argument is large enough, the practical
effect is to cause all fields to be returned, empty or not. Scripts shown in this chapter
use a field count value of 10,000:
# split line at tabs, preserving all fields
my @val = split (/\t/, $_, 10000);
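In the (unlikely?) event that an input line has more fields than that, split() stops
splitting once it reaches the limit, so the final element holds the rest of the line,
tabs included, rather than a single field. If you think that will be a problem, bump up
the number even higher.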
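The effect of the third argument is easy to see on a line that ends with empty fields. The following is a minimal sketch using a made-up sample line. The negative limit is not something the chapter's scripts use; it is included because Perl treats a negative limit as arbitrarily large, which avoids picking a number.

```perl
use strict;
use warnings;

# Sample line with five tab-separated fields; the last two are empty.
my $line = "a\tb\tc\t\t";

my @default  = split(/\t/, $line);          # trailing empty fields dropped
my @large    = split(/\t/, $line, 10000);   # all five fields kept
my @negative = split(/\t/, $line, -1);      # negative limit: no upper bound
my @small    = split(/\t/, $line, 2);       # remainder stays in last element

printf("no limit:     %d fields\n", scalar(@default));
printf("limit 10000:  %d fields\n", scalar(@large));
printf("limit -1:     %d fields\n", scalar(@negative));
printf("limit 2:      %d fields\n", scalar(@small));
```

With no limit, split() returns three fields. With a limit of 10000 or -1, it returns all five. With a limit of 2, it returns two elements, the second of which still contains the unsplit remainder of the line.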