Database Reference
In-Depth Information
Querying a Flat File as Jagged Arrays
Often, flat-file data sources have a different number of columns in each line and contain
multiple header rows and footer rows. These kinds of files tend to be prevalent when
the flat files originated as spreadsheets. Our favorite flat-file FDW for handling these
unstructured flat files is file_textarray_fdw . This wrapper can handle any kind of
delimited flat file, even if the number of elements in each row is inconsistent. It brings
in each row as a text array ( text[] ).
Unfortunately, file_textarray_fdw is not part of the core PostgreSQL offering, so
you'll need to compile it yourself. First, install PostgreSQL with PostgreSQL develop‐
ment headers. Then download the file_textarray_fdw source code from the Adun‐
stan GitHub site . There is a different branch for each version of PostgreSQL, so make
sure to pick the right branch. Once you've compiled the code, install it as an extension,
as you would any other FDW.
If you are on Linux/Unix, it's an easy compile if you have the postgresql-dev package
installed. We did the work of compiling for Windows; you can download our binaries
from Windows-32 9.1 FDWs , Windows-32 9.2 FDWs , Windows-64 9.2 FDWs ,
Windows-32 9.3 FDWs , and Windows-64 9.3 FDWs .
The first step to perform after you have installed an FDW is to create an extension in
your database:
CREATE EXTENSION file_textarray_fdw ;
Then create a a foreign server as you would with any FDW:
CREATE SERVER file_taserver FOREIGN DATA WRAPPER file_textarray_fdw ;
Next, register the tables. You can place foreign tables in any schema you want. In
Example 10-2 , we use our staging schema again.
Example 10-2. Make a file text array foreign table from delimited file
CREATE FOREIGN TABLE staging . factfinder_array ( x text [])
SERVER file_taserver
OPTIONS ( format 'csv' , filename '/postgresql_book/ch10/
DEC_10_SF1_QTH1_with_ann.csv' ,
header 'false' , delimiter ',' , quote '"' , encoding 'latin1' , null ''
);
Our example CSV begins with eight header rows and has more columns than we care
to count. When the setup is finished, you can finally query our delimited file directly.
This following query will give us the names of the header rows where the first column
header is GEO.id :
SELECT unnest ( x ) FROM staging . factfinder_array WHERE x [ 1 ] = 'GEO.id'
This next query will give us the first two columns of our data:
Search WWH ::




Custom Search