Replication and External Data - PostgreSQL: Up and Running

Database Reference

In-Depth Information

Querying a Flat File as Jagged Arrays

Often, flat-file data sources have a different number of columns in each line and contain

multiple header rows and footer rows. These kinds of files tend to be prevalent when

the flat files originated as spreadsheets. Our favorite flat-file FDW for handling these

unstructured flat files is file_textarray_fdw . This wrapper can handle any kind of

delimited flat file, even if the number of elements in each row is inconsistent. It brings

in each row as a text array ( text[] ).

Unfortunately, file_textarray_fdw is not part of the core PostgreSQL offering, so

you'll need to compile it yourself. First, install PostgreSQL with PostgreSQL develop‐

ment headers. Then download the file_textarray_fdw source code from the Adun‐

stan GitHub site . There is a different branch for each version of PostgreSQL, so make

sure to pick the right branch. Once you've compiled the code, install it as an extension,

as you would any other FDW.

If you are on Linux/Unix, it's an easy compile if you have the postgresql-dev package

installed. We did the work of compiling for Windows; you can download our binaries

from Windows-32 9.1 FDWs , Windows-32 9.2 FDWs , Windows-64 9.2 FDWs ,

Windows-32 9.3 FDWs , and Windows-64 9.3 FDWs .

The first step to perform after you have installed an FDW is to create an extension in

your database:

CREATE EXTENSION file_textarray_fdw ;

Then create a a foreign server as you would with any FDW:

CREATE SERVER file_taserver FOREIGN DATA WRAPPER file_textarray_fdw ;

Next, register the tables. You can place foreign tables in any schema you want. In

Example 10-2 , we use our staging schema again.

Example 10-2. Make a file text array foreign table from delimited file

CREATE FOREIGN TABLE staging . factfinder_array ( x text [])

SERVER file_taserver

OPTIONS ( format 'csv' , filename '/postgresql_book/ch10/

DEC_10_SF1_QTH1_with_ann.csv' ,

header 'false' , delimiter ',' , quote '"' , encoding 'latin1' , null ''

);

Our example CSV begins with eight header rows and has more columns than we care

to count. When the setup is finished, you can finally query our delimited file directly.

This following query will give us the names of the header rows where the first column

header is GEO.id :

SELECT unnest ( x ) FROM staging . factfinder_array WHERE x [ 1 ] = 'GEO.id'

This next query will give us the first two columns of our data:

Search WWH ::

Custom Search

Home