Database Reference
In-Depth Information
Querying a Flat File as Jagged Arrays
Often, flat-file data sources have a different number of columns in each line and contain
multiple header rows and footer rows. These kinds of files tend to be prevalent when
the flat files originated as spreadsheets. Our favorite flat-file FDW for handling these
unstructured flat files is
file_textarray_fdw
. This wrapper can handle any kind of
delimited flat file, even if the number of elements in each row is inconsistent. It brings
in each row as a text array (
text[]
).
Unfortunately,
file_textarray_fdw
is not part of the core PostgreSQL offering, so
you'll need to compile it yourself. First, install PostgreSQL with PostgreSQL develop‐
stan GitHub site
. There is a different branch for each version of PostgreSQL, so make
sure to pick the right branch. Once you've compiled the code, install it as an extension,
as you would any other FDW.
If you are on Linux/Unix, it's an easy compile if you have the
postgresql-dev
package
installed. We did the work of compiling for Windows; you can download our binaries
The first step to perform after you have installed an FDW is to create an extension in
your database:
CREATE
EXTENSION
file_textarray_fdw
;
Then create a a foreign server as you would with any FDW:
CREATE
SERVER
file_taserver
FOREIGN
DATA
WRAPPER
file_textarray_fdw
;
Next, register the tables. You can place foreign tables in any schema you want. In
Example 10-2
, we use our staging schema again.
Example 10-2. Make a file text array foreign table from delimited file
CREATE
FOREIGN
TABLE
staging
.
factfinder_array
(
x
text
[])
SERVER
file_taserver
OPTIONS
(
format
'csv'
,
filename
'/postgresql_book/ch10/
DEC_10_SF1_QTH1_with_ann.csv'
,
header
'false'
,
delimiter
','
,
quote
'"'
,
encoding
'latin1'
,
null
''
);
Our example CSV begins with eight header rows and has more columns than we care
to count. When the setup is finished, you can finally query our delimited file directly.
This following query will give us the names of the header rows where the first column
header is
GEO.id
:
SELECT
unnest
(
x
)
FROM
staging
.
factfinder_array
WHERE
x
[
1
]
=
'GEO.id'
This next query will give us the first two columns of our data: