Massively Parallel Snippet Processing
Every S-blade's snippet processor receives the specific instructions that it
needs to execute its portion of the snippet. In addition to the host scheduler,
the snippet processors have their own smart pre-emptive scheduler that allows
snippets from multiple queries to execute simultaneously. This scheduler
takes into account the priority of the query and the resources set aside for the
user or group that issued the query to decide when and for how long to
schedule a particular snippet for execution. The following steps outline the
sequence of events associated with snippet processing:
1. The processor core on each snippet processor configures the FPGA
engines with parameters contained in the query snippet and sets up
a data stream.
2. The snippet processor reads table data from the disk array into
memory. It also interrogates the cache before accessing the disk for a
data block, avoiding a scan if the data is already in memory. The
snippet processor uses a Netezza innovation called ZoneMap
acceleration to reduce disk scans.
ZoneMap acceleration exploits the natural ordering of rows in a
data warehouse to accelerate performance by orders of magnitude.
The technique avoids scanning rows with column values outside the
start and end range of a query. For example, if a table contains two
years of weekly records (~100 weeks), and a query is looking for
data for only one week, ZoneMap acceleration can improve
performance up to 100 times. Unlike the indexes that traditional
data warehousing technologies rely on for this kind of optimization,
ZoneMaps are automatically created and updated for each database
table, without incurring any administrative overhead.
3. The FPGA acts on the data stream. First, it accelerates the stream four-
to eight-fold by decompressing the data at the transmission speed
of the network. Next, its embedded engines filter out any data that's
not relevant to the query. The remaining data is streamed back to
memory for concurrent processing by the CPU core (as we show in
Figure 4-3). The resultant data at this stage is typically a tiny fraction
(2 to 5 percent) of the original stream, greatly reducing the execution
time that's required by the processor core.
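The pre-emptive scheduling described at the start of this section can be sketched in a few lines. This is a hypothetical illustration, not Netezza's actual scheduler: the names (`Snippet`, `schedule_next`, `BASE_SLICE_MS`) and the formula for the time slice are assumptions, chosen only to show how query priority and a group's resource share might jointly decide which snippet runs next and for how long.

```python
BASE_SLICE_MS = 10  # assumed base time slice, in milliseconds

class Snippet:
    """One query's unit of work on a snippet processor (illustrative)."""
    def __init__(self, query_id, priority, group_share):
        self.query_id = query_id
        self.priority = priority        # higher-priority queries run sooner
        self.group_share = group_share  # resource fraction set aside for the
                                        # user or group that issued the query

def schedule_next(ready):
    """Pick the highest-priority ready snippet and size its time slice.

    The slice grows with both the query's priority and its group's
    resource share, so snippets from many queries interleave fairly.
    """
    snippet = max(ready, key=lambda s: s.priority)
    ready.remove(snippet)
    slice_ms = BASE_SLICE_MS * snippet.priority * snippet.group_share
    return snippet, slice_ms
```

In this sketch a low-priority snippet is never starved outright; it simply waits until no higher-priority snippet is ready, then gets a slice proportional to its group's share.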
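The ZoneMap pruning in step 2 amounts to keeping a minimum and maximum column value per data block ("zone") and skipping any zone whose range cannot overlap the query's predicate. A minimal sketch, with illustrative names (`build_zone_map`, `zones_to_scan`) that are not Netezza's API:

```python
def build_zone_map(values, zone_size):
    """Record (min, max) of a column for each fixed-size block of rows.

    Maintained automatically per table in the real system; here we just
    build it over an in-memory list.
    """
    zones = []
    for i in range(0, len(values), zone_size):
        zone = values[i:i + zone_size]
        zones.append((min(zone), max(zone)))
    return zones

def zones_to_scan(zone_map, lo, hi):
    """Return indices of zones whose [min, max] overlaps the query's
    [lo, hi] range; all other zones are skipped without a disk read."""
    return [i for i, (zmin, zmax) in enumerate(zone_map)
            if zmax >= lo and zmin <= hi]
```

With rows naturally ordered by week, as in the two-years-of-weekly-records example, a one-week query overlaps only one zone out of ~100, which is where the up-to-100x improvement comes from.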
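The FPGA stage in step 3 can be sketched as a stream pipeline: decompress each block as it arrives, filter out rows irrelevant to the query, and hand only the survivors to the CPU stage. In this illustrative sketch, `zlib` stands in for the hardware decompression engine and a plain Python predicate stands in for the FPGA's embedded filter engines; the function name `snippet_stream` is an assumption.

```python
import zlib

def snippet_stream(compressed_blocks, predicate):
    """Decompress blocks in stream order, filter rows, and yield only
    the rows the query needs -- typically a small fraction (on the order
    of 2 to 5 percent) of the original stream."""
    for block in compressed_blocks:
        rows = zlib.decompress(block).decode().splitlines()
        for row in rows:
            if predicate(row):   # filtering done inline, before the CPU
                yield row
```

Because the filtering happens inside the stream, the CPU core never materializes the discarded rows, which is what keeps its share of the execution time small.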