off-line updates on the CPU, achieving more efficient on-line updates to the
TBL24 on the GPU. We call the proposed IP lookup engine TSTT, since its two
most important components are the TST and the TBL24.
We make three key contributions in this paper. Firstly, after transforming the
FIB into a compact segment tree, we design a special leaf-pushing technique to
divide all prefixes into several non-intersecting segments. Secondly, we present a
series of algorithms to thread the necessary segments during off-line updates,
ensuring that the threaded segments cover all update information without
intersecting one another. As a result, on-line updates can be processed
completely in parallel.
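As a concrete and deliberately simplified illustration of the leaf-pushing idea, the host-side sketch below views each prefix no longer than 24 bits as a range of 24-bit units and lets longer prefixes overwrite shorter ones where they overlap, so that consecutive units sharing the same next hop form non-intersecting segments. The structure and function names are assumptions for illustration only, not the paper's algorithm.

#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative sketch only: prefixes are painted onto the 24-bit unit space,
// shorter (less specific) prefixes first, so more specific prefixes "push"
// their next hop onto the units they cover.
struct Pfx { uint32_t addr; int len; uint16_t next_hop; };   // assumed fields

std::map<uint32_t, uint16_t> leaf_push(std::vector<Pfx> pfxs)
{
    std::sort(pfxs.begin(), pfxs.end(),
              [](const Pfx &a, const Pfx &b) { return a.len < b.len; });

    std::map<uint32_t, uint16_t> unit_to_hop;        // 24-bit unit -> next hop
    for (const Pfx &p : pfxs) {
        uint32_t first = p.addr >> 8;                // first covered unit
        uint32_t count = 1u << (24 - p.len);         // units covered by the prefix
        for (uint32_t u = first; u < first + count; ++u)
            unit_to_hop[u] = p.next_hop;             // later (longer) prefixes win
    }
    return unit_to_hop;  // runs of equal next hops are the disjoint segments
}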
The rest of this paper is organized as follows. Section 2 presents the system
architecture of our proposed scheme. The details of the TST are presented in
Sect. 3. Section 4 discusses the performance evaluation experiments. Finally, a
short conclusion is given in Sect. 5.
2 System Architecture
Since there are many novel designs for the entire framework of GPU-based
software routers [8, 9], we focus only on the IP lookup engine, which can be
deployed in such routers as an additional plug-in [11]. With the help of an
optimized packet I/O engine [8], which manages packet queues and extracts the
destination addresses of pending packets for lookup, we concentrate on the
GPU-based accelerator for table lookup and update.
As shown in Fig. 2, our system architecture is based on the Compute Unified
Device Architecture (CUDA), in which all program code is divided into two
cooperative parts, the Host and the Device, executed on the CPU and the GPU
respectively. In the Device, the TBL24, which stores all prefixes no longer
than 24 bits (the rest are stored in a small TCAM, which is not shown in
Fig. 2), provides O(1) lookup. In the Host, managed by a control thread, a
group of working threads processes lookup and update requests by utilizing the
computing resources of both the CPU and the GPU.
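To make the division of labor concrete, the following is a minimal sketch of how a TBL24-style table yields O(1) lookup on the Device: one next-hop entry per 24-bit prefix of the destination address, so a lookup is a single memory access. The entry width and function name are assumptions, not the paper's exact layout.

#include <cstdint>

// Assumed layout: tbl24 holds one 16-bit next-hop entry for every possible
// 24-bit address prefix (2^24 entries).
__device__ uint16_t tbl24_lookup(const uint16_t *tbl24, uint32_t dst_ip)
{
    return tbl24[dst_ip >> 8];   // index with the top 24 bits of the address
}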
Processing route updates on the TBL24 always needs help from additional
structures [11]. In TSTT, the TST plays this role. In our case, all prefixes
no longer than 24 bits are transformed into segments, each of which corresponds
to a range of units of the TBL24. A TST is then constructed from these
segments and stored in the host memory (on the CPU) for off-line updates.
When a route update arrives, it is first used to update this TST, producing
several unit modifications to the TBL24. During such off-line updates on the
CPU, all produced unit modifications are collected, and are then processed on
the GPU to update the TBL24, which constitutes the on-line updates.
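A hedged host-side sketch of this off-line step is given below: a prefix no longer than 24 bits covers a contiguous range of TBL24 units, and updating the TST for that prefix yields a list of unit modifications that are collected for the GPU. The sketch ignores the overlap resolution that the TST and leaf-pushing actually provide, and all names are assumptions.

#include <cstdint>
#include <vector>

struct UnitMod { uint32_t unit; uint16_t next_hop; };   // one TBL24 unit to rewrite

// Assumes `addr` is the prefix's network address with the host bits cleared
// and `len` is at most 24.
std::vector<UnitMod> units_for_prefix(uint32_t addr, int len, uint16_t next_hop)
{
    uint32_t first = addr >> 8;                 // first covered TBL24 unit
    uint32_t count = 1u << (24 - len);          // number of covered units
    std::vector<UnitMod> mods;
    for (uint32_t u = first; u < first + count; ++u)
        mods.push_back({u, next_hop});          // collected into the next GPU batch
    return mods;
}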
As the GPU works in the Single Instruction, Multiple Threads (SIMT) model,
batch processing is required to improve performance. As long as all unit
modifications produced during off-line updates are kept independent of each
other, all of them can be processed in parallel on the GPU. In this case, we can
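The following minimal CUDA sketch shows such a parallel batch update: each thread rewrites one TBL24 unit, and because the off-line pass on the CPU guarantees the modifications touch disjoint units, no synchronization between threads is needed. Kernel and structure names are assumptions, not the paper's code.

#include <cstdint>

struct UnitMod { uint32_t unit; uint16_t next_hop; };

// One thread per unit modification; entries in `mods` are pairwise independent.
__global__ void apply_unit_mods(uint16_t *tbl24, const UnitMod *mods, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tbl24[mods[i].unit] = mods[i].next_hop;
}

// Host side, after copying a batch of n modifications to device memory
// (error checking omitted):
//   apply_unit_mods<<<(n + 255) / 256, 256>>>(d_tbl24, d_mods, n);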