off-line updates on the CPU, achieving more efficient on-line updates to the
TBL24 on the GPU. We call the proposed IP lookup engine TSTT, since its two
most important components are the TST and the TBL24.
We make three key contributions in this paper. Firstly, after transforming the
FIB into a compact segment tree, we design a special leaf-pushing technique to
divide all prefixes into several non-intersecting segments. Secondly, we present a
series of algorithms to thread the necessary segments during off-line updates,
ensuring that the threaded segments cover all update information without
intersecting one another. As a result, on-line updates can be processed
completely in parallel.
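As a concrete and deliberately simplified illustration of the leaf-pushing idea, the host-side sketch below views each prefix no longer than 24 bits as a range of 24-bit units and lets longer prefixes overwrite shorter ones where they overlap, so that consecutive units sharing the same next hop form non-intersecting segments. The structure and function names are assumptions for illustration only, not the paper's algorithm.

#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative sketch only: prefixes are painted onto the 24-bit unit space,
// shorter (less specific) prefixes first, so more specific prefixes "push"
// their next hop onto the units they cover.
struct Pfx { uint32_t addr; int len; uint16_t next_hop; };   // assumed fields

std::map<uint32_t, uint16_t> leaf_push(std::vector<Pfx> pfxs)
{
    std::sort(pfxs.begin(), pfxs.end(),
              [](const Pfx &a, const Pfx &b) { return a.len < b.len; });

    std::map<uint32_t, uint16_t> unit_to_hop;        // 24-bit unit -> next hop
    for (const Pfx &p : pfxs) {
        uint32_t first = p.addr >> 8;                // first covered unit
        uint32_t count = 1u << (24 - p.len);         // units covered by the prefix
        for (uint32_t u = first; u < first + count; ++u)
            unit_to_hop[u] = p.next_hop;             // later (longer) prefixes win
    }
    return unit_to_hop;  // runs of equal next hops are the disjoint segments
}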
The rest of this paper is organized as follows. Section 2 presents the system
architecture of our proposed scheme. The details of the TST are presented in
Sect. 3. Section 4 discusses the performance evaluation experiments. Finally, a
short conclusion is given in Sect. 5.
2 System Architecture
Since there are many novel designs for the entire framework of GPU-based
software routers [8, 9], we focus only on the IP lookup engine, which can be
deployed in such routers as an additional plug-in [11]. With the help of an
optimized packet I/O engine [8], which manages packet queues and extracts the
destination addresses of pending packets for lookup, we concentrate on the
GPU-based accelerator for table lookup and update.
As shown in Fig. 2, our system architecture is based on the Compute Unified
Device Architecture (CUDA), in which all program code is divided into two
cooperative parts, the Host and the Device, executed on the CPU and the GPU
respectively. In the Device, the TBL24, which stores all prefixes no longer
than 24 bits (the rest are stored in a small TCAM, which is not shown in
Fig. 2), provides O(1) lookup. In the Host, managed by a control thread, a
group of working threads processes lookup and update requests by utilizing the
computing resources of both the CPU and the GPU.
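To make the division of labor concrete, the following is a minimal sketch of how a TBL24-style table yields O(1) lookup on the Device: one next-hop entry per 24-bit prefix of the destination address, so a lookup is a single memory access. The entry width and function name are assumptions, not the paper's exact layout.

#include <cstdint>

// Assumed layout: tbl24 holds one 16-bit next-hop entry for every possible
// 24-bit address prefix (2^24 entries).
__device__ uint16_t tbl24_lookup(const uint16_t *tbl24, uint32_t dst_ip)
{
    return tbl24[dst_ip >> 8];   // index with the top 24 bits of the address
}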
Processing route updates on the TBL24 always needs help from additional
structures [11]. In TSTT, the TST plays this role. In our case, all prefixes
no longer than 24 bits are transformed into segments, each of which corresponds
to a range of units of the TBL24. A TST is then constructed from these
segments and stored in the host memory (on the CPU) for off-line updates.
When a route update arrives, it is first used to update this TST, producing
several unit modifications to the TBL24. During such off-line updates on the
CPU, all produced unit modifications are collected, and are then processed on
the GPU to update the TBL24, which constitutes the on-line updates.
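A hedged host-side sketch of this off-line step is given below: a prefix no longer than 24 bits covers a contiguous range of TBL24 units, and updating the TST for that prefix yields a list of unit modifications that are collected for the GPU. The sketch ignores the overlap resolution that the TST and leaf-pushing actually provide, and all names are assumptions.

#include <cstdint>
#include <vector>

struct UnitMod { uint32_t unit; uint16_t next_hop; };   // one TBL24 unit to rewrite

// Assumes `addr` is the prefix's network address with the host bits cleared
// and `len` is at most 24.
std::vector<UnitMod> units_for_prefix(uint32_t addr, int len, uint16_t next_hop)
{
    uint32_t first = addr >> 8;                 // first covered TBL24 unit
    uint32_t count = 1u << (24 - len);          // number of covered units
    std::vector<UnitMod> mods;
    for (uint32_t u = first; u < first + count; ++u)
        mods.push_back({u, next_hop});          // collected into the next GPU batch
    return mods;
}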
As the GPU works in the Single Instruction, Multiple Threads (SIMT) model,
batch processing is required to improve performance. As long as all unit
modifications produced during off-line updates are kept independent of each
other, all of them can be processed in parallel on the GPU. In this case, we can
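The following minimal CUDA sketch shows such a parallel batch update: each thread rewrites one TBL24 unit, and because the off-line pass on the CPU guarantees the modifications touch disjoint units, no synchronization between threads is needed. Kernel and structure names are assumptions, not the paper's code.

#include <cstdint>

struct UnitMod { uint32_t unit; uint16_t next_hop; };

// One thread per unit modification; entries in `mods` are pairwise independent.
__global__ void apply_unit_mods(uint16_t *tbl24, const UnitMod *mods, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tbl24[mods[i].unit] = mods[i].next_hop;
}

// Host side, after copying a batch of n modifications to device memory
// (error checking omitted):
//   apply_unit_mods<<<(n + 255) / 256, 256>>>(d_tbl24, d_mods, n);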