An Efficient Update Mechanism for GPU-Based IP Lookup Engine Using Threaded Segment Tree - Frontiers in Internet Technologies

Information Technology Reference

In-Depth Information

Algorithm 3. Insert Segment

Input : curNode,insSeg

if curNode.seg = insSeg then

1

curNode.seg.value = DEFAULT ; curNode.isPrefixSeg = FALSE ;

2

LeafPushing ( curNode,nextHop );

3

end

4

else

5

if curNode.isLeaf then

6

BreakSegment ( curNode );

7

end

8

mid =( curNode.seg.low + curNode.seg.high ) / 2;

9

if mid < insSeg.low then

10

InsertSegment ( curNode.rightChild, insSeg );

11

end

12

if mid ≥ insSeg.high then

13

InsertSegment ( curNode.leftChild, insSeg );

14

end

15

end

16

448 Cores) on the basis of CUDA 5.0. We measure all concerned metrics through

the NVIDIA Visual Profiler [17].

Tabl e 1. Routing Data Sets (Collected at Jan. 1, 2013)

Data Set Location

Total Prefixes Total Updates

rrc11

New York (NY), USA 442,176

1,177,425

rrc12

Frankfurt, Germany 450,752

4,049,260

rrc13

Moscow, Russia

456,580

2,025,239

rrc14

Palo Alto, USA

446,160

1,388,217

4.1 Memory Accesses Required for On-line Updates

To evaluate the performance of route update, we replay a week's update traces

of rrc 12 and a whole day's update traces for all tables. After processing off-line

updates in TST, we get a list of threaded segments, each of which represents

several memory writes toward a range of units of the TBL24 on the GPU. While

for GALE, due to the length map checking, it requires both memory reads and

writes on the GPU to finish on-line updates. We evaluate the performance of

on-line updates by measuring the average produced global memory accesses per

update on the GPU.

As depicted in Fig. 5(a), the average produced memory accesses in TSTT are

far less than that in GALE in all cases. In fact, TSTT achieves a reduction by

92 . 5%

∼

98 . 1%, which is still 82 . 5%

∼

97 . 3% even ignoring memory reads in

Frontiers in Internet Technologies

Search WWH ::

Custom Search

Home