Krugle Code Search Architecture - Finding Source Code on the Web for Remix and Reuse

Finally, we tune the disks being used to avoid seek contention. Two drives are devoted to snapshots: while one is serving up the current snapshot, the other is being used to build the new snapshot. The Hub also uses two other drives for raw data and processed data, again to allow multiple tasks to run in a multi-threaded manner without running into disk thrashing.
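The two-drive snapshot arrangement can be sketched as a simple role swap. This is only an illustrative sketch, not Krugle's actual code: the class name `SnapshotRotator`, its methods, and the mount points `/mnt/snap1` and `/mnt/snap2` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnapshotRotator:
    """Alternate serving and building between two snapshot drives."""

    # Hypothetical mount points; the source does not name the drives.
    drives: tuple = ("/mnt/snap1", "/mnt/snap2")
    active: int = 0  # index of the drive currently serving searches

    def serving_drive(self) -> str:
        # Drive holding the current snapshot that search requests read from.
        return self.drives[self.active]

    def build_drive(self) -> str:
        # The other drive, so building never competes with serving for seeks.
        return self.drives[1 - self.active]

    def publish(self) -> str:
        # Call only after the new snapshot is fully built: swap the roles.
        self.active = 1 - self.active
        return self.serving_drive()


rotator = SnapshotRotator()
new_snapshot_target = rotator.build_drive()  # write the new snapshot here
now_serving = rotator.publish()              # then switch serving to it
```

Because reads and writes always land on different physical drives, a long-running snapshot build should not slow down searches against the current snapshot.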

The end result is an architecture that looks like this (Fig. 6.3):

[Diagram labels: External Data Sources via SCMIs; The Hub; Raw Data Files; Parsed Data, Indexes; Snapshot 1; Snapshot 2; The API]

Fig. 6.3 Architecture of Krugle enterprise

6.5.4 Parsing Source Code

During early beta testing, we learned a lot about how developers search in code, and two lessons in particular were important. First, we needed to support semi-structured searches, for example where the user wants to limit the search to only find hits in class definition names.
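To make the idea of a semi-structured search concrete, here is a minimal sketch of an index with two fields, one holding only class definition names and one holding every identifier. It is not the parser described in this chapter: the regular expression is a deliberately simplified assumption covering Java- or Python-style `class` declarations, and the names `build_index`, `search`, `class_name`, and `body` are hypothetical.

```python
import re
from collections import defaultdict

# Simplified, assumed pattern for "class Foo" declarations; a real system
# would need per-language rules rather than one regular expression.
CLASS_DEF = re.compile(
    r"^\s*(?:public\s+|final\s+|abstract\s+)*class\s+([A-Za-z_]\w*)",
    re.MULTILINE,
)
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def build_index(files):
    """Map each field to a dict of lowercase term -> set of file paths."""
    index = {"class_name": defaultdict(set), "body": defaultdict(set)}
    for path, text in files.items():
        for name in CLASS_DEF.findall(text):
            index["class_name"][name.lower()].add(path)
        for word in IDENTIFIER.findall(text):
            index["body"][word.lower()].add(path)
    return index


def search(index, term, field="body"):
    """Return the sorted paths whose given field contains the term."""
    return sorted(index[field].get(term.lower(), set()))


files = {
    "Parser.java": "public class Parser {\n  Lexer lexer;\n}\n",
    "Lexer.java": "final class Lexer {\n}\n",
}
index = build_index(files)
everywhere = search(index, "lexer")                      # both files
definitions = search(index, "lexer", field="class_name")  # Lexer.java only
```

The unrestricted query matches both files because `Parser.java` merely uses `Lexer`, while the query limited to the `class_name` field returns only the file that defines it.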

In order to support this, we had to be able to parse the source code. But “parsing the source code” is a rather vague description. There are lots of compilers out there that obviously parse source code, but full compilation means that you need to know about include paths (or classpaths), compiler-specific switches, the settings for the macro preprocessor in C/C++, etc. The end result is that you effectively need to be
