6.5.3 Performance
For an enterprise search appliance, a basic issue is how to do two things well at
the same time—updating a live index, and handling search requests. Both tasks can
require extensive CPU, disk and memory resources, so it's easy to wind up with
resource contention issues that kill your performance.
We made three decisions that helped us avoid these problems. First, we pushed a significant amount of work “off the box” by putting a lot of the heavy lifting into the hands of small clients called Source Code Management Interfaces (SCMIs). These run on external customer servers instead of on our appliance, and act as collectors for information about projects, SCM comments, source code, and other development-oriented information. The information is then partially digested before being sent back to the appliance via a typical RESTful HTTP protocol.
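To make that data flow concrete, here is a minimal sketch of the kind of push an SCMI-style client might perform. The class name, endpoint path, and JSON fields are illustrative assumptions rather than the actual SCMI protocol; the point is only that collection and partial digestion happen on the customer's server, and a compact payload is then posted to the appliance over HTTP.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical SCMI-style client: collects and partially digests SCM data
// on the customer's server, then POSTs it to the appliance's REST endpoint.
public class ScmiPushClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI endpoint;

    public ScmiPushClient(String applianceUrl) {
        // Assumed endpoint path; the real appliance API may differ.
        this.endpoint = URI.create(applianceUrl + "/api/scm/commits");
    }

    public int pushCommit(String project, String revision, String comment) throws Exception {
        // Partially digested payload: only the fields the appliance needs to index.
        // A real client would use a proper JSON serializer with escaping.
        String json = String.format(
            "{\"project\":\"%s\",\"revision\":\"%s\",\"comment\":\"%s\"}",
            project, revision, comment);

        HttpRequest request = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        HttpResponse<String> response =
            http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}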
Second, we use separate JVMs for the data processing/indexing tasks versus the searching/browsing tasks. This lets us better control memory usage, at the cost of
some wasted memory. The Hub data processing JVM receives data from the SCMI
clients, manages the workflow for parsing/indexing/analyzing the results, and builds
a new “snapshot.” This snapshot is a combination of multiple Lucene indexes, plus
all of the content and other analysis results. When a new snapshot is ready, a “flip”
request is sent to the API JVM that handles the search side of things, and this new
snapshot is gracefully swapped in.
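The flip itself amounts to opening the new snapshot's Lucene index and atomically swapping which searcher serves incoming queries. The sketch below is an assumed, simplified version of that handoff (the class and method names are hypothetical); a production implementation would also reference-count readers so that in-flight searches finish against the old snapshot before it is closed.

import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

// Hypothetical sketch of the search-side "flip": the API JVM swaps in a
// searcher over the new snapshot's Lucene index and retires the old one.
public class SnapshotFlipper {
    private final AtomicReference<IndexSearcher> current = new AtomicReference<>();

    // Called when the Hub JVM signals that a new snapshot is ready.
    public void flip(Path newSnapshotIndexDir) throws Exception {
        DirectoryReader newReader =
            DirectoryReader.open(FSDirectory.open(newSnapshotIndexDir));
        IndexSearcher newSearcher = new IndexSearcher(newReader);

        IndexSearcher old = current.getAndSet(newSearcher);
        if (old != null) {
            // Simplified: a real implementation would wait for in-flight
            // searches (e.g. via IndexReader incRef/decRef) before closing.
            old.getIndexReader().close();
        }
    }

    // Search requests always run against whichever snapshot is currently live.
    public IndexSearcher searcher() {
        return current.get();
    }
}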
On a typical appliance, we have two 32-bit JVMs running, each with 1.5 GB of
memory. Another advantage of this approach is that we can shut down and restart each JVM separately, which makes it easier to do live upgrades and to debug problems.