Enterprise system and the public site would have been a top-level “master” that
handles distributing search requests and combining the results.
We would still have needed a back-end system to crawl web pages and handle
search requests, but that is a much more isolated problem, and one with
better existing support from Nutch.
6.5 Krugle Enterprise
The architecture described above worked well for handling billions of lines of code,
but it wasn't suitable for a stand-alone enterprise product that could run reliably
without daily care and feeding. In addition, we didn't have the commit comment
data from the SCM systems that hosted the project source code, which was a highly
valuable source of information for both searches and analytics.
So we created a workflow system (internally called “the Hub”) that handled the
crawling and processing of data, and converted the original multi-server search
system into a single-server solution (“the API”).
The enterprise version doesn't support crawling web pages, and it relies on users
manually defining projects by specifying the repository type and location, the
description of the project, etc. This information is still stored in a MySQL database.
6.5.1 SCM Comments
We added support for fetching, parsing, and searching the comments that we
retrieved from SCM systems. These comments were stored in the same Solr search
server used for project search, but in a different Solr “core.”
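As a rough illustration of what "same server, different core" means in practice, the sketch below builds query URLs for two cores on one Solr server, using Solr's per-core request path. The server address and the core names `projects` and `comments` are assumptions made for this example; the text does not give the names Krugle actually used.

```python
from urllib.parse import urlencode


def core_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a select (search) URL for one core on a multi-core Solr server."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# Assumed server address; both cores live behind the same base URL.
SOLR = "http://localhost:8983/solr"

# Hypothetical core names: one for project search, one for SCM comments.
project_url = core_select_url(SOLR, "projects", "parser")
comment_url = core_select_url(SOLR, "comments", "fix null pointer")
```

Because each core has its own schema and index, the comment fields can differ from the project fields without requiring a second search server.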
We created a Solr index schema that had the following fields, among others
(Table 6.2):
6.5.2 SCMI Architecture
Early on we realized that it would be impossible to install and run all of the many
different types of source code management system (SCM) clients on the enterprise
server. For example, a ClearCase SCM requires a matching client, which in turn has
to be custom installed.
Our solution was to define a standard protocol between the Hub and “helper”
applications that could run on other servers. This SCM interface (SCMI) let us quickly
build connectors to many different SCM systems, including ClearCase, Perforce,
StarTeam, and git, as well as non-SCM sources of information such as Jira and
Bugzilla.
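The idea of one contract shared by many connectors can be sketched as follows. This is a minimal illustration, not the actual SCMI protocol: the class names, method names, and record fields are all hypothetical, and the helper is called in-process here, whereas the real helpers ran on other servers and would be reached over the network.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class ScmComment:
    """One commit comment as a helper might report it (hypothetical fields)."""
    project: str
    revision: str
    author: str
    message: str


class ScmiConnector(ABC):
    """Hypothetical contract that every helper application implements."""

    @abstractmethod
    def scm_type(self) -> str:
        """Return the repository type this helper handles, e.g. 'git'."""

    @abstractmethod
    def fetch_comments(self, project: str, location: str) -> Iterator[ScmComment]:
        """Yield the commit comments for the repository at `location`."""


class InMemoryGitConnector(ScmiConnector):
    """Stand-in for a git helper; serves canned data instead of running git."""

    def __init__(self, log: Dict[str, List[Tuple[str, str, str]]]) -> None:
        self._log = log  # location -> [(revision, author, message), ...]

    def scm_type(self) -> str:
        return "git"

    def fetch_comments(self, project: str, location: str) -> Iterator[ScmComment]:
        for revision, author, message in self._log.get(location, []):
            yield ScmComment(project, revision, author, message)


class Hub:
    """Dispatches each crawl to whichever connector handles the repository type."""

    def __init__(self) -> None:
        self._connectors: Dict[str, ScmiConnector] = {}

    def register(self, connector: ScmiConnector) -> None:
        self._connectors[connector.scm_type()] = connector

    def crawl(self, project: str, scm_type: str, location: str) -> List[ScmComment]:
        connector = self._connectors.get(scm_type)
        if connector is None:
            raise LookupError(f"no SCMI connector registered for {scm_type!r}")
        return list(connector.fetch_comments(project, location))


hub = Hub()
hub.register(InMemoryGitConnector(
    {"git://example/repo": [("abc123", "alice", "Fix parser")]}
))
comments = hub.crawl("demo", "git", "git://example/repo")
```

With this shape, supporting a new source such as Perforce or Jira means writing one more connector class; the Hub itself does not change, and no SCM client has to be installed on the enterprise server.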