First, it allows projects from the same source to be grouped together, which makes
adding or removing contents more straightforward. Second, some form of branching
turned out to be necessary so as not to overburden the file system with tens of
thousands of subdirectories in a single directory. The files collected from open
source projects are stored in a folder according to the following template:
<repo_root>/<batch>/<id>
Above, <repo_root> is a folder assigned as the root of Sourcerer's file repository.
Given the root folder, the individual project files are stored in a two-level
directory structure defined by the path fragment <batch>/<id>. <batch> is a
top-level folder in the directory structure that identifies a given batch. For example,
a crawl of a specific online repository or a collection of a fixed number of projects
can constitute a batch. Inside <batch>, another set of folders exists. Each second-level
folder in the local repository, indicated by <id> in the above template, contains the
contents of a specific project. Each <id> directory contains a single file and two
subdirectories, as shown below:
<repo_root>/<batch>/<id>/project.properties
<repo_root>/<batch>/<id>/download/
<repo_root>/<batch>/<id>/content/
Above, project.properties is a text file that stores the project metadata as
a list of name-value pairs. download is a folder that contains the compressed file
packages that were fetched from the originating repository (e.g., a project's
distribution on SourceForge). content contains the expanded contents of the download
directory. Once the contents of the download directory have been expanded, the
directory itself is usually emptied in order to free up space.
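The layout described above can be sketched in a few lines of Python. This is only an illustration, not Sourcerer's own code: the function names are hypothetical, and the parser assumes that project.properties holds one name=value pair per line with # marking comments, which the text does not specify.

```python
from pathlib import Path


def project_dir(repo_root, batch, project_id):
    """Return <repo_root>/<batch>/<id> as a Path."""
    return Path(repo_root) / str(batch) / str(project_id)


def describe_project(repo_root, batch, project_id):
    """Return the one file and two subdirectories expected in a project folder."""
    base = project_dir(repo_root, batch, project_id)
    return {
        "properties": base / "project.properties",
        "download": base / "download",
        "content": base / "content",
    }


def read_properties(text):
    """Parse name-value pairs.

    Assumption: one `name=value` pair per line, `#` starts a comment.
    The actual file format may differ.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without a separator
            props[name.strip()] = value.strip()
    return props
```

Splitting on the first = only keeps values that themselves contain an equals sign (such as URLs with query strings) intact.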
The project contents in the content directory can take two different forms,
depending on their format in the initial repository. If the project contents are checked
out from a remote software configuration management (SCM) system such as svn
or cvs, the file located at a relative path <path> in the originating repository (e.g.,
SourceForge) exists in Sourcerer's file repository at the following absolute path:
<repo_root>/<batch>/<id>/content/<path>
If the project is instead fetched from a package distribution, a source file can be
found in Sourcerer's file repository at the following absolute path:
<repo_root>/<batch>/<id>/content/package.<i>/<path>
Above, package.<i> indicates a unique folder for the i-th package found
in a remote repository. <path> indicates the relative path of a source code file
inside the i-th archived package, which is unarchived into the package.<i>
folder.
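The two path forms can be captured by a single helper, sketched below in Python. The function name and its parameters are hypothetical, and the text does not say whether package numbering starts at 0 or 1, so the index is simply passed through as given.

```python
from pathlib import PurePosixPath


def source_file_path(repo_root, batch, project_id, rel_path, package_index=None):
    """Build the absolute repository path of a source file.

    package_index=None -> SCM checkout form:  .../content/<path>
    package_index=i    -> package form:       .../content/package.<i>/<path>
    """
    rel = PurePosixPath(rel_path)
    if rel.is_absolute():
        # An absolute <path> would silently replace the repository prefix.
        raise ValueError("rel_path must be relative")
    content = PurePosixPath(repo_root) / str(batch) / str(project_id) / "content"
    if package_index is not None:
        content = content / f"package.{package_index}"
    return content / rel
```

For instance, with a root of /repo, batch 3 and id 17, the file src/Main.java from an SCM checkout maps to /repo/3/17/content/src/Main.java, while the same file from the package with index 2 maps to /repo/3/17/content/package.2/src/Main.java.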