sequence was broken into random fragments. In order to generate the
whole sequence, the individual reads (one from each well of a tray or
plate) must be reassembled. A program called “Phrap” is used to do this.
Phrap uses the base calls and the quality scores from Phred to determine
the most likely assembly by searching for overlapping segments (it takes
into account the fact that, in regions where quality scores are low,
apparent overlaps could be the product of random errors). Both Phred and
Phrap are open source software packages developed in the early 1990s by
Phil Green and his colleagues at Washington University, St. Louis.6
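To make the idea of quality-aware overlap detection concrete, here is a toy sketch in Python. It is not Phrap's actual algorithm, which is considerably more elaborate; it only illustrates the principle described above. It relies on the standard Phred convention that a quality score Q corresponds to an error probability of 10^(-Q/10), so that Q30 means a 1-in-1,000 chance that a base call is wrong. The function names, the weighting scheme, and the minimum overlap length are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def error_prob(q: int) -> float:
    """Convert a Phred quality score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def overlap_score(
    a: str, qa: Sequence[int], b: str, qb: Sequence[int], min_len: int = 3
) -> Tuple[int, float]:
    """Toy quality-weighted search for the best suffix(a)/prefix(b) overlap.

    Hypothetical scoring, not Phrap's: each aligned pair of bases is weighted
    by the probability that both calls are correct. Matches add that weight
    and mismatches subtract it, so a disagreement involving a low-quality
    base costs relatively little.
    Returns (overlap_length, score), or (0, 0.0) if no overlap scores above zero.
    """
    best: Tuple[int, float] = (0, 0.0)
    for length in range(min_len, min(len(a), len(b)) + 1):
        start = len(a) - length  # where the candidate overlap begins in read a
        score = 0.0
        for i in range(length):
            weight = (1 - error_prob(qa[start + i])) * (1 - error_prob(qb[i]))
            score += weight if a[start + i] == b[i] else -weight
        if score > best[1]:
            best = (length, score)
    return best


if __name__ == "__main__":
    read_a, read_b = "ACGTTGCA", "TGCAAAC"
    print(overlap_score(read_a, [30] * len(read_a), read_b, [30] * len(read_b)))
```

For the two reads in the example, with uniform Q30 qualities, the best overlap is the four bases TGCA shared by the end of the first read and the start of the second. Lowering the quality of a base involved in a mismatch raises the overlap's score, which mirrors the tolerance for low-quality discrepancies described above.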
Despite the sophistication of these algorithms, they are usually unable
to match up every sequence; for almost every genome, gaps remain. It is
the job of the “finishing” team to take over where Phrap leaves off and
patch up the gaps. There are about forty finishers at the Broad; their
work is highly specialized and requires detailed knowledge of the biology
of the organisms being sequenced, the intricacies of the sequencing
process, and the assembly software. By querying the LIMS, a finisher can
find samples in the queue that require finishing
work. During my fieldwork at the Broad, I spent several hours watching a
highly experienced finisher at work. After retrieving the relevant
sequences from the LIMS database, the finisher imported them into a
graphical tool that displays the overlapping sequence regions one above
the other. Scrolling rapidly along the sequence on the screen, the
finisher made quick decisions about the appropriate choice of base at
each position where there seemed to be a discrepancy. His experience
allowed him to tell “just by eyeballing” where Phrap had made a mistake,
for instance by placing TT in place of T. The graphical tool
also allowed the finisher to pull up the raw sequence traces, which was
sometimes necessary in order to decide which base call was correct.
Where gaps existed, the finisher imported sequence data from sources that
had not been used in the assembly (an online database, for example) in
order to begin filling the hole. This work is often painstaking and relies
crucially on the finisher's judgment. The following quotation describes
the finisher's reasoning process as he changed a single G to a C in the
sequence:
Here now is a discrepancy in the consensus [sequence]. Usually
it's where we have a whole string of Gs like we have here; these
reads here are fairly far away from where the insert is—I can
tell just by their orientation—they've been running through the
capillary [the key component of the detector] for a while when
it hits one of these stretches . . . so it's basically calling an extra