After importing a collection, the staff chooses a sequence that best represents the
family. Whenever possible, the staff chooses a representative that has a structure record
in MMDB.
3.3 The Position-specific Score Matrix (PSSM)
Once imported and constructed, each domain alignment in CDD is used to calculate a
model sequence, called a consensus sequence, for each CD. The consensus sequence lists
the most frequently found residue at each position in the alignment; however, a
sequence position is included in the consensus sequence only if it is present in at least
50% of the aligned sequences. Aligned columns covered by the consensus sequence are
then used to calculate a PSSM, which records the degree to which particular residues
are conserved at each position in the sequence. Once calculated, the PSSM is stored with
the alignment and becomes part of the CDD. The RPS-BLAST tool locates CDs within a
query sequence by searching against this database of PSSMs.
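The consensus and PSSM steps described above can be illustrated with a minimal sketch. It is not the procedure CDD actually uses: the toy alignment, the function names (covered_columns, consensus, toy_pssm), the uniform background frequencies, and the fixed pseudocount are all assumptions made for illustration. Only the 50% coverage rule and the "most frequent residue" rule come from the text.

```python
import math
from collections import Counter

GAP = "-"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def covered_columns(alignment, min_coverage=0.5):
    """Count residues in each column that is occupied in at least
    min_coverage of the aligned sequences (the 50% rule from the text)."""
    n_rows = len(alignment)
    columns = []
    for i in range(len(alignment[0])):
        residues = [row[i] for row in alignment if row[i] != GAP]
        if len(residues) / n_rows >= min_coverage:
            columns.append(Counter(residues))
    return columns


def consensus(columns):
    """Most frequently found residue in each covered column."""
    return "".join(counts.most_common(1)[0][0] for counts in columns)


def toy_pssm(columns, pseudocount=1.0):
    """Log-odds score per residue per column.

    Assumption: uniform background and a flat pseudocount; production
    PSSMs use sequence weighting and more careful pseudocounts.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for counts in columns:
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


# Hypothetical four-sequence alignment; column 4 is occupied in only 1 of 4 rows.
alignment = ["MKV-L", "MKI-L", "MRVAL", "-KV-L"]
columns = covered_columns(alignment)
pssm = toy_pssm(columns)
print(consensus(columns))  # MKVL
```

In this toy case the fourth alignment column is dropped because only one of four sequences has a residue there, so the PSSM has four columns, and a fully conserved residue such as the final L scores higher than any residue not seen in that column.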
3.4 Reverse Position-specific BLAST (RPS-BLAST)
RPS-BLAST is a variant of the popular Position-specific Iterated BLAST (PSI-BLAST)
program. PSI-BLAST finds sequences similar to the query and uses the resulting
alignments to build a PSSM for the query. With this PSSM the database is scanned again
to draw in more hits and further refine the scoring model. By contrast, RPS-BLAST uses a
query sequence to search a database of precalculated PSSMs and report significant hits in a
single pass. The role of the PSSM thus changes from “query” to “subject”; hence, the
term “reverse” in RPS-BLAST. RPS-BLAST is the search tool used in the CD-Search
service.
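The "reverse" arrangement, in which a single query sequence is scored against a database of precalculated PSSMs in one pass, can be sketched as follows. This is a toy illustration only: it uses ungapped sliding-window scoring and a raw score cutoff, whereas the real RPS-BLAST program relies on BLAST heuristics and statistical significance estimates. The names toy_db, best_ungapped_hit, and scan_query, the integer scores, and the cutoff are all hypothetical.

```python
MISMATCH = -1  # assumed score for any residue not listed in a PSSM column


def best_ungapped_hit(query, pssm):
    """Slide the PSSM along the query; return (score, start) of the best window,
    or None if the query is shorter than the PSSM."""
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(col.get(res, MISMATCH) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def scan_query(query, pssm_db, min_score):
    """Single pass over the PSSM database: here the PSSMs are the subjects."""
    hits = []
    for name, pssm in pssm_db.items():
        hit = best_ungapped_hit(query, pssm)
        if hit is not None and hit[0] >= min_score:
            hits.append((name, hit[1], hit[0]))
    # Highest-scoring domain first.
    return sorted(hits, key=lambda h: -h[2])


# Hypothetical database of two tiny PSSMs (one dict of residue scores per column).
toy_db = {
    "toyA": [{"M": 3}, {"K": 3}, {"V": 3}],
    "toyB": [{"W": 3}, {"W": 3}, {"W": 3}],
}

hits = scan_query("GGMKVGG", toy_db, min_score=5)
print(hits)  # [('toyA', 2, 9)]
```

Unlike PSI-BLAST, nothing here is rebuilt or iterated: the scoring models exist before the query arrives, and each one is consulted exactly once.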
3.5 The CD Summary
Analogous to the Structure Summary page, the CD Summary page displays the available
information about a given CD and offers various links for either viewing the CD
alignment or initiating further searches (Figure 4). The CD Summary page can be
retrieved by selecting the CD name on any page.
3.6 CD Records Curated at NCBI
In 2002, NCBI released the first group of curated CD records, a new and expanding set of
annotated protein multiple sequence alignments and corresponding structure alignments.
These new records have Accession numbers beginning with “cd” and have been added to
the default CD-Search database. Most curated CD records are based on existing family
descriptions from SMART and Pfam, but the alignments may have been revised