Macromolecular Structure Databases - Essays in Bioinformatics


After importing a collection, the staff chooses a sequence that best represents the

family. Whenever possible, the staff chooses a representative that has a structure record

in MMDB.

3.3 The Position-specific Score Matrix (PSSM)

Once imported and constructed, each domain alignment in CDD is used to calculate a

model sequence, called a consensus sequence, for each CD. The consensus sequence lists

the most frequently found residue in each position in the alignment; however, for a

sequence position to be included in the consensus sequence, it must be present in at least

50% of the aligned sequences. Aligned columns covered by the consensus sequence are

then used to calculate a PSSM, which records the degree to which particular residues

are conserved at each position in the sequence. Once calculated, the PSSM is stored with

the alignment and becomes part of the CDD. The RPS-BLAST tool locates CDs within a

query sequence by searching against this database of PSSMs.
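
The consensus rule and the PSSM calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure CDD actually runs: the function names (consensus, simple_pssm), the toy alignment, the uniform background frequency, and the flat pseudocount are all hypothetical choices made for this example. Production PSSMs also apply sequence weighting and substitution-matrix-based pseudocounts, which are omitted here.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"


def consensus(alignment, min_coverage=0.5):
    """Return (consensus string, indices of the alignment columns kept).

    A column is kept only if at least `min_coverage` of the aligned
    sequences have a residue (not a gap) at that position.
    """
    n_seqs = len(alignment)
    residues_out, columns = [], []
    for i in range(len(alignment[0])):
        residues = [seq[i] for seq in alignment if seq[i] != GAP]
        if len(residues) / n_seqs >= min_coverage:
            # Most frequently found residue in this column.
            residues_out.append(Counter(residues).most_common(1)[0][0])
            columns.append(i)
    return "".join(residues_out), columns


def simple_pssm(alignment, columns, pseudocount=1.0):
    """Build one row of log-odds scores per consensus column.

    Assumption: uniform background frequencies and a flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for i in columns:
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        row = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        matrix.append(row)
    return matrix


# Hypothetical toy alignment: column 3 is covered by only 1 of 4 sequences.
alignment = [
    "MKV-LA",
    "MKI-LA",
    "MRVGLA",
    "MKV-L-",
]

cons, cols = consensus(alignment)
matrix = simple_pssm(alignment, cols)
print(cons)  # MKVLA
```

Column 3 is dropped because only one of the four sequences has a residue there, so the resulting matrix has five rows, one per consensus position. Conserved residues receive positive scores and residues never observed in a column receive negative ones.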

3.4 Reverse Position-specific BLAST (RPS-BLAST)

RPS-BLAST is a variant of the popular Position-specific Iterated BLAST (PSI-BLAST)

program. PSI-BLAST finds sequences similar to the query and uses the resulting

alignments to build a PSSM for the query. With this PSSM the database is scanned again

to draw in more hits and further refine the scoring model. RPS-BLAST instead uses a query

sequence to search a database of precalculated PSSMs and report significant hits in a

single pass. The role of the PSSM has changed from “query” to “subject”; hence, the

term “reverse” in RPS-BLAST. RPS-BLAST is the search tool used in the CD-Search

service.
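
The "reversed" roles can be illustrated with a small Python sketch in which a single query sequence is scored against every PSSM in a database in one pass. This is a simplified illustration, not the RPS-BLAST algorithm itself: it performs a brute-force ungapped scan with a raw score cutoff, whereas the real program uses BLAST-style heuristics and reports statistical significance. The names toy_pssm, best_window_score, and search, the CD names cdA and cdB, and the scoring values are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(motif, match=2.0, mismatch=-1.0):
    """Hypothetical stand-in for a precalculated PSSM: one row per position."""
    return [
        {aa: (match if aa == m else mismatch) for aa in AMINO_ACIDS}
        for m in motif
    ]


def best_window_score(query, pssm):
    """Slide the PSSM along the query; return (best score, start offset)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Unknown residue codes contribute a neutral score of 0.
        score = sum(row.get(res, 0.0) for row, res in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def search(query, pssm_db, threshold):
    """Score one query against every PSSM (the 'subjects') in a single pass."""
    hits = []
    for name, pssm in pssm_db.items():
        score, start = best_window_score(query, pssm)
        if score >= threshold:
            hits.append((name, start, score))
    # Highest-scoring hits first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical database of two PSSMs and a query containing the cdA motif.
pssm_db = {"cdA": toy_pssm("MKVLA"), "cdB": toy_pssm("WWHHW")}
query = "GGSMKVLAGG"
hits = search(query, pssm_db, threshold=8.0)
print(hits)
```

Here the PSSMs play the role of the database entries and the sequence is the query, which is the opposite of PSI-BLAST, where the PSSM is built from the query and scanned against a sequence database.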

3.5 The CD Summary

Analogous to the Structure Summary page, the CD Summary page displays the available

information about a given CD and offers various links for either viewing the CD

alignment or initiating further searches (Figure 4). The CD Summary page can be

retrieved by selecting the CD name on any page.

3.6 CD Records Curated at NCBI

In 2002, NCBI released the first group of curated CD records, a new and expanding set of

annotated protein multiple sequence alignments and corresponding structure alignments.

These new records have Accession numbers beginning with “cd” and have been added to

the default CD-Search database. Most curated CD records are based on existing family

descriptions from SMART and Pfam, but the alignments may have been revised
