Sera-mag oligo(dT) beads, fragmented (300-500 nt) and reverse-transcribed
using random hexamers. Short-insert libraries were constructed for
paired-end RNA sequencing using the Illumina Hi-seq platform. Using this
approach, we generated 15-25 million paired-end reads from each of the
libraries representing each tissue, stage, and sex.
Again, low-quality reads and potential contaminants were removed, and
adaptors trimmed. Following this process, all reads were pooled into one
large dataset and assembled using the program Oases. 22
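The read-cleaning step described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the adaptor sequence passed in, the Phred+33 encoding, the mean-quality cutoff of 20, and the minimum length of 30 nt are all assumptions chosen for illustration, and the function names are hypothetical. Contaminant screening is not shown.

```python
from statistics import mean


def trim_adaptor(seq, qual, adaptor):
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    pos = seq.find(adaptor)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(c) - offset for c in qual)


def clean_pair(read1, read2, adaptor, min_mean_q=20, min_len=30):
    """Trim both mates; return the pair, or None if either mate fails.

    Each read is a (sequence, quality_string) tuple. The thresholds are
    illustrative assumptions, not values reported in the text.
    """
    cleaned = []
    for seq, qual in (read1, read2):
        seq, qual = trim_adaptor(seq, qual, adaptor)
        # Length is checked first, so an empty read never reaches mean().
        if len(seq) < min_len or mean_phred(qual) < min_mean_q:
            return None
        cleaned.append((seq, qual))
    return tuple(cleaned)
```

Dropping the whole pair when one mate fails keeps the two read files synchronized, which paired-end assemblers generally expect.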
Because there is currently no consensus approach for the prediction of
genes in eukaryotes, we used an integrated strategy involving de novo-,
homology-, and evidence-based methods. De novo gene prediction was
performed on a repeat-masked genome using three programs (Augustus,
GlimmerHMM and SNAP) 16; training models were generated from
a subset of the transcriptomic dataset representing 1355 distinct genes. The
homology-based prediction was conducted by comparison with complete
genomic data for C. elegans, 23 Pristionchus pacificus, 24 and Brugia
malayi. 25 Evidence-based gene prediction was conducted by aligning all
RNA-Seq data generated during the study 11 against the assembled
genome using TopHat, 26 with cDNAs predicted from the resultant data
using Cufflinks. 27 Following the prediction of genes, a non-redundant gene
set representing homology-based, de novo-predicted, and RNA-Seq-supported
genes was generated using Glean (http://sourceforge.net/projects/glean-gene).
All Glean-predicted genes were retained, as were
all genes supported by RNA-Seq data and those predicted using two or
more de novo methods (i.e. Augustus, GlimmerHMM and/or SNAP). The
protein coding sequences encoded by these predicted genes were then
inferred using BestORF (www.softberry.com).
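The retention rule stated above (keep every Glean-predicted gene, every gene supported by RNA-Seq data, and every gene called by two or more de novo predictors) can be written as a simple filter. The sketch below is only an illustration of that rule: the `GeneModel` record, its field names, and the helper functions are hypothetical and do not correspond to the output format of Glean or of any of the prediction programs.

```python
from dataclasses import dataclass, field

# The three de novo predictors named in the text.
DE_NOVO_TOOLS = {"Augustus", "GlimmerHMM", "SNAP"}


@dataclass
class GeneModel:
    """Hypothetical record summarizing the evidence for one candidate gene."""
    gene_id: str
    in_glean: bool = False          # present in the Glean consensus set
    rnaseq_supported: bool = False  # supported by RNA-Seq evidence
    de_novo_callers: set = field(default_factory=set)


def is_retained(gene):
    """Apply the retention rule: Glean, RNA-Seq, or >= 2 de novo callers."""
    n_de_novo = len(gene.de_novo_callers & DE_NOVO_TOOLS)
    return gene.in_glean or gene.rnaseq_supported or n_de_novo >= 2


def build_gene_set(candidates):
    """Return the candidates that satisfy at least one retention criterion."""
    return [g for g in candidates if is_retained(g)]
```

A gene predicted by a single de novo program, with no Glean or RNA-Seq support, is the only case this rule excludes.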
Once a consensus gene set had been predicted from the A. suum
genome, the next major step was to annotate the sequences and predict
their function(s). Although extensive functional data have been amassed
for model organisms, such as C. elegans, through gene knockout, knockdown,
and protein localization studies (see www.wormbase.org), such
data are not available for most parasites because of the complexities and
challenges in conducting experimental studies of most metazoan parasites
in vitro. Although gene silencing appears to be possible in
A. suum, 28,29 this research is in its infancy for this parasite. Therefore, we
inferred the function of the A. suum predicted gene set using homology-
based comparisons with a wide range of datasets. These comparisons
included assessing the predicted peptides for conserved protein domains
classified in several databases (e.g. SProt, Pfam, and ProDom) using the
program InterProScan 30 and, on the basis of these data, classifying each
transcript according to functional hierarchies using the gene ontology
(GO) database, 31 providing additional information on the location of
activity within the cell (“cellular component”), basic molecular role/s