Table 23-2. Mandatory fields in the SAM format
Col  Field  Type    Regexp/Range               Brief description
1    QNAME  String  [!-?A-~]{1,255}            Query template NAME
2    FLAG   Int     [0, 2^16-1]                bitwise FLAG
3    RNAME  String  \*|[!-()+-<>-~][!-~]*      Reference sequence NAME
4    POS    Int     [0, 2^31-1]                1-based leftmost mapping POSition
5    MAPQ   Int     [0, 2^8-1]                 MAPping Quality
6    CIGAR  String  \*|([0-9]+[MIDNSHPX=])+    CIGAR string
7    RNEXT  String  \*|=|[!-()+-<>-~][!-~]*    Ref. name of the mate/NEXT read
8    PNEXT  Int     [0, 2^31-1]                Position of the mate/NEXT read
9    TLEN   Int     [-2^31+1, 2^31-1]          observed Template LENgth
10   SEQ    String  \*|[A-Za-z=.]+             segment SEQuence
11   QUAL   String  [!-~]+                     ASCII of Phred-scaled base QUALity+33
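To make the translation effort concrete, here is a minimal Python sketch of what hand-implementing the mandatory fields might look like. It is an illustration only, not code from ADAM or any SAM library. The names SAM_FIELDS and parse_sam_line are hypothetical, the QUAL pattern uses [!-~]+ as in the SAM specification, and the sample line is made up for the example.

```python
import re

# Constraints transcribed from the table of mandatory SAM fields.
# Each entry is (field name, kind, regexp or inclusive integer range).
# SAM_FIELDS and parse_sam_line are hypothetical names for this sketch.
SAM_FIELDS = [
    ("QNAME", "str", r"[!-?A-~]{1,255}"),
    ("FLAG", "int", (0, 2**16 - 1)),
    ("RNAME", "str", r"\*|[!-()+-<>-~][!-~]*"),
    ("POS", "int", (0, 2**31 - 1)),
    ("MAPQ", "int", (0, 2**8 - 1)),
    ("CIGAR", "str", r"\*|([0-9]+[MIDNSHPX=])+"),
    ("RNEXT", "str", r"\*|=|[!-()+-<>-~][!-~]*"),
    ("PNEXT", "int", (0, 2**31 - 1)),
    ("TLEN", "int", (-2**31 + 1, 2**31 - 1)),
    ("SEQ", "str", r"\*|[A-Za-z=.]+"),
    ("QUAL", "str", r"[!-~]+"),
]


def parse_sam_line(line):
    """Split one tab-separated alignment line and validate its 11 mandatory fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated fields")
    record = {}
    # zip stops after the 11 mandatory fields; optional trailing fields are ignored.
    for (name, kind, constraint), raw in zip(SAM_FIELDS, columns):
        if kind == "int":
            low, high = constraint
            value = int(raw)  # raises ValueError on non-numeric text
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
        else:
            if re.fullmatch(constraint, raw) is None:
                raise ValueError(f"{name}={raw!r} does not match {constraint}")
            value = raw
        record[name.lower()] = value
    return record


# A made-up alignment line for illustration.
example = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
record = parse_sam_line(example)
print(record["qname"], record["pos"], record["cigar"])
```

Every implementer has to repeat this transcription by hand, and each copy is a chance to mistype a range or a character class.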
Any developer who wants to implement this specification must translate the English
spec into their programming language of choice. In ADAM, we have chosen instead to use
literate programming, with the spec defined in Avro IDL. For example, the mandatory fields
for SAM can be easily expressed in a simple Avro record:
record AlignmentRecord {
  string qname;
  int flag;
  string rname;
  int pos;
  int mapq;
  string cigar;
  string rnext;
  int pnext;
  int tlen;
  string seq;
  string qual;
}
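Avro schemas also have a JSON form, and an IDL record like this one corresponds to a JSON record schema with one entry per field. The Python sketch below builds that JSON form using only the standard library. It illustrates the mapping and is not ADAM's actual schema; the names FIELDS, alignment_record_schema, and conforms are hypothetical. Avro's int is a 32-bit signed integer, which covers every integer range among the SAM mandatory fields.

```python
import json

# Field names and types copied from the AlignmentRecord IDL in the text.
FIELDS = [
    ("qname", "string"), ("flag", "int"), ("rname", "string"),
    ("pos", "int"), ("mapq", "int"), ("cigar", "string"),
    ("rnext", "string"), ("pnext", "int"), ("tlen", "int"),
    ("seq", "string"), ("qual", "string"),
]


def alignment_record_schema():
    """Return the JSON-style Avro record schema as a plain dict."""
    return {
        "type": "record",
        "name": "AlignmentRecord",
        "fields": [{"name": name, "type": avro_type} for name, avro_type in FIELDS],
    }


def conforms(record, schema):
    """Hypothetical helper: check a dict against the schema's primitive types."""
    for field in schema["fields"]:
        value = record.get(field["name"])
        if field["type"] == "string":
            if not isinstance(value, str):
                return False
        elif field["type"] == "int":
            # Avro int is 32-bit signed; bool is excluded because it subclasses int.
            if not isinstance(value, int) or isinstance(value, bool):
                return False
            if not -2**31 <= value <= 2**31 - 1:
                return False
    return True


schema = alignment_record_schema()
print(json.dumps(schema, indent=2))
```

Unlike the regexps in the SAM table, a schema of this kind constrains only types, so value-level checks such as the CIGAR pattern would still have to be enforced separately.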
Avro is able to autogenerate native Java (or C++, Python, etc.) classes for reading and
writing data and provides standard interfaces (e.g., Hadoop's InputFormat ) to make
integration with numerous systems easy. Avro is also designed to make schema evolution
easier. In fact, the ADAM schemas we use today have evolved to be more sophisticated,