How to do it…
To define a parser, we define functions that parse the different parts of the input and then
combine them to parse larger structures:
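1. It would be useful to have a way to run two parsers in sequence, keep the result of the
first, and throw away the result of the second. This function does that:
(defn <| [l r]
  (let [l-output (l)]   ; run the left parser and keep its result
    (r)                 ; run the right parser only to consume its input
    l-output))          ; return just the left parser's result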
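To see what `<|` does in isolation, here is a small self-contained sketch. parse-ez keeps the input and the current position as implicit state; the `state` atom, `set-input!`, and the single-character `chr` below are hypothetical stand-ins for that machinery, not the library's implementation. Only `<|` is taken from step 1.

```clojure
;; Hypothetical stand-in for the parser's implicit state:
;; the input string and the current read position.
(def state (atom {:input "" :pos 0}))

(defn set-input! [s]
  (reset! state {:input s :pos 0}))

(defn chr
  "Toy parser (hypothetical): consume one character if it equals c,
  otherwise throw a parse error."
  [c]
  (let [{:keys [input pos]} @state]
    (if (and (< pos (count input)) (= c (nth input pos)))
      (do (swap! state update :pos inc) c)
      (throw (ex-info "parse error" {:expected c :pos pos})))))

;; Same definition as step 1: run l, then r, return only l's result.
(defn <| [l r]
  (let [l-output (l)]
    (r)
    l-output))

(set-input! "a;")
;; Both characters are consumed, but only the first one is returned.
(def result (<| #(chr \a) #(chr \;)))
```

The right-hand parser still advances the position even though its value is discarded, which is exactly what the recipe needs for line breaks and trailing whitespace.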
2. We'll also define a parser for the end of a line. It matches a single carriage-return or
newline character:
(defn nl []
  (chr-in #{\newline \return}))
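Because `nl` matches exactly one character, a Windows-style `\r\n` line ending takes two `nl` calls to consume. The sketch below shows this with a hypothetical toy `chr-in` over an atom-held input (again a stand-in, not parse-ez itself); `nl` is defined as in step 2.

```clojure
;; Hypothetical stand-in for the parser's implicit state.
(def state (atom {:input "" :pos 0}))

(defn set-input! [s]
  (reset! state {:input s :pos 0}))

(defn chr-in
  "Toy parser (hypothetical): consume one character if it is a member
  of the set cs, otherwise throw a parse error."
  [cs]
  (let [{:keys [input pos]} @state
        c (when (< pos (count input)) (nth input pos))]
    (if (and c (contains? cs c))
      (do (swap! state update :pos inc) c)
      (throw (ex-info "parse error" {:expected cs :pos pos})))))

;; Same definition as step 2.
(defn nl []
  (chr-in #{\newline \return}))

(set-input! "\r\n")
(def first-nl (nl))   ; consumes the carriage return
(def second-nl (nl))  ; consumes the newline
```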
3. Let's start putting the pieces together. The first function parses the sequence definition
line by accepting a > character, followed by anything up to the end of the line:
(defn defline []
  (chr \>)                              ; consume the leading >
  (<| #(read-to-re #"[\n\r]+") nl))     ; read the header text, then drop one line break
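For comparison, the text that `defline` returns can be pictured as a single regular-expression match on a whole string: skip the `>`, then take everything up to the first carriage return or newline. The function name `defline-text` and the sample header are hypothetical, and this is a plain regex sketch rather than how parse-ez works internally.

```clojure
(defn defline-text
  "Hypothetical regex equivalent of defline: returns the header text
  after a leading >, up to the first line break, or nil if s does not
  start with >."
  [s]
  (second (re-find #"^>([^\n\r]*)" s)))

;; Hypothetical sample record.
(def example ">seq1 hypothetical example\nACGT\n")
```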
4. We parse a sequence of amino acid or nucleic acid codes by defining a parser for
a single code and then building on that to create a parser for a whole line of codes:
(defn acid-code []
  (chr-in #{\A \B \C \D \E \F \G \H \I \K \L \M
            \N \P \Q \R \S \T \U \V \W \X \Y \Z
            \- \*}))
(defn acid-code-line []
  (<| #(multi+ acid-code) #(attempt nl)))   ; one or more codes, then an optional line break
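The character set in `acid-code` can be reused directly as a Clojure set to see which inputs a line of codes accepts. The sketch below uses the same characters (A to Z without J and O, plus `-` and `*`); `leading-acid-codes` is a hypothetical helper that mimics `multi+` by requiring at least one match. Note that, as written, the set is case-sensitive, so lowercase sequence letters are not accepted.

```clojure
;; Same characters as the acid-code parser in step 4.
(def acid-codes (set "ABCDEFGHIKLMNPQRSTUVWXYZ-*"))

(defn leading-acid-codes
  "Hypothetical helper: returns the leading run of acid codes in s as a
  string, or nil if there is none (like multi+, one or more required)."
  [s]
  (let [codes (apply str (take-while acid-codes s))]
    (when (seq codes) codes)))
```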
5. Next, we combine these parsers into one that parses an entire FASTA record and
populates a map with its data. Finally, we define a combinator that parses
multiple FASTA records:
(defn fasta []
  (ws?)                                          ; skip optional leading whitespace
  (let [dl  (defline)
        gls (apply str (flatten                  ; join every sequence line into one string
                        (multi+ acid-code-line)))]
    {:defline dl, :gene-seq gls}))
(defn multi-fasta []
  (<| #(multi+ fasta)
      ws?))
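To make the shape of the result concrete, here is the same job done without parser combinators, as a sketch: split the text on a `>` at the start of a line, then treat the first line of each record as the definition line and join the rest. The function names and sample data are hypothetical, and this string-splitting approach is a swapped-in technique, not the recipe's method; it does, however, produce the same `{:defline ..., :gene-seq ...}` maps and rejects characters outside the step 4 code set.

```clojure
(require '[clojure.string :as str])

;; Same characters as the acid-code parser in step 4.
(def acid-codes (set "ABCDEFGHIKLMNPQRSTUVWXYZ-*"))

(defn parse-fasta-record
  "Hypothetical: parses one record (the text after its leading >)."
  [record]
  (let [[header & seq-lines] (str/split-lines record)
        gene-seq (apply str (map str/trim seq-lines))]
    (when-not (every? acid-codes gene-seq)
      (throw (ex-info "invalid acid code" {:defline header})))
    {:defline header :gene-seq gene-seq}))

(defn parse-multi-fasta
  "Hypothetical: splits on > at the start of a line, one map per record."
  [s]
  (->> (str/split (str/trim s) #"(?m)^>")
       (remove str/blank?)
       (map parse-fasta-record)))

;; Hypothetical two-record input.
(def sample ">seq1 first\nACGT\nTTGA\n>seq2 second\nMKV*\n")
(def records (parse-multi-fasta sample))
```

This version is shorter, but all of its record-boundary logic lives in one regex; the combinator version builds the same result from small parsers that can be tested and reused separately.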