With this database loaded and ready to go, let's figure out how (and why) to bring streams into the mix.
Let's start with a simple request: we would like to print out all the word usages, one per line, with all the
associated metadata. You might do this, for instance, to generate input for a Hadoop MapReduce
job. The SQL we are going to run (and some sample results) is as follows:
SELECT t.name, l."offset", w."value", lw."offset"
FROM "text" t
INNER JOIN line l ON (l.text_id = t.id)
INNER JOIN line_word lw ON (lw.line_id = l.id)
INNER JOIN word w ON (w.id = lw.word_id)
A LOVER'S COMPLAINT
THE WINTER'S TALE
THE TWO GENTLEMEN OF VERONA
ALLS WELL THAT ENDS WELL
THE TRAGEDY OF ANTONY AND CLEOPATRA
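Before handing the join to the database, it can help to see the same logic expressed with streams over in-memory collections. The sketch below is illustrative only: the record types and field names are stand-ins mirroring the query's tables, not the book's actual schema classes. Each nested flatMap plays the role of one INNER JOIN.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoinSketch {
    // Hypothetical stand-ins for the text, line, line_word, and word tables
    record Text(int id, String name) {}
    record Line(int id, int textId, int offset) {}
    record LineWord(int lineId, int wordId, int offset) {}
    record Word(int id, String value) {}

    // Nested flatMap/filter calls mirror the SQL's INNER JOIN ... ON clauses
    static List<String> usages(List<Text> texts, List<Line> lines,
                               List<LineWord> lineWords, List<Word> words) {
        return texts.stream().flatMap(t ->
            lines.stream().filter(l -> l.textId() == t.id()).flatMap(l ->
                lineWords.stream().filter(lw -> lw.lineId() == l.id()).flatMap(lw ->
                    words.stream().filter(w -> w.id() == lw.wordId()).map(w ->
                        t.name() + " " + l.offset() + " " + w.value() + " " + lw.offset()))))
            .collect(Collectors.toList());
    }
}
```

In practice the database does this join far more efficiently; the point of the sketch is only that a stream pipeline and a SQL query share the same shape, which is what makes the two easy to bridge.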
How do we work with this query now that we can leverage lambdas? We can decompose the problem
into the steps in the flow shown below. By using streams, each of those four steps can be developed
independently; the stream instance and its element type bridge the independently developed steps.
Because of that, we can develop the steps in whichever order we like and put them together at the end. If we had
multiple developers, we could even assign the different steps to different developers, with
each step tied to the previous one only by the element type it consumes. So it's not just the execution that streams
and lambdas make more asynchronous and concurrent: it's development, too. We will prove this out by starting in the middle,
working our way to the end, and then filling in the front and tying it all together.
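To make that decomposition concrete, here is a minimal sketch of such a pipeline. The Usage record, the method names, and the tab-separated output format are all illustrative assumptions, not the book's code; the real first step would pull rows from the database rather than a hard-coded Stream.

```java
import java.util.stream.Stream;

public class Pipeline {
    // Hypothetical row type mirroring the query's four columns
    record Usage(String textName, int lineOffset, String word, int wordOffset) {}

    // First step: produce the rows (stand-in data instead of a real query)
    static Stream<Usage> queryResults() {
        return Stream.of(
            new Usage("THE WINTER'S TALE", 3, "exit", 12),
            new Usage("A LOVER'S COMPLAINT", 1, "from", 0));
    }

    // Middle step, developed independently: format one row as one output line
    static String format(Usage u) {
        return u.textName() + "\t" + u.lineOffset() + "\t"
             + u.word() + "\t" + u.wordOffset();
    }

    public static void main(String[] args) {
        queryResults()
            .map(Pipeline::format)            // steps meet only at the element type
            .forEach(System.out::println);    // final step: print one per line
    }
}
```

Because queryResults, format, and the final forEach agree only on the stream's element type, each could be written and tested by a different developer, which is exactly the point being made above.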
[Flow diagram: the four-step pipeline, from the query results, to a Stream of elements, to "Print out the" result, one per line]