text
+ id : int primary key auto_increment
+ name : varchar unique
+ year : int

line
+ id : int primary key auto_increment
+ text_id : int
+ offset : int

word
+ id : int primary key auto_increment
+ value : varchar unique

line_word
+ id : int primary key auto_increment
+ line_id : int
+ word_id : int
+ offset : int

Relationships (each 1-to-many): text to line (via line.text_id); line to line_word (via line_word.line_id); word to line_word (via line_word.word_id).

Figure A-1. The Database Layout
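To make the layout concrete, here is a minimal in-memory sketch in Java that mirrors the four tables as records and loads two lines from Figure A-2 into them. It is not the appendix's loader: the names `SchemaSketch` and `addLine` are hypothetical, auto_increment is simulated with counters, and it assumes that `line.offset` is the line's position within its text and `line_word.offset` is the word's position within its line. Splitting on whitespace with no case or punctuation normalization is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory mirror of Figure A-1; not a database API.
public class SchemaSketch {
    // One record per table; fields follow the column lists above.
    record Text(int id, String name, int year) {}
    record Line(int id, int textId, int offset) {}
    record Word(int id, String value) {}
    record LineWord(int id, int lineId, int wordId, int offset) {}

    // Keyed by value because word.value is unique.
    final Map<String, Word> words = new LinkedHashMap<>();
    final List<Line> lines = new ArrayList<>();
    final List<LineWord> lineWords = new ArrayList<>();
    // Counters stand in for auto_increment.
    private int nextLineId = 1;
    private int nextLineWordId = 1;

    // Adds one line row for the text, then one line_word row per token.
    // Assumption: offsets are zero-based positions (line in text, word in line).
    Line addLine(Text text, int offset, String content) {
        Line line = new Line(nextLineId++, text.id(), offset);
        lines.add(line);
        if (content.isBlank()) {
            return line; // nothing to index
        }
        String[] tokens = content.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // Reuse the existing word row, or create one with the next id.
            Word word = words.computeIfAbsent(tokens[i], v -> new Word(words.size() + 1, v));
            lineWords.add(new LineWord(nextLineWordId++, line.id(), word.id(), i));
        }
        return line;
    }

    public static void main(String[] args) {
        SchemaSketch db = new SchemaSketch();
        Text play = new Text(1, "ALLS WELL THAT ENDS WELL", 1603);
        db.addLine(play, 0, "KING OF FRANCE");
        db.addLine(play, 1, "THE DUKE OF FLORENCE");
        System.out.println(db.words.size());     // 6 distinct words
        System.out.println(db.lineWords.size()); // 7 line_word rows
    }
}
```

Because `word.value` is unique, “OF” is stored once even though it appears in both lines; its two `line_word` rows point at the same `word_id`, which is the 1-to-many relationship from `word` to `line_word`.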
Most of the breaks between texts look like the sample given in Figure A-2: the date of publication, then the
title, and then “by William Shakespeare.” The sonnets, however, are numbered, so they follow a slightly
different format, shown in Figure A-3. “THE SONNETS” is the title, and once we see that title, we need to
start looking for lines that contain only a number; each such line marks the start of a new sonnet. Parsing
the Sonnets is therefore slightly different from parsing everything else, which adds a wrinkle of complexity.
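The Sonnets rule can be sketched as follows: once the parser has seen the “THE SONNETS” title, every line consisting only of digits opens a new sonnet, and the non-blank lines after it belong to that sonnet. `SonnetSplitter`, `isNumberLine`, and `split` are hypothetical names, and the sketch assumes it is handed only the lines that follow the title; deciding where the Sonnets end is left to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical splitter for the lines that follow the "THE SONNETS" title.
public class SonnetSplitter {
    // True when the trimmed line is non-empty and made only of digits.
    static boolean isNumberLine(String line) {
        String t = line.trim();
        return !t.isEmpty() && t.chars().allMatch(Character::isDigit);
    }

    // Maps each sonnet number to its non-blank verse lines, in input order.
    // Lines before the first number line are ignored.
    static Map<Integer, List<String>> split(List<String> body) {
        Map<Integer, List<String>> sonnets = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : body) {
            if (isNumberLine(line)) {
                current = new ArrayList<>(); // a number line opens a new sonnet
                sonnets.put(Integer.parseInt(line.trim()), current);
            } else if (current != null && !line.isBlank()) {
                current.add(line.trim());
            }
        }
        return sonnets;
    }

    public static void main(String[] args) {
        List<String> body = List.of(
                "1",
                "From fairest creatures we desire increase,",
                "That thereby beauty's rose might never die,",
                "",
                "2",
                "When forty winters shall besiege thy brow,");
        Map<Integer, List<String>> sonnets = split(body);
        System.out.println(sonnets.keySet());      // [1, 2]
        System.out.println(sonnets.get(1).size()); // 2
    }
}
```

A year line such as 1603 also consists only of digits, so without a stopping rule the next text's date would be read as a sonnet number; that is one concrete form of the extra complexity described above.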
1603
ALLS WELL THAT ENDS WELL
by William Shakespeare
Dramatis Personae
KING OF FRANCE
THE DUKE OF FLORENCE
BERTRAM, Count of Rousillon
LAFEU, an old lord
Figure A-2. A Sample Text Start
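Recognizing the ordinary text start in Figure A-2 amounts to matching three consecutive non-blank lines: a four-digit year, a title, and the by-line. The sketch below is one way to do it under stated assumptions: blank lines between the three are skipped, the year is exactly four ASCII digits, and the by-line is compared case-insensitively. `TextStartFinder` and `TextStart` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical finder for text starts laid out as in Figure A-2.
public class TextStartFinder {
    record TextStart(int lineIndex, int year, String title) {}

    static List<TextStart> find(List<String> lines) {
        // Keep the indexes of non-blank lines so blank spacing does not matter.
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).isBlank()) {
                idx.add(i);
            }
        }
        List<TextStart> starts = new ArrayList<>();
        // Slide over every run of three consecutive non-blank lines.
        for (int k = 0; k + 2 < idx.size(); k++) {
            String year = lines.get(idx.get(k)).trim();
            String title = lines.get(idx.get(k + 1)).trim();
            String byline = lines.get(idx.get(k + 2)).trim();
            if (year.matches("\\d{4}") && byline.equalsIgnoreCase("by William Shakespeare")) {
                starts.add(new TextStart(idx.get(k), Integer.parseInt(year), title));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1603",
                "ALLS WELL THAT ENDS WELL",
                "by William Shakespeare",
                "Dramatis Personae",
                "KING OF FRANCE");
        for (TextStart s : find(lines)) {
            System.out.println(s.year() + " " + s.title()); // 1603 ALLS WELL THAT ENDS WELL
        }
    }
}
```

If the Sonnets open with the same year and by-line (Figure A-3 would confirm this), a parser could compare each found title with “THE SONNETS” to decide when to switch to numbered-sonnet handling.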
 