<pitching team_flag="home" out="27" h="9" r="4" er="4" bb="1" so="4" hr="2"
    bf="34" era="6.91">
  <pitcher id="346871" name="Cook" pos="P" out="18" bf="23" er="3" r="3" h="6"
      so="2" hr="1" bb="0" w="0" l="2" era="4.50" note="(L, 0-2)"/>
  ...
</pitching>
<batting team_flag="away" ab="33" r="4" h="9" d="2" t="0" hr="2" rbi="4" bb="1"
    po="27" da="8" so="4" avg=".322" lob="10">
  <batter id="453056" name="Ellsbury" pos="CF-LF" bo="100" ab="4" po="3" r="1"
      bb="0" a="0" t="0" sf="0" h="2" e="0" d="1" so="1" hr="0" rbi="0"
      lob="2" fldg="1.000" avg=".450"/>
  <batter id="456030" name="Pedroia" pos="2B" bo="200" ab="4" po="1" r="0"
      bb="0" a="4" t="0" sf="0" h="0" e="0" d="0" hbp="0" so="0" hr="0"
      rbi="0" lob="2" fldg="1.000" avg=".227"/>
  ...
</batting>
...
</boxscore>
The root element is <boxscore>, which has several attributes. It has a child element
called <linescore>, which shows the scoring in each inning. Then there are <pitching>
and <batting> elements for the home team and away team, respectively.
This isn't a terribly complex XML file, but if you had to process it using Java the code
would quickly get involved. Using Groovy, as shown previously, all you have to do is walk
the tree.
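As a rough illustration of what "walking the tree" looks like for this file, the sketch below parses a trimmed copy of the listing above from an inline string rather than from the network. The sample keeps only a few of the attributes, and the variable names (batterNames, totalHits, cookEra) are mine, not from the original code. The import assumes Groovy 3 or later; on older versions XmlSlurper lives in groovy.util and needs no import.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions use groovy.util.XmlSlurper

// Trimmed copy of the box score listing, with most attributes omitted
def xml = '''
<boxscore>
  <pitching team_flag="home" out="27" h="9" r="4" era="6.91">
    <pitcher id="346871" name="Cook" pos="P" out="18" era="4.50" note="(L, 0-2)"/>
  </pitching>
  <batting team_flag="away" ab="33" r="4" h="9">
    <batter id="453056" name="Ellsbury" pos="CF-LF" ab="4" h="2" avg=".450"/>
    <batter id="456030" name="Pedroia" pos="2B" ab="4" h="0" avg=".227"/>
  </batting>
</boxscore>
'''.trim()

def boxscore = new XmlSlurper().parseText(xml)

// Child elements are reached with dots, attributes with the @ prefix
def batterNames = boxscore.batting.batter.collect { it.@name.text() }
def totalHits = boxscore.batting.batter.collect { it.@h.toInteger() }.sum()

// Pick the home team's pitching element, then read its first pitcher
def homePitching = boxscore.pitching.find { it.@team_flag.text() == 'home' }
def cookEra = homePitching.pitcher[0].@era.text()

println "Batters: $batterNames, hits among them: $totalHits"
println "First home pitcher ERA: $cookEra"
```

Note that totalHits here only sums the two batters included in the sample, so it will not match the team total in the h attribute of <batting>.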
Parsing this data uses the same approach as parsing the geocoded data in the last section.
Here I need to assemble the URL based on the month, day, and year and then parse the box
score file:
def url = base + "year_${year}/month_${month}/day_${day}/"
def game = "gid_${year}_${month}_${day}_${away}mlb_${home}mlb_${num}/boxscore.xml"
def boxscore = new XmlSlurper().parse("$url$game")
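To make the string assembly concrete, here is a minimal sketch that wraps it in a helper and prints the result for one set of inputs. The boxscoreUrl function, the example.com base, and the team codes are all hypothetical stand-ins; the real base value is defined elsewhere in the original code. The sketch also assumes that month and day must be zero-padded to two digits, which the snippet above does not show.

```groovy
// Hypothetical helper: builds the box score URL from its parts.
// Assumes month and day are zero-padded to two digits in the path.
String boxscoreUrl(String base, int year, int month, int day,
                   String away, String home, int num) {
    def mm = month.toString().padLeft(2, '0')
    def dd = day.toString().padLeft(2, '0')
    def url = base + "year_${year}/month_${mm}/day_${dd}/"
    def game = "gid_${year}_${mm}_${dd}_${away}mlb_${home}mlb_${num}/boxscore.xml"
    "$url$game"
}

// Placeholder base and team codes, for illustration only
def result = boxscoreUrl('http://example.com/mlb/', 2007, 5, 5, 'bos', 'min', 1)
println result
```

Passing the date parts as integers and padding them in one place avoids building a path such as month_5 by accident when the caller forgets the leading zero.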
After parsing the file I can walk the tree to extract the team names and scores:
def awayName = boxscore.@away_fname
def awayScore = boxscore.linescore[0].@away_team_runs
def homeName = boxscore.@home_fname
def homeScore = boxscore.linescore[0].@home_team_runs
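The same four lookups can be tried offline against a small inline document, as sketched below. The attribute names come from the code above, but the team names and run totals in the sample string are illustrative values only, and the summary variable is my own addition. Calling text() and toInteger() converts the attribute results into a plain String and Integer, which is usually what you want before comparing or formatting them. As before, the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; older versions use groovy.util.XmlSlurper

// Illustrative sample: only the attributes read below are included
def sample = '''<boxscore away_fname="Boston Red Sox" home_fname="Colorado Rockies">
  <linescore away_team_runs="4" home_team_runs="3"/>
</boxscore>'''

def boxscore = new XmlSlurper().parseText(sample)

// Attributes of the root element are read directly from the parsed result
def awayName = boxscore.@away_fname.text()
def homeName = boxscore.@home_fname.text()

// linescore[0] selects the first (here, only) <linescore> child
def awayScore = boxscore.linescore[0].@away_team_runs.toInteger()
def homeScore = boxscore.linescore[0].@home_team_runs.toInteger()

def summary = "$awayName $awayScore, $homeName $homeScore".toString()
println summary
```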