know the timetables, so the web sites of most interest are those that report the live information, including late trains
and cancellations.
In England, the foremost site is Live Departure Boards (www.nationalrail.co.uk/times_fares/ldb/),
which provides reasonably accurate information about most trains on the U.K. network. It doesn't include an API,
unfortunately, but it is very easy to scrape for the current train times, and it also comes with a Twitter feed detailing
station closures and overrunning engineering works. It also has the advantage of using a basic GET request to
retrieve the times of all the trains between two named stations, which makes the results easy to bookmark. One journey I make
occasionally is between St. Pancras and Luton Airport. On reviewing the site, I can see that the URL incorporates both of
these stations, as their three-letter station codes (STP and LTN), in the form:
http://ojp.nationalrail.co.uk/service/timesandfares/STP/LTN/today/0630/dep
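Because the station codes and the departure time appear directly in the path, a URL of this form can be generated rather than typed. The following is a minimal PHP sketch; the function name journeyUrl is hypothetical, and it assumes only the today and dep segments seen in the example above, not any other values the site may accept.

```php
<?php
// Hypothetical helper: builds a journey-planner URL in the form shown above.
// $from and $to are three-letter station codes (e.g. STP, LTN);
// $time is a four-digit 24-hour time such as "0630".
function journeyUrl(string $from, string $to, string $time): string
{
    // Reject anything that isn't HHMM, since it goes straight into the path
    if (!preg_match('/^\d{4}$/', $time)) {
        throw new InvalidArgumentException("Time must be HHMM, e.g. 0630");
    }

    // The example URL uses upper-case codes, so normalise them
    return sprintf(
        'http://ojp.nationalrail.co.uk/service/timesandfares/%s/%s/today/%s/dep',
        strtoupper($from),
        strtoupper($to),
        $time
    );
}

echo journeyUrl('stp', 'ltn', '0630'), PHP_EOL;
// http://ojp.nationalrail.co.uk/service/timesandfares/STP/LTN/today/0630/dep
```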
So, this page could be scraped in the same way as we saw earlier. However, a study of the site's source code reveals that
the page is populated by an AJAX request to:
http://ojp.nationalrail.co.uk/en/s/ldb/liveTrainsJson
This feed can be controlled by amending the parameters (?liveTrainsFrom=STP&liveTrainsTo=LTN), and it can be
incorporated into the code for whattrain.php like this:
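// Request the live departures between the two stations as JSON
$url = "http://ojp.nationalrail.co.uk/en/s/ldb/liveTrainsJson?" .
    "departing=true&liveTrainsFrom=$fromCode&liveTrainsTo=$toCode&serviceId=";
// getContents() reuses a cached copy if it is less than 5 minutes (5*60 seconds) old
$contents = getContents($url, "ldb_{$fromCode}_{$toCode}", 5*60);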
As a simple request, this uses only GET, so it can be accessed through cURL or directly in the browser,
which makes testing simple and direct. You can also save the output to a temporary local file for offline testing.
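The query string above is assembled by hand. As a sketch of an alternative, PHP's built-in http_build_query() can produce the same URL from an array, URL-encoding each value along the way. The function name liveTrainsUrl is hypothetical, and the parameter list simply mirrors the one used in whattrain.php.

```php
<?php
// Hypothetical helper: builds the liveTrainsJson URL from an array of parameters.
// http_build_query() URL-encodes each value and joins the pairs with '&'.
function liveTrainsUrl(string $fromCode, string $toCode): string
{
    $params = [
        'departing'      => 'true',   // a string: http_build_query() would turn PHP true into "1"
        'liveTrainsFrom' => $fromCode,
        'liveTrainsTo'   => $toCode,
        'serviceId'      => '',       // an empty string is kept as "serviceId="
    ];
    return 'http://ojp.nationalrail.co.uk/en/s/ldb/liveTrainsJson?' . http_build_query($params);
}

echo liveTrainsUrl('STP', 'LTN'), PHP_EOL;
// http://ojp.nationalrail.co.uk/en/s/ldb/liveTrainsJson?departing=true&liveTrainsFrom=STP&liveTrainsTo=LTN&serviceId=
```

The printed URL can be pasted straight into a browser or passed to cURL for testing.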
The resulting output from the URL is a JSON object containing the next few trains to depart. However, no historical data
from before midnight today is available.
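The getContents() helper called above is not reproduced in this section. As a rough, hypothetical stand-in (named getContentsCached here to avoid confusion with the original), the sketch below assumes one cache file per key in the system temporary directory, reused while it is younger than the given age in seconds. Falling back to a stale copy when a download fails is an extra assumption, not necessarily what the original does. The string|false return type needs PHP 8.

```php
<?php
// Hypothetical stand-in for getContents(): returns the cached copy of $url
// if it is younger than $maxAge seconds, otherwise downloads it afresh
// and overwrites the cache file.
function getContentsCached(string $url, string $cacheKey, int $maxAge): string|false
{
    // One cache file per key, e.g. /tmp/ldb_STP_LTN.cache
    $cacheFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $cacheKey . '.cache';

    // Serve from the cache while it is fresh enough
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile);
    }

    // Otherwise download (http:// URLs need allow_url_fopen enabled)
    $data = @file_get_contents($url);
    if ($data === false) {
        // Assumed behaviour: fall back to a stale copy, if any, rather than failing
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }

    file_put_contents($cacheFile, $data);
    return $data;
}
```

Called as getContentsCached($url, "ldb_{$fromCode}_{$toCode}", 5*60), it would approximate the five-minute caching used in whattrain.php.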
It is also worth noting that the getContents function here caches the data in a temporary file, which avoids downloading
it again if a subsequent query is made within five minutes of the previous one. You may need to change this interval
to suit your needs. From here, we only need to decode the JSON:
$trainTimes = json_decode($contents);
$trains = $trainTimes->{'trains'};
which in turn allows you to read the information for each train:
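foreach($trains as $entry) {
    $expectedTime = $entry[1];     // time listed for this train
    $destination = $entry[2];
    $status = $entry[3];
    $platform = $entry[4];
    $arrivalTime = $expectedTime;  // keep the listed time before it may be overwritten
    // If the status field contains a time, use it as the expected time instead
    if (preg_match('/((\d+):0?(\d+))/', $status, $matches)) {
        $expectedTime = $matches[0];
    }
    // (further processing of each train goes here)
}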
This is screen-scraping PHP code, which may have broken by the time you read this. The relative pitfalls and considerations of this
approach were covered earlier in the chapter.
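Putting the decoding and the loop together, the sketch below turns the feed into a plain PHP array. The field positions are inferred from the loop above, the function name parseTrains is hypothetical, and the sample JSON is invented purely to exercise the code. Given the screen-scraping caveat, the shape check at the top returns an empty array rather than failing if the feed changes.

```php
<?php
// Sketch only: the assumed layout (an object whose "trains" member is a list
// of arrays holding time, destination, status and platform at indices 1-4)
// is inferred from the loop above, not from any published specification.
function parseTrains(string $json): array
{
    $data = json_decode($json);
    if (!is_object($data) || !isset($data->trains) || !is_array($data->trains)) {
        return [];  // unexpected shape: the feed has probably changed
    }

    $result = [];
    foreach ($data->trains as $entry) {
        if (!is_array($entry) || count($entry) < 5) {
            continue;  // skip rows that don't match the assumed layout
        }
        $listed   = $entry[1];
        $expected = $listed;
        // A time in the status field replaces the listed time, as in the loop above
        if (preg_match('/(\d+):0?(\d+)/', (string) $entry[3], $matches)) {
            $expected = $matches[0];
        }
        $result[] = [
            'listed'      => $listed,
            'expected'    => $expected,
            'destination' => $entry[2],
            'status'      => $entry[3],
            'platform'    => $entry[4],
        ];
    }
    return $result;
}

// Made-up sample input, for illustration only (not real feed output)
$sample = '{"trains":[["a","06:34","Luton Airport Parkway","On time","3"],'
        . '["b","06:49","Luton Airport Parkway","Exp 06:58","4"]]}';

foreach (parseTrains($sample) as $train) {
    echo "{$train['listed']} to {$train['destination']}, expected {$train['expected']}", PHP_EOL;
}
// 06:34 to Luton Airport Parkway, expected 06:34
// 06:49 to Luton Airport Parkway, expected 06:58
```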