Efficient train scheduling has long relied on operations research and optimization algorithms. However, the increasing complexity of multi-train networks has rendered traditional mathematical programming inadequate. Recent advances in artificial intelligence and machine learning, combined with open access to comprehensive railway operations data, offer promising new techniques.
Goal: Use the feeds to create training datasets that can be fed into downstream learning algorithms.
Network Rail provides a number of operational data feeds that are open to anyone - the only requirement is registering. The following feeds are available (a sketch of how they are consumed follows the list):
- BPLAN -> (2 months) long term train planning data
- Corpus -> (1 month) location reference data
- Movement -> (real time) position and movement events
- RTPPM -> (1 minute) performance against timetable
- Schedule -> (1 day) updates of train schedules
- SMART -> (1 month) reference data mapping train describer berths to locations
- TD -> (real time) granular position events
- TSR -> (1 week) temporary speed restrictions
- VSTP -> (real time) cancellations / last minute changes
- Infrastructure Model -> (1 day) network graph
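For orientation: the real time feeds are consumed as STOMP topics, while the reference extracts (Schedule, Corpus, SMART, BPLAN) are periodic file downloads. A rough sketch of the mapping - the topic names are taken from the Open Rail Data wiki and should be treated as assumptions to verify against your account:

// Sketch of feed -> STOMP topic mapping. Topic names follow the Open Rail
// Data wiki and should be verified against your registered subscriptions.
var topics = {
  movements: '/topic/TRAIN_MVT_ALL_TOC', // position and movement events
  td:        '/topic/TD_ALL_SIG_AREA',   // granular train describer events
  vstp:      '/topic/VSTP_ALL',          // cancellations / last minute changes
  rtppm:     '/topic/RTPPM_ALL'          // performance against timetable
};
// Schedule, Corpus, SMART and BPLAN are fetched as periodic downloads
// rather than subscribed to over STOMP.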
An alternative system is Darwin, provided by National Rail; it is the official train running information engine and powers departure boards and Google Maps. A comparison of the two services exists within the Open Rail Data wiki.
Architecture
The feeds all use STOMP which, from my research, is a fairly niche text-based messaging protocol, although Apache ActiveMQ, RabbitMQ and other brokers do support it.
A basic connection can be made using Node with the following snippet:
var prettyjson = require('prettyjson'),
    StompClient = require('stomp-client').StompClient;

// Movement events for every TOC, via Network Rail's STOMP endpoint.
var destination = '/topic/TRAIN_MVT_ALL_TOC',
    client = new StompClient('datafeeds.networkrail.co.uk', 61618,
                             'your-email', 'your-password', '1.0');

client.connect(function(sessionId) {
  // This callback fires once the connection has been established.
  console.log('Connected, session ' + sessionId);
  client.subscribe(destination, function(body, headers) {
    console.log(prettyjson.render(JSON.parse(body)));
  });
});
This listens for movements from all TOCs' (Train Operating Companies) trains - so every train in the UK.
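Each message body on this topic is a JSON array of events. As a minimal sketch, reusing the client from the snippet above - the field names (msg_type, variation_status and so on) are taken from the Open Rail Data wiki and are assumptions to verify - late-running movements can be filtered out like so:

// Filter a movement message for late actual movements (msg_type '0003').
// Field names follow the Open Rail Data wiki docs; treat as assumptions.
client.subscribe(destination, function(body) {
  JSON.parse(body)
    .filter(function(msg) { return msg.header.msg_type === '0003'; })
    .filter(function(msg) { return msg.body.variation_status === 'LATE'; })
    .forEach(function(msg) {
      console.log(msg.body.toc_id, msg.body.loc_stanox, msg.body.event_type,
                  msg.body.timetable_variation + ' min late');
    });
});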
Batch data is loaded nightly into several PostgreSQL tables (a load sketch follows the list); notable tables include:
- Schedules -> timetables
- Smart -> locations and their links to other locations, creating a graph of the entire network
- Trains -> the main table, live trains
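As an illustration of the load step - the table and column names here are hypothetical, not the schema actually used - each parsed movement boils down to an upsert with a client such as node-postgres:

var pg = require('pg');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Hypothetical upsert into the live trains table; assumes a unique
// constraint on (train_id, stanox, event_type). Illustrative only.
function saveMovement(msg) {
  return pool.query(
    'INSERT INTO trains (train_id, stanox, event_type, variation_status, reported_at) ' +
    'VALUES ($1, $2, $3, $4, to_timestamp($5 / 1000.0)) ' +
    'ON CONFLICT (train_id, stanox, event_type) DO UPDATE SET ' +
    'variation_status = EXCLUDED.variation_status, reported_at = EXCLUDED.reported_at',
    [msg.body.train_id, msg.body.loc_stanox, msg.body.event_type,
     msg.body.variation_status, Number(msg.body.actual_timestamp)]
  );
}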
These tables sit behind a simple REST API for downstream consumption.
- http://nrdf-api.kmml.dev/live/schedules/nt - returns the next 25 trains that NT (Northern Rail) will run.
- http://nrdf-api.kmml.dev/live/service/W12345 - returns any trains with headcode W12345 on the current day.
- http://nrdf-api.kmml.dev/live/all/mco/to/wgn - returns the next 10 trains that arrive/depart at MCO (Manchester Oxford Road) to WGN (Wigan North Western).
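Consuming the API is then a plain HTTP GET; for example, with Node 18+'s built-in fetch (assuming the endpoints return JSON):

// Print the next Northern trains using the first endpoint above.
fetch('http://nrdf-api.kmml.dev/live/schedules/nt')
  .then(function(res) { return res.json(); })
  .then(function(trains) { console.log(trains); })
  .catch(console.error);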
Additionally, a web interface was provided at http://nrdf.kmml.dev/, although it is now offline (it's super expensive to process that amount of data; maybe it will be resurrected in the future).
The original goal was to harness the data provided to create training datasets to power downstream machine learning applications. However, given the size of the dataset, creating a dataset of any real meaning without venturing into HPC poses too large a challenge for my current knowledge.
This project was my first venture into 'traditional' programming languages (MATLAB, laughs aside), and valuable lessons were learned:
- Message Brokers
- REST APIs
- Batch Data Processing
- Real Time Processing
- DB Design - it was awful, but lots of antipatterns were learned.
- Frontend Design

Hopefully this will be revisited in the future.
More details available in the paper.
-K