Efficient train scheduling has long relied on operations research and optimization algorithms. However, the increasing complexity of multi-train networks has rendered traditional mathematical programming inadequate. Recent advances in artificial intelligence and machine learning, combined with open access to comprehensive railway operations data, offer promising new techniques.
Goal: Use the feeds to create training datasets that can be fed into downstream learning algorithms.
Network Rail provides a number of operational data feeds that are open to anyone - the only requirement is registering. The following feeds are available (a sketch of how they are consumed follows the list):
- BPLAN -> (2 months) long term train planning data
- Corpus -> (1 month) location reference data
- Movement -> (real time) position and movement events
- RTPPM -> (1 minute) performance against timetable
- Schedule -> (1 day) updates of train schedules
- SMART -> (1 month) reference data mapping train describer berths to locations
- TD -> (real time) granular position events
- TSR -> (1 week) temporary speed restrictions
- VSTP -> (real time) cancellations / last minute changes
- Infrastructure Model -> (1 day) network graph
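For orientation: the real time feeds are consumed as STOMP topics, while the reference extracts (Schedule, Corpus, SMART, BPLAN) are periodic file downloads. A rough sketch of the mapping - the topic names are taken from the Open Rail Data wiki and should be treated as assumptions to verify against your account:

// Sketch of feed -> STOMP topic mapping. Topic names follow the Open Rail
// Data wiki and should be verified against your registered subscriptions.
var topics = {
  movements: '/topic/TRAIN_MVT_ALL_TOC', // position and movement events
  td:        '/topic/TD_ALL_SIG_AREA',   // granular train describer events
  vstp:      '/topic/VSTP_ALL',          // cancellations / last minute changes
  rtppm:     '/topic/RTPPM_ALL'          // performance against timetable
};
// Schedule, Corpus, SMART and BPLAN are fetched as periodic downloads
// rather than subscribed to over STOMP.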
An alternative system is Darwin, provided by National Rail; it is the official train running information engine and powers departure boards and Google Maps. A comparison of the two services exists within the Open Rail Data wiki.
Architecture
The feeds all use STOMP which, from my research, is a fairly niche text-based messaging protocol, although Apache ActiveMQ, RabbitMQ and other brokers do support it.
A basic connection can be made using Node with the following snippet:
var prettyjson = require('prettyjson'),
    StompClient = require('stomp-client').StompClient;

// Movement events for every TOC, via Network Rail's STOMP endpoint.
var destination = '/topic/TRAIN_MVT_ALL_TOC',
    client = new StompClient('datafeeds.networkrail.co.uk', 61618,
                             'your-email', 'your-password', '1.0');

client.connect(function(sessionId) {
  // This callback fires once the connection has been established.
  console.log('Connected, session ' + sessionId);
  client.subscribe(destination, function(body, headers) {
    console.log(prettyjson.render(JSON.parse(body)));
  });
});
This listens for movements from all TOCs' (Train Operating Companies) trains - so every train in the UK.
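Each message body on this topic is a JSON array of events. As a minimal sketch, reusing the client from the snippet above - the field names (msg_type, variation_status and so on) are taken from the Open Rail Data wiki and are assumptions to verify - late-running movements can be filtered out like so:

// Filter a movement message for late actual movements (msg_type '0003').
// Field names follow the Open Rail Data wiki docs; treat as assumptions.
client.subscribe(destination, function(body) {
  JSON.parse(body)
    .filter(function(msg) { return msg.header.msg_type === '0003'; })
    .filter(function(msg) { return msg.body.variation_status === 'LATE'; })
    .forEach(function(msg) {
      console.log(msg.body.toc_id, msg.body.loc_stanox, msg.body.event_type,
                  msg.body.timetable_variation + ' min late');
    });
});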
Batch data is loaded nightly into several PostgreSQL tables (a load sketch follows the list); notable tables include:
- Schedules -> timetables
- Smart -> locations and their links to other locations, creating a graph of the entire network
- Trains -> the main table, live trains
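As an illustration of the load step - the table and column names here are hypothetical, not the schema actually used - each parsed movement boils down to an upsert with a client such as node-postgres:

var pg = require('pg');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Hypothetical upsert into the live trains table; assumes a unique
// constraint on (train_id, stanox, event_type). Illustrative only.
function saveMovement(msg) {
  return pool.query(
    'INSERT INTO trains (train_id, stanox, event_type, variation_status, reported_at) ' +
    'VALUES ($1, $2, $3, $4, to_timestamp($5 / 1000.0)) ' +
    'ON CONFLICT (train_id, stanox, event_type) DO UPDATE SET ' +
    'variation_status = EXCLUDED.variation_status, reported_at = EXCLUDED.reported_at',
    [msg.body.train_id, msg.body.loc_stanox, msg.body.event_type,
     msg.body.variation_status, Number(msg.body.actual_timestamp)]
  );
}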
These tables sit behind a simple REST API for downstream consumption.
- http://nrdf-api.kmml.dev/live/schedules/nt - returns the next 25 trains that NT (Northern Rail) will run.
- http://nrdf-api.kmml.dev/live/service/W12345 - returns any trains with headcode W12345 on the current day.
- http://nrdf-api.kmml.dev/live/all/mco/to/wgn - returns the next 10 trains that arrive/depart at MCO (Manchester Oxford Road) to WGN (Wigan North Western).
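Consuming the API is then a plain HTTP GET; for example, with Node 18+'s built-in fetch (assuming the endpoints return JSON):

// Print the next Northern trains using the first endpoint above.
fetch('http://nrdf-api.kmml.dev/live/schedules/nt')
  .then(function(res) { return res.json(); })
  .then(function(trains) { console.log(trains); })
  .catch(console.error);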
Additionally, a web interface was provided at http://nrdf.kmml.dev/, although it is now offline (it's super expensive to process that amount of data; maybe it will be resurrected in the future).
The original goal was to harness the data provided to create training datasets to power downstream machine learning applications. However, given the size of the dataset, creating a dataset of any real meaning without venturing into HPC poses too large a challenge for my current knowledge.
This project was my first venture into 'traditional' programming languages (MATLAB, laughs aside), and valuable lessons were learned:
- Message Brokers
- REST APIs
- Batch Data Processing
- Real Time Processing
- DB Design - it was awful, but lots of antipatterns were learned.
- Frontend Design

Hopefully this will be revisited in the future.
More details available in the paper.
-K