02.11.17
How data keeps London moving faster
Lauren Sager Weinstein, chief data officer at TfL, explains why the organisation is storing anonymised passenger data in order to improve services across the capital.
Every day, millions of people travel on London’s transport network, making journeys by bus, Tube and road. The by-product of this hive of activity is a vast amount of system data. As TfL’s chief data officer, it’s my job to harness the data that we collect and turn it into useful tools that we can use to run our services better and provide information back to our customers.
We have a strong legacy of using data to help Londoners move around the city, one that predates modern computing power. We started with data collected by hand, for example through surveys and manual observations of where our vehicles were.
But once we introduced systems that automatically collected information about our network – such as the SCOOT system that runs our traffic signals, and our Oyster ticketing system – we started to use data-driven analytics to better plan and run our transport network.
We’ve used data from our ticketing system to measure how many people are entering and exiting our stations, and we can show how this varies by time of day. We share this information with our customers so that they understand when our busy times are on the network and can thus plan their journeys.
Now, although our ticketing data is a powerful tool, it can’t answer all of our questions about travel patterns on the Underground network. Ticketing data captures gate-to-gate movements as customers touch to enter or exit a station. So we can see volumes of customers travelling from station A to station B, but we can’t work out the interchange paths that they took. This limits our ability to measure how crowded particular areas are in stations and on trains.
To try to answer these questions, we carried out a four-week pilot in which we gathered wi-fi connection requests from devices on our network and used data science techniques to create movement patterns. More than 500 million depersonalised connection requests were collected to see if we could spot patterns in how groups of customers move through our stations and interchange between Tube lines. We also wanted to see if we could use this data to observe more precisely how and where crowding happens. We collected data from 54 stations during November and December 2016.
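As a rough illustration only, and not a description of the method used in the pilot, the sketch below shows one way movement patterns might be derived from depersonalised connection requests: group the requests by device token, order them by time, and collapse repeated requests at the same station into a single visit. The `ConnectionEvent` schema, the `build_paths` function, the tokens and the sample events are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ConnectionEvent(NamedTuple):
    """One depersonalised wi-fi connection request (hypothetical schema)."""
    token: str      # depersonalised device token, not a real identifier
    station: str    # station where the request was seen
    timestamp: int  # seconds since the start of the day


def build_paths(events: Iterable[ConnectionEvent]) -> dict[str, list[str]]:
    """Return the ordered sequence of stations seen for each token."""
    by_token: dict[str, list[ConnectionEvent]] = defaultdict(list)
    for event in events:
        by_token[event.token].append(event)

    paths: dict[str, list[str]] = {}
    for token, token_events in by_token.items():
        # Events may arrive out of order, so sort them by time first.
        token_events.sort(key=lambda e: e.timestamp)
        path: list[str] = []
        for event in token_events:
            # Collapse repeated requests at the same station into one visit.
            if not path or path[-1] != event.station:
                path.append(event.station)
        paths[token] = path
    return paths


# Invented sample events, for illustration only.
events = [
    ConnectionEvent("a1", "King's Cross", 100),
    ConnectionEvent("a1", "King's Cross", 130),
    ConnectionEvent("a1", "Oxford Circus", 400),
    ConnectionEvent("b2", "Waterloo", 700),
    ConnectionEvent("b2", "King's Cross", 90),
    ConnectionEvent("a1", "Waterloo", 800),
    ConnectionEvent("b2", "Warren Street", 300),
]

paths = build_paths(events)
for token, path in sorted(paths.items()):
    print(token, " > ".join(path))
```

A sequence like this captures the interchange station that gate-to-gate ticketing data cannot see, which is the gap the pilot set out to fill.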
Keeping data private
In doing this, it was crucial that we protected our customers’ privacy and explained our pilot to them. We relied on very helpful guidance from the Information Commissioner’s Office (ICO) on how to ensure that our design took into account best practice to safeguard customers’ privacy.
We notified our customers about the pilot and explained to them how they could opt out of collection if they wished. We depersonalised the information collected and we did not identify any individuals. We did not gather any browsing data.
And while we want to share the findings of our pilot and information about volumes and footfall, we would never sell the underlying personal data to third parties.
Astounding results
The results of the pilot were astounding and revealed a number of things that weren’t previously known about how people travel on the Tube.
For example, our analysis showed that although a large proportion of people (32%) change at Oxford Circus when travelling from King’s Cross to Waterloo (as you might expect), people travel between the two stations in a number of other ways.
Our surveys of our customers’ journeys told us about the most popular ones. But our data revealed so many more intricate movement paths across the network – in fact, we observed at least 18 different route options. This insight is not only interesting, but useful, as it allows us to see what other routes people might take during disruption or planned improvement works.
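To make the idea of route shares concrete, here is a minimal sketch of how the proportion of journeys using each route between two stations could be calculated once station sequences are available. It is an assumption about one possible approach, not the analysis we ran; the `route_shares` function and the sample journeys are hypothetical, and the numbers it produces are invented rather than pilot results.

```python
from collections import Counter


def route_shares(
    paths: list[list[str]], origin: str, destination: str
) -> dict[tuple[str, ...], float]:
    """Return each distinct route's share of journeys from origin to destination."""
    routes = Counter(
        tuple(path)
        for path in paths
        if len(path) >= 2 and path[0] == origin and path[-1] == destination
    )
    total = sum(routes.values())
    if total == 0:
        return {}
    return {route: count / total for route, count in routes.items()}


# Invented sample journeys, for illustration only.
sample_paths = [
    ["King's Cross", "Oxford Circus", "Waterloo"],
    ["King's Cross", "Oxford Circus", "Waterloo"],
    ["King's Cross", "Warren Street", "Waterloo"],
    ["King's Cross", "Euston", "Waterloo"],
    ["Oxford Circus", "Waterloo"],  # different origin, so it is ignored
]

shares = route_shares(sample_paths, "King's Cross", "Waterloo")
# List the routes from most to least used.
for route, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{share:.0%}  {' > '.join(route)}")
```

The number of keys in the result is the number of distinct route options observed, and the largest value is the share taken by the most popular route.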
Realising the full benefits
Collecting this data offers customers a number of benefits. For example, our staff can notify travellers of the best routes to take if they wish to avoid disruption or unnecessary crowding. We may even be able to do this in real time, adapting our advice as conditions change.
We can also help customers plan their journeys on our website by giving them options to tailor routes if they wish to avoid crowding. Teams operating our Tube network could have better information and a way to measure the impacts of disruption, so that we can feed that back into improving our operations and reliability.
By analysing the data, we can prioritise transport investment to improve services and incorporate the findings into future planning.
On top of this, we can use footfall volume data to help us with selling advertising on our poster units and with identifying shops for our retail units, which will help us raise more revenue to reinvest directly back into public transport.
The next steps
Because of the many benefits and improvements that resulted from the trial, we will now work with key stakeholders, privacy campaigners and consumer groups to examine the best steps to take next, including possibly making data collection permanent across the entire Tube network.
We will also be engaging with the ICO before any final decision is made. As more and more people travel by Tube, better understanding of where and how passengers travel is key so that we can ensure their journey is as reliable as possible. Using data wisely and responsibly to do this will improve journeys for everyone.
Top Image: traveler1116
FOR MORE INFORMATION
W: tfl.gov.uk