Applying data science to improve mobility users' experience

It may come as no surprise that, like it or not, we are constantly surrounded by data. Technological advances in recent decades have brought the Internet to portable devices that many people use every day. In April 2020, about 4.57 billion people (59% of the world’s population) had access to the Internet, and it is estimated that each person generates about 1.7 MB of data every second [1]. Compare this figure with the storage capacity of our everyday electronic devices and it is not surprising that some people believe we are living in the era of Big Data. Everywhere we go and everything we buy can be recorded. In this sense, data have shed light on consumers’ desires, which is the main reason why many companies consider data science a key factor in their strategic business plans.

Statistics plays an important role in meeting the challenge of dealing with large empirical datasets. As a first, preliminary step in the data analysis procedure, data mining algorithms provide techniques to handle such large datasets properly. Although the numbers above are truly astonishing, it is important to note that it is not possible to explore the entire “data universe”: data that were never collected are lost, and data yet to come are simply unknown. Therefore, statistical analysis is always restricted to a particular dataset, or sample, encompassing a limited number of events. The next step in the data analysis procedure consists of describing this sample in order to extract the Key Performance Indicators (KPIs) that are crucial for any business. But how reliable is the information obtained from this sample for characterizing the whole “data universe”?
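To make “describing the sample” concrete, here is a minimal Python sketch that summarizes a sample of wait times with a few candidate KPIs. The wait times are synthetic (drawn from an exponential distribution with an assumed mean of 8 minutes), and the choice of KPIs (mean, median, 95th percentile, maximum) is illustrative, not Shotl’s actual KPI set.

```python
import random
import statistics


def summarize_wait_times(waits_min: list[float]) -> dict[str, float]:
    """Describe a sample of wait times (in minutes) with a few candidate KPIs."""
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points
    pct = statistics.quantiles(waits_min, n=100, method="inclusive")
    return {
        "n": len(waits_min),
        "mean": statistics.fmean(waits_min),
        "median": statistics.median(waits_min),
        "p95": pct[94],  # 95th percentile
        "max": max(waits_min),
    }


# Synthetic sample for illustration only: exponential wait times, mean 8 minutes
rng = random.Random(42)
sample = [rng.expovariate(1 / 8) for _ in range(1_000)]

for name, value in summarize_wait_times(sample).items():
    print(f"{name:>6}: {value:.2f}")
```

Note that these numbers only describe this particular sample; whether they hold for the whole population is the question of inference.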

One of the most important steps in statistical analysis consists of inferring the properties of the whole (the population, in statistical terms) from the information available in a sample. Unlike logical statements in mathematics, such as “A implies B”, the statements formulated in statistical inference carry a probability of occurrence: “with a probability of 60%, A implies B”. The lack of information, together with the inherent complexity of the system under study, makes it impossible to make an assertion with complete certainty. In this context, probability distributions are essential mathematical tools that assign probabilities of occurrence to different events.
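As a small illustration of inference from a sample, the sketch below estimates the population proportion of trips with a long wait and attaches a 95% Wilson score interval to it. The counts (28 long waits out of 400 trips) and the idea of a 15-minute threshold are hypothetical; the Wilson interval is one standard choice among several for a binomial proportion.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical sample: 28 of 400 trips had a wait longer than 15 minutes
low, high = wilson_interval(28, 400)
print(f"point estimate: {28 / 400:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

The point estimate describes the sample; the interval is the inferential statement about the population, and it narrows as the sample grows.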

The main body of a probability distribution is associated with the most common events, which have a relatively high probability of occurrence. However, one may be interested in extreme events whose appearance could have dramatic consequences for the performance of a business. For instance, in the mobility field, it is very unlikely that a user will wait more than three hours to be picked up by a bus. However, if it does happen, the user’s experience will be extremely negative. Information on extreme events is contained in the tail of the probability distribution and, although the corresponding probability may be very small compared with that of common events, it is not zero. Extreme value theory is the branch of probability that studies extreme events and their probabilities of occurrence [2]. Within this theory, it is common to distinguish between light tails, whose probability decays quickly (for example, exponentially) and is therefore very low compared with the body of the distribution, and heavy tails, whose probability decays much more slowly (for example, as a power law) and therefore remains relatively high.
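The difference between light and heavy tails can be seen by comparing the survival function P(X > x) of two wait-time models with the same mean. In this sketch the 8-minute mean, the exponential model (light tail) and the Pareto model with shape 2 (heavy tail) are illustrative assumptions, not fitted to any real data.

```python
import math


def exp_survival(x: float, mean: float) -> float:
    """P(X > x) for an exponential distribution (light tail)."""
    return math.exp(-x / mean) if x >= 0 else 1.0


def pareto_survival(x: float, alpha: float, x_min: float) -> float:
    """P(X > x) for a Pareto (type I) distribution (heavy, power-law tail)."""
    return (x_min / x) ** alpha if x >= x_min else 1.0


MEAN_WAIT = 8.0  # minutes; illustrative assumption
ALPHA = 2.0      # Pareto shape; its mean is alpha * x_min / (alpha - 1)
X_MIN = MEAN_WAIT * (ALPHA - 1) / ALPHA  # 4 minutes, so both models share the same mean

for x in (15, 30, 60, 180):
    print(f"P(wait > {x:>3} min): exponential {exp_survival(x, MEAN_WAIT):.2e}, "
          f"Pareto {pareto_survival(x, ALPHA, X_MIN):.2e}")
```

At 15 and 30 minutes the exponential model actually gives the larger probability, but by 60 minutes the Pareto tail has overtaken it, and at 180 minutes (the three-hour wait mentioned above) the heavy tail is more than a million times more likely.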

Shotl constantly takes advantage of rigorous data science techniques to improve the user experience. Our routing algorithms are designed so that the distributions of our KPIs do not exhibit heavy tails, which keeps the probability of an extreme event as low as possible. Different strategies can be adopted to avoid these dangerous heavy tails. For example, the algorithm tries to choose an optimal route by minimizing the KPIs considered negative for the user experience. In addition, based on the results of simulations, some parameters of the algorithm’s configuration can be tuned to prevent extreme events as far as possible.
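One way to check whether a KPI sample, real or simulated, shows a heavy tail is the Hill estimator from extreme value theory, which estimates the tail index γ from the k largest observations. The sketch below is a generic diagnostic, not a description of Shotl’s routing algorithm; the synthetic samples, the function name `hill_estimator` and the choice k = 500 are illustrative assumptions.

```python
import math
import random


def hill_estimator(sample: list[float], k: int) -> float:
    """Hill estimate of the extreme value index gamma from the k largest values.

    For a power-law tail P(X > x) ~ x**(-1/gamma) it estimates gamma > 0.
    For light tails such as the exponential the true index is 0, although
    finite-sample estimates stay somewhat above 0.
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs) or xs[k] <= 0:
        raise ValueError("need 0 < k < n and a positive threshold")
    threshold = xs[k]  # the (k+1)-th largest value
    return sum(math.log(x / threshold) for x in xs[:k]) / k


# Synthetic KPI samples for illustration only (not real trip data)
rng = random.Random(0)
light = [rng.expovariate(1 / 8) for _ in range(50_000)]      # exponential, mean 8
heavy = [4 * rng.paretovariate(2.0) for _ in range(50_000)]  # Pareto, alpha = 2

for name, data in (("light", light), ("heavy", heavy)):
    print(f"{name}: Hill gamma estimate (k=500) = {hill_estimator(data, 500):.3f}")
```

The Pareto sample should give an estimate near 1/α = 0.5, while the exponential one should be clearly smaller. In practice the estimate is usually plotted against k, since a single value depends on where the threshold is placed.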

References

[1] Domo, Data Never Sleeps 8.0, https://www.domo.com/learn/data-never-sleeps-8 (2020)

[2] De Haan, L. and Ferreira, A., Extreme Value Theory: An Introduction, Springer Science & Business Media (2007)
