Let’s think for a moment about the evenings when we collapse onto the sofa to unwind and watch a good movie. Most of the time, we opt for a streaming service such as Netflix or Amazon Prime rather than watching whatever is currently on air. And how often do we play our favorite songs? We usually do this through Spotify instead of the radio. All of these streaming services share a common feature: recommendations. Movies similar to those we have watched and liked are displayed on Netflix’s home page, sparing us long searches for something new. These recommended movies are usually ones we like, and the accuracy of the recommendations increases with the number of movies we watch.
The examples above show how common recommendation systems are and hint at the variety of ways in which they can be implemented. Along with movie and music streaming, another popular application category is fitness and training. So, why not develop a recommendation system for cycling routes? A recommendation system within a cycling application could be a useful way to suggest routes to all types of cyclists – from occasional riders to heavy users. This feature would not only give cyclists new paths to take, but also promote new itineraries and encourage the use of bicycles. Encouraging more people to take up cycling as an alternative means of transportation also benefits the environment.
Starting from a bike computer or a phone app that provides real-time information about the cyclist’s ride, it is possible to collect data from sensors, above all the GPS.
The whole recommendation system is built on the data collected via GPS during rides. The solution leverages not only user-produced data but also public GPS tracks.
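GPS tracks, both user-produced and public, are commonly exchanged as GPX files. The article does not say which format its tracks use, so the sketch below assumes GPX 1.1 and reads each track point's latitude, longitude and optional elevation with Python's standard `xml.etree.ElementTree`. The function name `read_gpx_points` and the sample track are hypothetical.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace. Assumption: the article does not name a track format.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_gpx_points(gpx_text):
    """Return a list of (lat, lon, elevation) tuples from a GPX 1.1 document.

    Elevation is None when a track point has no <ele> element.
    """
    root = ET.fromstring(gpx_text)
    points = []
    # Visit every <trkpt>, whatever track or segment it belongs to.
    for trkpt in root.iterfind(".//gpx:trkpt", GPX_NS):
        lat = float(trkpt.attrib["lat"])
        lon = float(trkpt.attrib["lon"])
        ele_node = trkpt.find("gpx:ele", GPX_NS)
        ele = float(ele_node.text) if ele_node is not None else None
        points.append((lat, lon, ele))
    return points


# Hypothetical two-point track, for illustration only.
sample = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="45.4642" lon="9.1900"><ele>120.0</ele></trkpt>
    <trkpt lat="45.4700" lon="9.2000"><ele>125.5</ele></trkpt>
  </trkseg></trk>
</gpx>"""

print(read_gpx_points(sample))
```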
COLLECTING THE DATA
First, we start from the GPS track and extract the route’s main characteristics. To do this, the track data is compared with detailed local maps. From this comparison we extract diverse information, such as the length, road type, and slope, along with many other variables: the data collected by the sensor, the presence of shops or points of interest along the route, and the real-time traffic on the road.
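The purely geometric features (length and slope) can be derived from the track points alone; road type, shops, points of interest and traffic require the map comparison described above and are not covered here. A minimal sketch, assuming points arrive as `(lat, lon, elevation)` tuples with elevation in metres and no missing values, using the haversine great-circle distance between consecutive points. The helper names and the definition of average gradient (total absolute elevation change over horizontal distance) are illustrative choices, not the article's.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def route_features(points):
    """Compute length, total ascent and average gradient from (lat, lon, ele) points.

    Assumption: every point has an elevation value (no None entries).
    """
    length_m = 0.0
    ascent_m = 0.0
    abs_change_m = 0.0
    for (lat1, lon1, e1), (lat2, lon2, e2) in zip(points, points[1:]):
        length_m += haversine_m(lat1, lon1, lat2, lon2)
        delta = e2 - e1
        ascent_m += max(delta, 0.0)   # climbing only
        abs_change_m += abs(delta)    # climbing and descending
    # Hypothetical definition: absolute elevation change per horizontal distance, in %.
    avg_gradient_pct = 100 * abs_change_m / length_m if length_m > 0 else 0.0
    return {
        "length_km": length_m / 1000,
        "ascent_m": ascent_m,
        "avg_gradient_pct": avg_gradient_pct,
    }


# Three made-up points heading north along one meridian.
track = [(45.00, 9.0, 100.0), (45.01, 9.0, 110.0), (45.02, 9.0, 105.0)]
print(route_features(track))
```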
Second, we perform a data analysis to define the type of each path. This helps identify the macro-variables that lead a cyclist to choose one type of path rather than another. An exploratory study conducted on over 300 paths ridden by 8 frequent cyclists identified 2 main axes:
- Length of the path, from short (2 km) to long (80 km)
- Slope of the path, from completely flat terrain to high mountain roads
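To make the two axes concrete, the sketch below scales a path's length and average gradient to [0, 1] and assigns it to one of four path types by splitting each axis in half. Only the 2 to 80 km length range comes from the study; the 8% gradient treated as "high mountain", the 0.5 cut-off and the type labels are hypothetical.

```python
# Length range from the study (2 to 80 km). The 8% upper bound of the slope
# axis is a hypothetical value chosen for illustration.
MIN_KM, MAX_KM = 2.0, 80.0
MAX_GRADIENT_PCT = 8.0


def clamp01(x):
    """Clip a value to the [0, 1] interval."""
    return min(max(x, 0.0), 1.0)


def axis_coordinates(length_km, avg_gradient_pct):
    """Place a path on the length and slope axes, each scaled to [0, 1]."""
    length_axis = clamp01((length_km - MIN_KM) / (MAX_KM - MIN_KM))
    slope_axis = clamp01(avg_gradient_pct / MAX_GRADIENT_PCT)
    return length_axis, slope_axis


def path_type(length_km, avg_gradient_pct, cutoff=0.5):
    """Assign one of four hypothetical path types by splitting each axis at `cutoff`."""
    length_axis, slope_axis = axis_coordinates(length_km, avg_gradient_pct)
    size = "long" if length_axis >= cutoff else "short"
    terrain = "hilly" if slope_axis >= cutoff else "flat"
    return f"{size}-{terrain}"


print(path_type(10, 1))   # a short ride on nearly flat roads
print(path_type(60, 6))   # a long mountain ride
```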
Using these two discriminating axes, it is possible to define each cyclist’s profile, based on the features, number, and type of paths they have ridden. In the end, each cyclist’s behavior can be classified. The following chart gives an example of the resulting profiles.
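One simple way to turn a cyclist's rides into a profile vector, assuming each ride has already been assigned one of four hypothetical path types, is to count the rides of each type and divide by the total. The article does not specify its profile encoding, so treat this as an illustration.

```python
from collections import Counter

# Hypothetical path types: the four quadrants of the length/slope plane.
PATH_TYPES = ["short-flat", "short-hilly", "long-flat", "long-hilly"]


def cyclist_profile(ridden_types):
    """Return the share of rides of each path type, in PATH_TYPES order."""
    counts = Counter(ridden_types)
    total = sum(counts[t] for t in PATH_TYPES)
    if total == 0:
        return [0.0] * len(PATH_TYPES)  # no known rides yet
    return [counts[t] / total for t in PATH_TYPES]


rides = ["short-flat", "short-flat", "long-hilly", "short-flat"]
print(cyclist_profile(rides))  # [0.75, 0.0, 0.0, 0.25]
```

Dividing by the total discards the number of rides. This does not affect cosine similarity, which ignores vector length, but the raw counts could be kept if quantity matters elsewhere.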
The solution above is similar to what Netflix does to classify movie preferences. However, it still lacks a feature that Netflix uses to obtain more precise profiles: ratings. Ratings make it possible to give a different weight to each track and to define what the user does and does not like. For example, a rating is particularly useful for understanding whether someone did not enjoy the last path they took.
Adding ratings will allow us to define user profiles more precisely and to validate the quality of each recommended path, since a rating gives direct feedback once the suggested ride ends.
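Ratings can be folded into the profile by weighting each ride. In the sketch below, assuming a hypothetical 1 to 5 rating scale, each ride contributes `rating - 3` to its path type, so liked paths pull the profile toward that type and disliked ones push it away. The scale, the neutral point and the path-type labels are assumptions, not taken from the article.

```python
from collections import defaultdict

# Hypothetical path types and rating scale (1 = disliked, 5 = loved).
PATH_TYPES = ["short-flat", "short-hilly", "long-flat", "long-hilly"]
NEUTRAL_RATING = 3


def rated_profile(rides):
    """Build a profile from (path_type, rating) pairs.

    Each ride adds (rating - NEUTRAL_RATING) to its path type, so a neutral
    rating contributes nothing and a low rating subtracts from that type.
    """
    totals = defaultdict(float)
    for path_type, rating in rides:
        totals[path_type] += rating - NEUTRAL_RATING
    return [totals[t] for t in PATH_TYPES]


rides = [("short-flat", 5), ("short-flat", 4), ("long-hilly", 1)]
print(rated_profile(rides))  # [3.0, 0.0, 0.0, -2.0]
```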
COSINE SIMILARITY AS A RECOMMENDATION METHOD
So far, we have explained how each user’s profile is obtained and how it can be improved by adding ratings. The next step is to find the right path to suggest to a user.
Let’s start by introducing the concept of cosine similarity. It may sound complex, but it is actually a simple notion. The proposed solution is collaborative: the starting point is to define a set of similar cyclists, so that recommendations come from similar profiles. A suitable path can then be chosen from the paths ridden by those similar users. The recommendation is therefore based not only on the path’s features, but also on the fact that the path was ridden by a similar cyclist.
Cosine similarity is the method used to measure how similar two cyclists are. The similarity is obtained by measuring the cosine of the angle between the vectors of the two profiles, as shown in the picture below. The more similar the two profiles are, the smaller the angle between them, and consequently the closer the cosine is to 1, its highest possible value.
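The angle comparison corresponds to the standard formula cos(θ) = (a · b) / (‖a‖ ‖b‖), where a and b are the two profile vectors. A minimal Python sketch follows; the profiles of the hypothetical cyclists `alice`, `bob` and `carol` are made-up numbers. With non-negative profiles, such as shares of rides, the result lies between 0 and 1; with rating-weighted profiles containing negative entries it can go down to -1.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # the angle is undefined for a zero vector
    return dot / (norm_a * norm_b)


# Made-up profiles: shares of short-flat, short-hilly, long-flat, long-hilly rides.
alice = [0.75, 0.0, 0.0, 0.25]
bob = [0.6, 0.1, 0.0, 0.3]
carol = [0.0, 0.1, 0.2, 0.7]

print(cosine_similarity(alice, bob))    # high: both mostly ride short, flat paths
print(cosine_similarity(alice, carol))  # lower: carol prefers long mountain rides
```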
This simple methodology defines a set of similar users. Each user has their own past activities, and the activities of similar users are used as the basis for extracting the recommended path. Unsurprisingly, this step also relies on cosine similarity, this time computed between pairs of paths. Finally, a short algorithm selects the path with the highest score as the recommendation for the user.
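The article does not detail the "short algorithm", so the sketch below is one plausible version under stated assumptions: keep the `k` cyclists most similar to the target, collect the paths they have ridden that the target has not, and score each candidate as the neighbour's profile similarity multiplied by the best cosine similarity between the candidate's feature vector and any path the target has already ridden. The function `recommend_path`, the scoring rule, `k`, and all example data are hypothetical. Path feature vectors should be scaled to comparable ranges before being compared.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_path(target, profiles, paths, ridden_by, k=2):
    """Return the id of the highest-scoring path for `target`, or None.

    profiles:  user -> profile vector
    paths:     path id -> feature vector
    ridden_by: user -> set of path ids that user has ridden
    """
    # Step 1: the k cyclists whose profiles are most similar to the target's.
    sims = {u: cosine_similarity(profiles[target], profiles[u])
            for u in profiles if u != target}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]

    own_paths = ridden_by.get(target, set())
    best_id, best_score = None, float("-inf")
    for user in neighbours:
        # Step 2: candidates are the neighbour's paths the target has not ridden.
        for pid in ridden_by.get(user, set()) - own_paths:
            # Hypothetical scoring: closeness to the target's own paths,
            # weighted by how similar the neighbour is to the target.
            path_sim = max((cosine_similarity(paths[pid], paths[own])
                            for own in own_paths), default=0.0)
            score = sims[user] * path_sim
            if score > best_score:
                best_id, best_score = pid, score
    return best_id


# Made-up data. Path features: [length axis, slope axis, points-of-interest density].
profiles = {
    "anna": [0.75, 0.0, 0.0, 0.25],
    "ben": [0.6, 0.1, 0.0, 0.3],
    "carla": [0.0, 0.1, 0.2, 0.7],
    "dario": [0.7, 0.0, 0.1, 0.2],
}
paths = {
    "p1": [0.1, 0.1, 0.8],
    "p2": [0.15, 0.1, 0.7],
    "p3": [0.9, 0.8, 0.1],
    "p4": [0.5, 0.6, 0.2],
}
ridden_by = {"anna": {"p1"}, "ben": {"p1", "p2"}, "carla": {"p3"}, "dario": {"p4"}}

print(recommend_path("anna", profiles, paths, ridden_by))
```

Here `dario` and `ben` are the two closest neighbours of `anna`, and `p2` wins because it closely resembles `p1`, the path `anna` has already ridden. If the target has no ridden paths, every candidate scores 0 and an arbitrary candidate is returned, so a real system would need a separate strategy for new users.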
The idea behind this solution is to develop a tool that makes cyclists’ lives simpler and gives them more value. We want to suggest new and interesting paths while also promoting the environmental benefits of cycling.
The cycling application aims to stimulate cooperation between riders and encourage shared activity. Through this collaboration, the solution adds value by suggesting new paths tailored to each cyclist. The recommendation system could also encourage more people to use a bicycle as a means of transportation: offering first-time users new and interesting paths can encourage them to keep choosing the bicycle as their go-to means of transportation. The more people who choose a bicycle instead of a car, the lower the emissions. By encouraging more people to cycle, this application could therefore help reduce air pollution, with the end goal of a healthier world for the next generations.
Francesco Zocca – Data Scientist