Moving rocks around

Diego Quintana's blog

Plotting traffic maps in Barcelona


Introduction

The objective is to map the traffic information for May 2019. For that we will need to get the traffic information per street section, and to join it with the dataset that contains the geometry of the street sections. We can download the traffic information for last month (May 2019) from this URL:

https://opendata-ajuntament.barcelona.cat/data/en/dataset/trams/resource/d335dd1a-4cd0-4eac-acb7-5ae943bcd978

We can also download the geometry of the street sections (described in terms of latitude and longitude) from this URL:

https://opendata-ajuntament.barcelona.cat/data/en/dataset/transit-relacio-trams

The final goal is to have a static (e.g. an image) or interactive map which displays the traffic intensity by street section, over a background context map (e.g. OpenStreetMap).

A short disclaimer before we begin: In Catalan, tram means something like segment (of a street, in this context *ahem*). It's not a rail tram.

Hands on

We will try to get a quick sketch using only a subset of the data first, and I will be adding comments when necessary.

We first note that the files we will be using are quite big, and we can't afford to download them from a notebook cell every time. Since this notebook was created on the Google Colab platform, we will have to find a way to make these files available from somewhere. Luckily, Colab's API provides a Google Drive extension that solves this problem.

In [0]:
from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive
In [0]:
TRAFFIC_MAY_URL="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/2019_05_Maig_TRAMS_TRAMS.csv"
STREETS_MAPPING_URL="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/transit_relacio_trams.csv"
STREETS_MAPPING_URL_LONG="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/transit_relacio_trams_format_long.csv"
In [0]:
import pandas as pd
import matplotlib.pyplot as plt

# define default size of plots
plt.rcParams["figure.figsize"] = (20,3)

trf = pd.read_csv(TRAFFIC_MAY_URL, sep=",")
trf.head(10)
Out[0]:
idTram data estatActual estatPrevist
0 1 20190501000552 1 0
1 2 20190501000552 1 0
2 3 20190501000552 1 0
3 4 20190501000552 1 0
4 5 20190501000552 1 0
5 6 20190501000552 1 0
6 7 20190501000552 1 0
7 8 20190501000552 1 0
8 9 20190501000552 1 0
9 10 20190501000552 0 0
In [0]:
trf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4637618 entries, 0 to 4637617
Data columns (total 4 columns):
idTram          int64
data            int64
estatActual     int64
estatPrevist    int64
dtypes: int64(4)
memory usage: 141.5 MB
In [0]:
old_size, _ = trf.shape

Some information about this dataset can be found in

https://opendata-ajuntament.barcelona.cat/data/en/dataset/trams

Here, we see that the columns estatActual (current state) and estatPrevist (forecasted state) have 7 possible values:

Value  Meaning
0      No data
1      Very fluid (Light)
2      Fluid
3      Dense
4      Very Dense
5      Congestion
6      Line closed

Our first task is to decide what to do with the rows whose values are 0 or 6. For the sake of this example, we will drop these rows. In your case, you might want to work only with these rows, or something in between. Your call.

In [0]:
# drop rows with "no data" (0) or "line closed" (6) states
trf = trf[
    ~(
        trf['estatActual'].isin([0, 6]) | trf['estatPrevist'].isin([0, 6])
    )
]

trf.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2257140 entries, 36 to 4637614
Data columns (total 4 columns):
idTram          int64
data            int64
estatActual     int64
estatPrevist    int64
dtypes: int64(4)
memory usage: 86.1 MB
In [0]:
new_size, _ = trf.shape
In [0]:
# print the amount of rows that were dropped
print(f"{old_size - new_size} of {old_size} rows were dropped ({(old_size-new_size)/old_size:0%})")
2380478 of 4637618 rows were dropped (51.329756%)
In [0]:
streets = pd.read_csv(STREETS_MAPPING_URL, sep=",")
streets.head(10)
Out[0]:
Tram Descripció Coordenades
0 1 Diagonal (Ronda de Dalt a Doctor Marañón) 2.11203535639414,41.3841912394771,2.1015028628...
1 2 Diagonal (Doctor Marañón a Ronda de Dalt) 2.111944376806616,41.38446666680338,2.10159408...
2 3 Diagonal (Doctor Marañón a Pl. Pius XII) 2.112093343037027,41.38422850920645,2.12264979...
3 4 Diagonal (Pl. Pius XII a Doctor Marañón) 2.122592049318304,41.38719094189204,2.11196902...
4 5 Diagonal (Pl. Pius XII a Pl. Maria Cristina) 2.122657659295115,41.38694195794678,2.12755961...
5 6 Diagonal (Pl. Maria Cristina a Pl. Pius XII) 2.127445471496177,41.38841486943499,2.12260182...
6 7 Diagonal (Pl. Maria Cristina a Numància) 2.132701524574889,41.38946673844387,2.12754890...
7 8 Diagonal (Numància a Pl. Maria Cristina) 2.132462297581474,41.38967891982529,2.12746219...
8 9 Diagonal (Entença a Francesc Macià) 2.137322638833081,41.39061820620348,2.14394391...
9 10 Diagonal (Francesc Macià a Entença) 2.144060999786239,41.39265861540969,2.13711452...
In [0]:
streets.dtypes
Out[0]:
Tram            int64
Descripció     object
Coordenades    object
dtype: object

Manipulating WKT data with shapely

In [0]:
streets["Coordenades"].iloc[1]
Out[0]:
'2.111944376806616,41.38446666680338,2.101594089443895,41.38186790291103'

Here we note that the coordinates come as chunks of text, and no encoding is made explicit (I didn't find any in the official documentation). We need to do some research.

But before diving in, let's check for missing data.

In [0]:
# we check for possible missing data
streets["Coordenades"].isna().sum()
Out[0]:
0
In [0]:
streets["Coordenades"].isnull().sum()
Out[0]:
0
In [0]:
expanded_coords = streets["Coordenades"].str.split(",", expand=True)
expanded_coords.head()
Out[0]:
   0                  1                  2                  3                  ...  75
0  2.11203535639414   41.3841912394771   2.101502862881051  41.3816307921222   ...  None
1  2.111944376806616  41.38446666680338  2.101594089443895  41.38186790291103  ...  None
2  2.112093343037027  41.38422850920645  2.122649794166167  41.38692960796057  ...  None
3  2.122592049318304  41.38719094189204  2.111969021588291  41.38445704285778  ...  None
4  2.122657659295115  41.38694195794678  2.127559612359478  41.38817909013056  ...  None

[5 rows x 76 columns]
In [0]:
len(expanded_coords.columns)
Out[0]:
76

We note that the Coordenades column expands into as many as 76 new columns if we split on sep=",". From looking at this data, it's probably in the WKT format (Well-Known Text representation of geometry), and each row represents a line, according to the LINESTRING representation.

We'll need to convert each row to match this WKT format, using string manipulation. But we don't want to reinvent the wheel, so we will be using Shapely, a Python package developed specifically for these types of tasks.
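Before reaching for a library, it helps to see what the target format looks like. Below is a minimal sketch of the conversion done by hand; `coords_to_wkt` is a hypothetical helper, not part of this notebook, and it assumes the raw string holds an even number of values:

```python
# Hypothetical helper (plain Python, no external packages): pair up the flat
# comma-separated values and emit a WKT LINESTRING string.
def coords_to_wkt(raw: str) -> str:
    values = raw.split(",")
    # pair consecutive (lon, lat) values; assumes an even number of items
    pairs = zip(values[0::2], values[1::2])
    body = ", ".join(f"{lon} {lat}" for lon, lat in pairs)
    return f"LINESTRING ({body})"

sample = "2.111944376806616,41.38446666680338,2.101594089443895,41.38186790291103"
print(coords_to_wkt(sample))
# LINESTRING (2.111944376806616 41.38446666680338, 2.101594089443895 41.38186790291103)
```

Shapely does this and much more (validation, geometric operations, map rendering), which is why we use it below instead of rolling our own strings.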

In [0]:
!pip install shapely
Requirement already satisfied: shapely in /usr/local/lib/python3.6/dist-packages (1.6.4.post2)
In [0]:
sample = streets["Coordenades"].str.split(",")[0]
sample
Out[0]:
['2.11203535639414',
 '41.3841912394771',
 '2.101502862881051',
 '41.3816307921222']

Reshaping the strings

A LINESTRING is defined in terms of pairs of values. We need to take a list of length $n$ (assuming $n$ is even), and reshape it as a list of tuples.

[a, b, c, d, e, f] -> [(a, b), (c, d), (e, f)]
In [0]:
import typing as T

from itertools import zip_longest

def reshape_line(line: T.List[str], chunksize: int, fillvalue: T.Optional[float]) -> T.Iterable[T.Tuple[float, ...]]:
    """Reshape a line to match the LineString WKT format
    
    This is based on `zip_longest`, read more in 
    <https://docs.python.org/3/library/itertools.html#itertools.zip_longest> and
    in the StackOverflow solution posted in <https://stackoverflow.com/a/434411/5819113>
    
    
        zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
       
    """
    args = [iter(float(el) for el in line)] * chunksize
    return zip_longest(*args, fillvalue=fillvalue)
In [0]:
list(reshape_line(sample, chunksize=2, fillvalue=0))
Out[0]:
[(2.11203535639414, 41.3841912394771), (2.101502862881051, 41.3816307921222)]

With this, we can get our very first drawing. Baby steps.

In [0]:
from shapely.geometry import LineString

sample_wkt = LineString(reshape_line(sample, 2, 0))
sample_wkt
Out[0]:

The next step is to apply this process to every row on the column Coordenades. We will implement a helper function for this purpose.

In [0]:
def convert_line_to_wkt(line: str) -> LineString:
    splitted_line = line.split(",")
    reshaped_line = reshape_line(splitted_line, chunksize=2, fillvalue=None)
    return LineString(reshaped_line)
In [0]:
# let's process the whole column
coords_wkt = streets["Coordenades"].apply(lambda line: convert_line_to_wkt(line))

# access the __str__ of the first element
print(coords_wkt[1])
LINESTRING (2.111944376806616 41.38446666680338, 2.101594089443895 41.38186790291103)

With the values in WKT format, we could use an online WKT viewer like http://arthur-e.github.io/Wicket/sandbox-gmaps3.html to see how this is rendered on a map.

First plot using geopandas

That was the first approach. Our next step: plot everything at once. Whatever gets displayed will be considered a success.

In [0]:
!pip install geopandas
Collecting geopandas
  Downloading https://files.pythonhosted.org/packages/21/80/da2a33c9201cd4ce693f4aa6189efc9ef1a48bec1c3b02c3ce9908b07fec/geopandas-0.5.1-py2.py3-none-any.whl (893kB)
     |████████████████████████████████| 901kB 5.1MB/s 
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from geopandas) (0.24.2)
Requirement already satisfied: shapely in /usr/local/lib/python3.6/dist-packages (from geopandas) (1.6.4.post2)
Collecting fiona (from geopandas)
  Downloading https://files.pythonhosted.org/packages/89/4a/193cd6a75e51062c85f4e1cd6f312b3bbda6e26ba7510f152ef5016f0b16/Fiona-1.8.6-cp36-cp36m-manylinux1_x86_64.whl (17.9MB)
     |████████████████████████████████| 17.9MB 41.2MB/s 
Collecting pyproj (from geopandas)
  Downloading https://files.pythonhosted.org/packages/16/59/43869adef45ce4f1cf7d5c3aef1ea5d65d449050abdda5de7a2465c5729d/pyproj-2.2.1-cp36-cp36m-manylinux1_x86_64.whl (11.2MB)
     |████████████████████████████████| 11.2MB 22.2MB/s 
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->geopandas) (2018.9)
Requirement already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from pandas->geopandas) (1.16.4)
Requirement already satisfied: python-dateutil>=2.5.0 in /usr/local/lib/python3.6/dist-packages (from pandas->geopandas) (2.5.3)
Requirement already satisfied: click<8,>=4.0 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (7.0)
Collecting click-plugins>=1.0 (from fiona->geopandas)
  Downloading https://files.pythonhosted.org/packages/e9/da/824b92d9942f4e472702488857914bdd50f73021efea15b4cad9aca8ecef/click_plugins-1.1.1-py2.py3-none-any.whl
Requirement already satisfied: attrs>=17 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (19.1.0)
Collecting munch (from fiona->geopandas)
  Downloading https://files.pythonhosted.org/packages/68/f4/260ec98ea840757a0da09e0ed8135333d59b8dfebe9752a365b04857660a/munch-2.3.2.tar.gz
Requirement already satisfied: six>=1.7 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (1.12.0)
Collecting cligj>=0.5 (from fiona->geopandas)
  Downloading https://files.pythonhosted.org/packages/e4/be/30a58b4b0733850280d01f8bd132591b4668ed5c7046761098d665ac2174/cligj-0.5.0-py3-none-any.whl
Building wheels for collected packages: munch
  Building wheel for munch (setup.py) ... done
  Created wheel for munch: filename=munch-2.3.2-py2.py3-none-any.whl size=6613 sha256=5157e6f60cca69d1d04d0bb4cc14e5d53b0702ddfe97d8af306c747f4137671d
  Stored in directory: /root/.cache/pip/wheels/db/bf/bc/06a3e1bfe0ab27d2e720ceb3cff3159398d92644c0cec2c125
Successfully built munch
Installing collected packages: click-plugins, munch, cligj, fiona, pyproj, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.5.0 fiona-1.8.6 geopandas-0.5.1 munch-2.3.2 pyproj-2.2.1
In [0]:
from geopandas import GeoDataFrame

crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(streets, crs=crs, geometry=coords_wkt)
ax = gdf.plot(cmap="viridis")

Note that the viridis colormap here is not supposed to make sense; it doesn't convey any useful information out of the box. But we'll get to that in a minute.

Tweaking the plot

Hooray! We got a map. But it's ugly and it doesn't say much.

At this point we could start tweaking the plot using the matplotlib API. We could add legends, meaningful colours, different line styles, etc. You get it. And if you have played with matplotlib long enough, you'll know that almost everything is possible, but not easy to get.

In our case we don't need to produce beautiful plots at this point; we need an efficient way to explore the data visually.

For maps, the best tool for this, IMHO, is Leaflet, but we don't want to write JavaScript (I don't want to). We will be using Folium, a library for using Leaflet from Python.

Plotting routes on top of Barcelona using Folium

In [0]:
!pip install folium
Requirement already satisfied: folium in /usr/local/lib/python3.6/dist-packages (0.8.3)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from folium) (1.16.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.6/dist-packages (from folium) (2.10.1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from folium) (1.12.0)
Requirement already satisfied: branca>=0.3.0 in /usr/local/lib/python3.6/dist-packages (from folium) (0.3.1)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from folium) (2.21.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2->folium) (1.1.1)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->folium) (1.24.3)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->folium) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->folium) (2019.6.16)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->folium) (3.0.4)
In [0]:
import os
import folium

print(folium.__version__)
0.8.3
In [0]:
barcelona_map = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')
folium.GeoJson(gdf).add_to(barcelona_map)
# plot map
barcelona_map
Out[0]:

Plotting traffic data

This map is plain and boring, because it doesn't contain any information from the traffic dataset. Let's fix that.

We would like to achieve the following:

  • To draw the lines from coords_wkt on top of a map of Barcelona
  • To plot the values from the trf dataframe, showing how busy each line is at a given moment, defined by the column data

We note that the traffic data has a temporal component in data, which will benefit greatly from using the correct datatype, i.e. we will use a proper datetime64 type instead of the raw integer encoding.

In [0]:
trf["data"].head()
Out[0]:
36     20190501000552
56     20190501000552
57     20190501000552
143    20190501000552
276    20190501000552
Name: data, dtype: int64

We can convert this column to datetime (https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html) following the strftime format specifications:

In [0]:
# https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html
pd.to_datetime(trf["data"].astype(str)[:5], format='%Y%m%d%H%M%S')
Out[0]:
36    2019-05-01 00:05:52
56    2019-05-01 00:05:52
57    2019-05-01 00:05:52
143   2019-05-01 00:05:52
276   2019-05-01 00:05:52
Name: data, dtype: datetime64[ns]
In [0]:
trf["data_datetime"] = pd.to_datetime(trf["data"].astype(str), format='%Y%m%d%H%M%S')
In [0]:
trf.columns
Out[0]:
Index(['idTram', 'data', 'estatActual', 'estatPrevist', 'data_datetime'], dtype='object')
In [0]:
trf.head(5)
Out[0]:
idTram data estatActual estatPrevist data_datetime
36 37 20190501000552 4 5 2019-05-01 00:05:52
56 57 20190501000552 1 5 2019-05-01 00:05:52
57 58 20190501000552 1 5 2019-05-01 00:05:52
143 144 20190501000552 1 5 2019-05-01 00:05:52
276 277 20190501000552 2 5 2019-05-01 00:05:52

Quick survey on the data

How busy are trams compared to the expected traffic, on average, for the month of May?

As an extra, we'll try to assess the forecasts: were they right? We'll consider a simple rule of thumb

If the forecasted traffic is greater than or equal to the current traffic, then the forecast is right

This is one approach, and I agree it is rough. There are multiple ways of assessing the error in predictions, but this is not the place to discuss them.
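For reference, one standard alternative is the mean absolute error between the observed and forecasted states. A quick sketch on toy data (the column names match trf, but the values here are made up):

```python
import pandas as pd

# hypothetical toy data; same column names as trf, made-up values
toy = pd.DataFrame({"estatActual": [2, 3, 1, 4], "estatPrevist": [2, 2, 1, 5]})

# mean absolute error between observed and forecasted traffic states
mae = (toy["estatActual"] - toy["estatPrevist"]).abs().mean()
print(mae)  # 0.5
```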

In [0]:
# caveat: this is an *aggregated* measure of the whole month, but it provides
# a rough measure

monthly_traffic = trf.groupby(by=['idTram'], as_index=True).mean()
monthly_traffic["diff_perc"] = monthly_traffic["estatActual"]/monthly_traffic["estatPrevist"]*100
monthly_traffic["was_forecast_right"] = monthly_traffic["diff_perc"] <= 100
monthly_traffic.head()
Out[0]:
data estatActual estatPrevist diff_perc was_forecast_right
idTram
1 2.019052e+13 2.127508 2.125576 100.090890 False
2 2.019052e+13 1.809214 1.874505 96.516888 True
3 2.019052e+13 1.985931 2.011511 98.728342 True
4 2.019052e+13 2.262626 2.305628 98.134936 True
5 2.019052e+13 2.744457 2.737290 100.261860 False

How busy were trams on average, per day and per tram line, during the month of May?

In [0]:
# line continuation according to PEP8 
# see https://stackoverflow.com/a/17630918/5819113

trams_hourly = (
    # drop unwanted columns
    trf.drop(columns=['data', 'idTram'])
        # group by desired columns
        .groupby(by=[trf['idTram'], trf["data_datetime"].dt.day], as_index=True)
        # show the mean of the aggregated values
        .mean()
)

trams_hourly["diff_perc"] = trams_hourly["estatActual"]/trams_hourly["estatPrevist"]*100
trams_hourly["was_forecast_right"] = trams_hourly["diff_perc"] <= 100
trams_hourly.head(24) # first 24 days
Out[0]:
estatActual estatPrevist diff_perc was_forecast_right
idTram data_datetime
1 1 1.598131 1.696262 94.214876 True
2 2.178082 2.091324 104.148472 False
3 2.452915 2.313901 106.007752 False
4 1.771300 1.793722 98.750000 True
5 1.648649 1.675676 98.387097 True
6 2.256881 2.215596 101.863354 False
7 2.261538 2.230769 101.379310 False
8 2.278302 2.264151 100.625000 False
9 2.255814 2.241860 100.622407 False
10 2.451613 2.479263 98.884758 True
11 1.724138 1.719828 100.250627 False
12 1.654867 1.681416 98.421053 True
13 2.253333 2.235556 100.795229 False
14 2.197183 2.169014 101.298701 False
15 2.459016 2.420765 101.580135 False
16 2.386935 2.351759 101.495726 False
17 2.384615 2.380090 100.190114 False
18 1.758772 1.771930 99.257426 True
19 1.583710 1.628959 97.222222 True
20 2.326829 2.321951 100.210084 False
21 2.160194 2.223301 97.161572 True
22 2.179487 2.215385 98.379630 True
23 2.385650 2.426009 98.336414 True
24 2.563319 2.502183 102.443281 False

Baking everything together

Let's merge dataframes and see what we get. This is my first time using these libraries, so my code has a lot of comments that hopefully explain what I'm doing. I've taken a lot from the examples given in the folium repo, so please go there to see what else can be done!

In [0]:
# join dataframes, to *see* what happens. But make a copy first.
_trf_to_merge = trf.copy()
_streets_to_merge = streets.copy()
In [0]:
# merge both dataframes using the id of the tram as common column
_merged = _trf_to_merge.merge(_streets_to_merge, left_on="idTram", right_on="Tram")
# drop unnecessary columns and set the idTram
_merged = _merged.drop(columns=["data", "Tram", "Coordenades"])
_merged.head()
Out[0]:
idTram estatActual estatPrevist data_datetime Descripció geometry
0 37 4 5 2019-05-01 00:05:52 Creu Coberta - Sants (Gran Via Carles III a Ri... LINESTRING (2.121871003054767 41.3757077900903...
1 37 3 5 2019-05-01 00:00:53 Creu Coberta - Sants (Gran Via Carles III a Ri... LINESTRING (2.121871003054767 41.3757077900903...
2 37 3 5 2019-05-01 00:10:53 Creu Coberta - Sants (Gran Via Carles III a Ri... LINESTRING (2.121871003054767 41.3757077900903...
3 37 2 3 2019-05-01 00:15:52 Creu Coberta - Sants (Gran Via Carles III a Ri... LINESTRING (2.121871003054767 41.3757077900903...
4 37 3 3 2019-05-01 00:20:52 Creu Coberta - Sants (Gran Via Carles III a Ri... LINESTRING (2.121871003054767 41.3757077900903...
In [0]:
# `estatActual` and `estatPrevist` vary from 1 to 5
_merged.describe()
Out[0]:
idTram estatActual estatPrevist
count 2.125642e+06 2.125642e+06 2.125642e+06
mean 2.520718e+02 1.896606e+00 1.925726e+00
std 1.606364e+02 8.496675e-01 9.009781e-01
min 1.000000e+00 1.000000e+00 1.000000e+00
25% 1.170000e+02 1.000000e+00 1.000000e+00
50% 2.240000e+02 2.000000e+00 2.000000e+00
75% 4.080000e+02 2.000000e+00 2.000000e+00
max 5.340000e+02 5.000000e+00 5.000000e+00
In [0]:
_merged.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2125642 entries, 0 to 2125641
Data columns (total 6 columns):
idTram           int64
estatActual      int64
estatPrevist     int64
data_datetime    datetime64[ns]
Descripció       object
geometry         object
dtypes: datetime64[ns](1), int64(3), object(2)
memory usage: 113.5+ MB

Plotting combined data

Grouping information by day

In [0]:
from geopandas import GeoDataFrame

crs = {'init': 'epsg:4326'}
merged_gdf = GeoDataFrame(_merged, crs=crs, geometry=_merged['geometry'])
In [0]:
# create a colormap using branca
# following the examples from
# https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/plugin-Search.ipynb

import branca

colorscale = branca.colormap.linear.viridis.scale(1, 5)
colorscale.caption="Tram usage level"


def style_function(feature):
    estat_actual = int(feature['properties']['estatActual']) or None
    return {
        'opacity': 0.5,
        'weight': 3,
        'color': 'black' if estat_actual is None else colorscale(estat_actual)
    }
In [0]:
# we will slice the dataframe to plot one time window at a time,
# since plotting the whole dataframe is too expensive memory-wise

first_10_am = merged_gdf[(merged_gdf["data_datetime"].dt.day == 1) & (merged_gdf["data_datetime"].dt.hour == 10)]
In [0]:
# drop the datetime column: folium cannot serialize Timestamp objects to GeoJSON
first_10_am = first_10_am.drop(columns="data_datetime", inplace=False)
In [0]:
# plot the map
barca_map_colors = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')

(
    folium.GeoJson(
        first_10_am,
        style_function=style_function)
    .add_to(barca_map_colors)
)

# add legend, see <https://github.com/python-visualization/folium/issues/528>
barca_map_colors.add_child(colorscale)

# plot map
barca_map_colors
Out[0]:
In [0]:
# we will manipulate the dataframe to plot data using time scales

gdf_hourly = (
        # group by desired columns
        merged_gdf.groupby(
            by=[merged_gdf['idTram'],
                merged_gdf["data_datetime"].dt.day, # here is where the grouping happens
               ],
            as_index=True)
        # show the mean of the aggregated values
        .mean()
        .reset_index()
)

gdf_hourly["diff_perc"] = gdf_hourly["estatActual"]/gdf_hourly["estatPrevist"]*100
gdf_hourly["was_forecast_right"] = gdf_hourly["diff_perc"] <= 100
gdf_hourly_w_geometry = gdf_hourly.merge(streets, left_on="idTram", right_on="Tram")
gdf_hourly_w_geometry.tail()
Out[0]:
idTram data_datetime estatActual estatPrevist diff_perc was_forecast_right Tram Descripció Coordenades geometry
10281 534 27 1.883392 1.886926 99.812734 True 534 Ronda Litoral (Nus de la Trinitat a Potosí) 2.206062333803802,41.44356616311283,2.20196194... LINESTRING (2.206062333803802 41.4435661631128...
10282 534 28 1.981413 2.144981 92.374350 True 534 Ronda Litoral (Nus de la Trinitat a Potosí) 2.206062333803802,41.44356616311283,2.20196194... LINESTRING (2.206062333803802 41.4435661631128...
10283 534 29 1.929577 1.978873 97.508897 True 534 Ronda Litoral (Nus de la Trinitat a Potosí) 2.206062333803802,41.44356616311283,2.20196194... LINESTRING (2.206062333803802 41.4435661631128...
10284 534 30 1.851064 1.918440 96.487985 True 534 Ronda Litoral (Nus de la Trinitat a Potosí) 2.206062333803802,41.44356616311283,2.20196194... LINESTRING (2.206062333803802 41.4435661631128...
10285 534 31 2.017544 2.056140 98.122867 True 534 Ronda Litoral (Nus de la Trinitat a Potosí) 2.206062333803802,41.44356616311283,2.20196194... LINESTRING (2.206062333803802 41.4435661631128...

Plot daily average values

  • We are using a discrete colorscale (6 steps over the range 1 to 5) to indicate the mean of the estatActual for each tram line on May 27 (the color selection is arbitrary)
  • We are drawing the lines whose capacity forecast was right (i.e. the actual capacity was at or below the forecasted value) with a dashed line, and those whose forecast fell short on that day with a continuous line. This is due to the limited line styles that Leaflet provides out of the box; any improvement on this styling would have to be done with CSS, which is out of the scope of this challenge.
In [0]:
import branca
from folium.plugins import Search
from geopandas import GeoDataFrame

# produce the geopandas dataframe
crs = {'init': 'epsg:4326'}
gdf_hourly = GeoDataFrame(
    gdf_hourly_w_geometry[gdf_hourly_w_geometry['data_datetime']==27], # here is where the filtering happens
    crs=crs,
    geometry=gdf_hourly_w_geometry['geometry']
)

# create colorscale
colorscale = branca.colormap.step.Accent_06.scale(1, 5)
colorscale.caption="estatActual (Daily average)"

# define a method for styling lines
def style_function_capacity(feature):
    not_exceeded = int(feature['properties']['was_forecast_right']) or None
    estat_actual = int(feature['properties']['estatActual']) or None
    
    # see for other options https://leafletjs.com/reference-1.5.0.html#path-option
    return {
        'opacity': 0.5,
        'dashArray': '5,10' if not_exceeded else '0', # dashed line unless forecast is wrong
        'weight': 3,
        'color': 'black' if estat_actual is None else colorscale(estat_actual),
    }
  
# plot the map of barcelona
hourly_map = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')

# plot the tram lines
tram_map = (
    folium.GeoJson(
        gdf_hourly,
        name="Barcelona Tram Capacity (27 May 2019)",
        style_function=style_function_capacity,
        tooltip=folium.GeoJsonTooltip(
            fields=['idTram', 'estatActual', 'estatPrevist', 'diff_perc', 'Descripció'],
            aliases=['Tram ID', 'Observed Capacity', 'Forecasted Capacity', 'Load (%)', 'Description'], 
            localize=True)
    )
    .add_to(hourly_map)
)

# add legend, see <https://github.com/python-visualization/folium/issues/528>
hourly_map.add_child(colorscale)

# add a searchbox bound to the Description of lines
statesearch = Search(
    layer=tram_map,
    geom_type='Line',
    placeholder='Search for a Tram Description',
    collapsed=False,
    search_label='Descripció',
    weight=3
).add_to(hourly_map)


# plot map
hourly_map
Out[0]:

Final words

Thanks for reading this far! Regarding the data, I've only explored isolated cases manually, so my conclusions need a bit more work. I'll refer only to the last map shown (May 27).

  • It's not common to observe capacity levels above 3
  • Traffic loads range from 90% to 110% of the forecast. Forecasts are quite tight, and it seems there's no need to account for more
  • It may be worth exploring plots of diff_perc and other error measurements

It's easy to extend these maps to any time of the day, using the dt accessor and the data_datetime column as a pivot for indexing.
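As a sketch of that extension (same column names as trf, but a made-up toy frame), filtering by hour with the dt accessor looks like this:

```python
import pandas as pd

# toy frame standing in for trf (hypothetical values, same column names)
df = pd.DataFrame({
    "idTram": [1, 1, 2],
    "data_datetime": pd.to_datetime([
        "2019-05-01 10:05:52",
        "2019-05-01 18:05:52",
        "2019-05-01 10:10:53",
    ]),
})

# keep only the observations from the 10:00-10:59 window of May 1st
mask = (df["data_datetime"].dt.day == 1) & (df["data_datetime"].dt.hour == 10)
print(df[mask])
```

The same mask, applied to merged_gdf, would feed the folium plotting cells above for any hour of any day.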

My intention with this notebook is to show the power of Leaflet and Python, and of the libraries available to manipulate geocoded data, like geopandas and shapely. I've also found that there is an application that uses this data; check https://com-shi-va.barcelona.cat/en/transit-

Until next time.