- Wed 28 August 2019
- Data Science
- #python, #data-science
Introduction¶
The objective is to map the traffic information for May 2019. For that, we will need to get the traffic information per street section, and to join it with the dataset that contains the geometry of the street sections. We can download the traffic information for the last month (May 2019) from this url
We can also download the geometry of the street sections (described in terms of latitude and longitude) from this url:
https://opendata-ajuntament.barcelona.cat/data/en/dataset/transit-relacio-trams
The final goal is to have a static (e.g. an image) or interactive map, which displays the traffic state intensity by street sections, over a context background map (e.g.: OpenStreetMap).
A short disclaimer before we begin: in Catalan, tram means something like segment (of a street, in this context *ahem*). It's not a rail tram.
Hands on¶
We will try to get a quick sketch using only a subset of the data first, and I will be adding comments when necessary.
We first note that the files we will be using are really big, and we can't afford to download them from a notebook cell each time. Since this notebook was created using the Google Colab Platform, we will have to find a way to make these files available from somewhere. Luckily, Colab's API provides a Google Drive extension that solves this problem.
from google.colab import drive
drive.mount('/content/drive')
TRAFFIC_MAY_URL="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/2019_05_Maig_TRAMS_TRAMS.csv"
STREETS_MAPPING_URL="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/transit_relacio_trams.csv"
STREETS_MAPPING_URL_LONG="/content/drive/My Drive/Colab Notebooks/trams-barcelona-data/transit_relacio_trams_format_long.csv"
import pandas as pd
import matplotlib.pyplot as plt
# define default size of plots
plt.rcParams["figure.figsize"] = (20,3)
trf = pd.read_csv(TRAFFIC_MAY_URL, sep=",")
trf.head(10)
trf.info()
old_size, _ = trf.shape
Some information about this dataset can be found in
https://opendata-ajuntament.barcelona.cat/data/en/dataset/trams
Here, we see that the columns EstatActual (current state) and EstatPrevist (forecasted state) have 7 possible values:
Value | Meaning
---|---
0 | No data
1 | Very fluid (Light)
2 | Fluid
3 | Dense
4 | Very Dense
5 | Congestion
6 | Line closed
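For later reference, the table above can be captured as a plain Python mapping (the labels are taken verbatim from the dataset documentation):

```python
# Traffic state codes and their meanings, as listed in the dataset docs
ESTAT_LABELS = {
    0: "No data",
    1: "Very fluid (Light)",
    2: "Fluid",
    3: "Dense",
    4: "Very Dense",
    5: "Congestion",
    6: "Line closed",
}

print(ESTAT_LABELS[3])  # Dense
```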
Our first task is to decide what to do with the rows whose value is 0 or 6. For the sake of this example, we will drop them. In your case, you might want to work only with those rows, or something in between. Your call.
# drop rows where either state column is 0 (no data) or 6 (line closed)
trf = trf[
~(
trf['estatActual'].isin([0, 6]) | trf['estatPrevist'].isin([0, 6])
)
]
trf.info()
new_size, _ = trf.shape
# print the amount of rows that were dropped
print(f"{old_size - new_size} of {old_size} rows were dropped ({(old_size-new_size)/old_size:.0%})")
streets = pd.read_csv(STREETS_MAPPING_URL, sep=",")
streets.head(10)
streets.dtypes
Manipulating WKT data with shapely¶
streets["Coordenades"].iloc[1]
Here we note that the coordinates come as chunks of text, and no encoding is stated explicitly (I didn't find any in the official documentation). We need to do some research.
But before diving in, let's check for missing data.
# we check for possible missing data
streets["Coordenades"].isna().sum()
streets["Coordenades"].isnull().sum()
expanded_coords = streets["Coordenades"].str.split(",", expand=True)
expanded_coords.head()
len(expanded_coords.columns)
We note that the Coordenades column could be expanded into ~76 new columns, if we consider sep="," as a separator. From looking at this data, it's probably in the WKT format (Well-Known Text representation of geometry), and each row represents a line, according to the LINESTRING representation.

We'll need to convert each row to match this WKT format, using string manipulation. But we don't want to reinvent the wheel, so we will be using a Python package called Shapely, developed specifically for these kinds of tasks.
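As a rough sketch of the target format (with made-up coordinates), a LINESTRING is just the keyword followed by space-separated x y pairs, joined by commas:

```python
# Build a WKT LINESTRING string from coordinate pairs (hypothetical values)
pairs = [(2.15, 41.39), (2.16, 41.4), (2.17, 41.41)]
wkt_line = "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in pairs) + ")"
print(wkt_line)  # LINESTRING (2.15 41.39, 2.16 41.4, 2.17 41.41)
```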
!pip install shapely
sample = streets["Coordenades"].str.split(",")[0]
sample
Reshaping the strings¶
A LINESTRING is defined in terms of pairs of values. We need to take a list of length $n$ (assuming $n$ is even) and reshape it as a list of tuples:

[a, b, c, d, e, f] -> [(a, b), (c, d), (e, f)]
import typing as T
from itertools import zip_longest
# not tested
def reshape_line(line: T.List[str], chunksize: int, fillvalue: int) -> T.Iterable[T.Tuple[float]]:
"""Reshape a line to match the LineString WKT format
This is based on `zip_longest`, read more in
<https://docs.python.org/3/library/itertools.html#itertools.zip_longest> and
in the StackOverflow solution posted in <https://stackoverflow.com/a/434411/5819113>
zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
"""
args = [iter(float(el) for el in line)] * chunksize
return zip_longest(*args, fillvalue=fillvalue)
list(reshape_line(sample, chunksize=2, fillvalue=0))
With this, we can get our very first drawing. Baby steps.
from shapely.geometry import LineString
sample_wkt = LineString(reshape_line(sample, 2, 0))
sample_wkt
The next step is to apply this process to every row of the column Coordenades. We will implement a helper function for this purpose.
def convert_line_to_wkt(line: str) -> LineString:
    splitted_line = line.split(",")
    # use 0 as fillvalue (as in the sample above): a None in a coordinate
    # pair would make LineString fail on a line with an odd number of values
    reshaped_line = reshape_line(splitted_line, chunksize=2, fillvalue=0)
    return LineString(reshaped_line)
# let's process the whole column
coords_wkt = streets["Coordenades"].apply(lambda line: convert_line_to_wkt(line))
# access the __str__ of the first element
print(coords_wkt[1])
With the values in WKT format, we could use an online WKT viewer like http://arthur-e.github.io/Wicket/sandbox-gmaps3.html to see how this is rendered on a map.
First plot using geopandas¶
That was a first approach. Now for our next step: plot everything at once. Whatever gets displayed will be considered a success.
!pip install geopandas
from geopandas import GeoDataFrame
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(streets, crs=crs, geometry=coords_wkt)
ax = gdf.plot(cmap="viridis")
Note that here, the colormap viridis is not supposed to make sense: it is not giving any useful information out of the box. But we'll get to that in a minute.
Tweaking the plot¶
Hooray! We got a map. But it's ugly and it doesn't say much.
At this point we could start tweaking the plot using the matplotlib API. We could add legends, meaningful colours, different line styles, etc. You get it. And if you have played with matplotlib long enough, you'll know that almost everything is possible, but not much of it is easy to get.

In our case we don't need to produce beautiful plots at this point; we need an efficient way to explore the data visually.

For maps, the best tool to do this IMHO is leaflet, but we don't want to write JavaScript (I don't want to). We will be using Folium, which is a library for using leaflet from Python.
Plotting routes on top of Barcelona using Folium¶
!pip install folium
import os
import folium
print(folium.__version__)
barcelona_map = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')
folium.GeoJson(gdf).add_to(barcelona_map)
# plot map
barcelona_map
Plotting traffic data¶
This map is plain and boring, because it doesn't contain any information from the traffic dataset. Let's fix that.
We would like to achieve the following:
- To draw the lines from Coord_wkt on top of a map of Barcelona
- To plot the values from the trf dataframe, showing how busy each line is at a given moment, defined by the column data

We note that the traffic data has a temporal component in data, which will benefit greatly from using the correct datatype, i.e. we will try to use the DateTime type over the Object type.
trf["data"].head()
We can convert this column to datetime (https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html) following the strftime format specifications:
# https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html
pd.to_datetime(trf["data"].astype(str)[:5], format='%Y%m%d%H%M%S')
trf["data_datetime"] = pd.to_datetime(trf["data"].astype(str), format='%Y%m%d%H%M%S')
trf.columns
trf.head(5)
Quick survey on the data¶
How busy are trams compared to the expected traffic, on average, for the month of May?¶
As an extra, we'll try to assess the forecasts: were they right? We'll consider a simple rule of thumb
If the forecasted traffic is greater than the current traffic, then the forecast is right
This is one approach, and I agree it is rough. There are multiple ways of assessing the error in predictions, but this isn't the place to discuss them.
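As a toy example of this rule with made-up state values for a single segment:

```python
# Hypothetical observed and forecasted states for one tram segment
estat_actual, estat_previst = 3.0, 4.0

# load as a percentage of the forecast; <= 100 means the forecast held
diff_perc = estat_actual / estat_previst * 100
was_forecast_right = diff_perc <= 100

print(diff_perc, was_forecast_right)  # 75.0 True
```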
# caveat: this is an *aggregated* measure of the whole month, but it provides
# a rough measure
monthly_traffic = trf.groupby(by=['idTram'], as_index=True).mean()
monthly_traffic["diff_perc"] = monthly_traffic["estatActual"]/monthly_traffic["estatPrevist"]*100
monthly_traffic["was_forecast_right"] = monthly_traffic["diff_perc"] <= 100
monthly_traffic.head()
How busy were trams on average, on an hour basis, per tram line, during the month of May?¶
# line continuation according to PEP8
# see https://stackoverflow.com/a/17630918/5819113
trams_hourly = (
# drop unwanted columns
trf.drop(columns=['data', 'idTram'])
# group by desired columns
    # group by tram id and hour of the day (the heading asks for hourly averages)
    .groupby(by=[trf['idTram'], trf["data_datetime"].dt.hour], as_index=True)
# show the mean of the aggregated values
.mean()
)
trams_hourly["diff_perc"] = trams_hourly["estatActual"]/trams_hourly["estatPrevist"]*100
trams_hourly["was_forecast_right"] = trams_hourly["diff_perc"] <= 100
trams_hourly.head(24) # first 24 hours
Baking everything together¶
Let's merge dataframes and see what we get. This is my first time using these libraries, so my code has a lot of comments that hopefully explain what I'm doing. I've taken a lot from the examples given in the folium repo, so please go there to see what else can be done!
# join dataframes, to *see* what happens. But make a copy first.
_trf_to_merge = trf.copy()
_streets_to_merge = streets.copy()
# merge both dataframes using the id of the tram as common column
_merged = _trf_to_merge.merge(_streets_to_merge, left_on="idTram", right_on="Tram")
# drop unnecessary columns (raw date, duplicated id, raw coordinates)
_merged = _merged.drop(columns=["data", "Tram", "Coordenades"])
_merged.head()
# `estatActual` and `estatPrevist` vary from 1 to 5
_merged.describe()
_merged.info()
Plotting combined data¶
Grouping information by day¶
from geopandas import GeoDataFrame
crs = {'init': 'epsg:4326'}
merged_gdf = GeoDataFrame(_merged, crs=crs, geometry=_merged['geometry'])
# create a colormap using branca
# following the examples from
# https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/plugin-Search.ipynb
import branca
colorscale = branca.colormap.linear.viridis.scale(1, 5)
colorscale.caption="Tram usage level"
def style_function(feature):
estat_actual = int(feature['properties']['estatActual']) or None
return {
'alpha': 0.5,
'weight': 3,
        'color': 'black' if estat_actual is None else colorscale(estat_actual)
}
# we will manipulate the dataframe to plot data using time scales,
# since plotting the whole dataframe is too expensive memory-wise
first_10_am = merged_gdf[(merged_gdf["data_datetime"].dt.day == 1) & (merged_gdf["data_datetime"].dt.hour == 10)]
# we need to drop this column because datetime values are not JSON serializable
first_10_am = first_10_am.drop(columns="data_datetime", inplace=False)
# plot the map
barca_map_colors = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')
(
folium.GeoJson(
first_10_am,
style_function=style_function)
.add_to(barca_map_colors)
)
# add legend, see <https://github.com/python-visualization/folium/issues/528>
barca_map_colors.add_child(colorscale)
# plot map
barca_map_colors
# we will manipulate the dataframe to plot data using time scales
gdf_hourly = (
# group by desired columns
merged_gdf.groupby(
by=[merged_gdf['idTram'],
merged_gdf["data_datetime"].dt.day, # here is where the grouping happens
],
as_index=True)
# show the mean of the aggregated values
.mean()
.reset_index()
)
gdf_hourly["diff_perc"] = gdf_hourly["estatActual"]/gdf_hourly["estatPrevist"]*100
gdf_hourly["was_forecast_right"] = gdf_hourly["diff_perc"] <= 100
gdf_hourly_w_geometry = gdf_hourly.merge(streets, left_on="idTram", right_on="Tram")
gdf_hourly_w_geometry.tail()
Plot daily average values¶
- We are using a discrete colorscale with 6 colors, scaled from 1 to 5, to indicate the mean of estatActual for each Tram line on May 27 (the color selection is arbitrary)
- We are drawing the lines whose capacity forecast was right (i.e. the actual capacity was at or below the forecasted value) with a dashed line, and those whose forecast was wrong (i.e. the forecast fell short that day) with a continuous line. This is because of the limited styles that leaflet provides out of the box; any work on improving the style would have to be done with CSS, which is out of the scope of this challenge.
import branca
from folium.plugins import Search
from geopandas import GeoDataFrame
# produce the geopandas dataframe
crs = {'init': 'epsg:4326'}
gdf_hourly = GeoDataFrame(
gdf_hourly_w_geometry[gdf_hourly_w_geometry['data_datetime']==27], # here is where the filtering happens
crs=crs,
geometry=gdf_hourly_w_geometry['geometry']
)
# create colorscale
colorscale = branca.colormap.step.Accent_06.scale(1, 5)
colorscale.caption="estatActual (Daily average)"
# define a method for styling lines
def style_function_capacity(feature):
not_exceeded = int(feature['properties']['was_forecast_right']) or None
estat_actual = int(feature['properties']['estatActual']) or None
# see for other options https://leafletjs.com/reference-1.5.0.html#path-option
return {
'alpha': 0.5,
'dashArray': '5,10' if not_exceeded else '0', # dashed line unless forecast is wrong
'weight': 3,
        'color': 'black' if estat_actual is None else colorscale(estat_actual),
}
# plot the map of barcelona
hourly_map = folium.Map([41.3947,2.1557], zoom_start=12.4, tiles='cartodbpositron')
# plot the tram lines
tram_map = (
folium.GeoJson(
gdf_hourly,
name="Barcelona Tram Capacity (27 May 2019)",
style_function=style_function_capacity,
tooltip=folium.GeoJsonTooltip(
fields=['idTram', 'estatActual', 'estatPrevist', 'diff_perc', 'Descripció'],
aliases=['Tram ID', 'Observed Capacity', 'Forecasted Capacity', 'Load (%)', 'Description'],
localize=True)
)
.add_to(hourly_map)
)
# add legend, see <https://github.com/python-visualization/folium/issues/528>
hourly_map.add_child(colorscale)
# add a searchbox bound to the Description of lines
statesearch = Search(
layer=tram_map,
geom_type='Line',
placeholder='Search for a Tram Description',
collapsed=False,
search_label='Descripció',
weight=3
).add_to(hourly_map)
# plot map
hourly_map
Final words¶
Thanks for reading until here! Regarding the data, I've explored manually isolated cases, so my conclusions need a bit of work. I'll refer only to the last map being shown (May 27).
- It's not common to observe capacity levels over 3
- Traffic loads range from 90 to 110%. Forecasts are really tight, but it seems there's no need to account for more.
- It may be worth exploring plots of diff_perc and other error measurements.
It's easy to extend these maps to any time of the day, using the dt accessor and the data_datetime column as a pivot for indexing.
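For instance, a minimal sketch of that kind of filtering, using a tiny made-up dataframe in place of the merged one:

```python
import pandas as pd

# toy dataframe with a datetime column, mimicking data_datetime
df = pd.DataFrame({
    "data_datetime": pd.to_datetime([
        "2019-05-01 10:05:00",
        "2019-05-01 18:00:00",
        "2019-05-27 10:30:00",
    ])
})

# select rows for day 27 at 10 AM using the dt accessor
mask = (df["data_datetime"].dt.day == 27) & (df["data_datetime"].dt.hour == 10)
print(mask.sum())  # 1
```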
My intention with this notebook is to show the power of leaflet and python, and the libraries available to manipulate geocoded data, like geopandas and shapely. I've found that there is an application that uses this data; check https://com-shi-va.barcelona.cat/en/transit-
Until next time.