Measuring Walkability with OpenStreetMap and Isochrone Maps

Walkability

Walkability is the quality of a neighborhood that allows pedestrians to accomplish everyday tasks like collecting groceries, visiting restaurants, and attending doctor's appointments, all on foot.

There are many ways to measure walkability, including intersection density and the distance to urban amenities. The most notable walkability metric is Walk Score, which is owned by Redfin. Their system is proprietary, but ultimately their methodology1 is based on the distance to amenities, similar to the approach I will explore today.

Isochrone maps

Isochrone maps are a way to visualize the area that could be reached from a starting point in a given time. These distances appear as contours originating from the starting point, showing bands across which the travel time is equal.

An 1881 isochrone map by Francis Galton showing world travel time starting in London.

Isochrone maps may account for different modes of travel: for example, travel by car is constrained to roads, while travel by foot is constrained (roughly) to sidewalks.

Motivation

The main idea of this project is to examine how not just density, but also the structure of roads and sidewalks, impacts walkability. For instance, compare these two pedestrian isochrone maps showing the destinations possible in 15 minutes of walking:

A rural residential area in West Virginia. Copyright openstreetmap
Manhattan, New York City. Copyright openstreetmap

Even though there are clearly fewer amenities near the rural area, the network structure of its roads also limits how much ground a pedestrian can cover, not to mention the lack of sidewalks for safe pedestrian travel. The city block structure in Manhattan provides a 30% increase in the walkable area from the starting point, compared to the rural area.

Valhalla

Valhalla Isochrone Map Example. Copyright openstreetmap

To construct isochrone maps, we'll be using a tool called Valhalla, which is an open source routing engine. This tool operates on top of openstreetmap data, which is crowdsourced by volunteers, but is generally quite thorough, especially in urban areas.

Downloading the data

First I downloaded the openstreetmap data from Geofabrik,2 which graciously hosts up-to-date extracts of openstreetmap for download.

Initially I used an extract specifically for Virginia, which was a modest size of 342 MB. But after this code was tested, I imported the entire United States extract, which is 8.5 GB.

Then I copied the data to data/data.osm.pbf, which is where the build script below expects to find it.
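Roughly, the download and copy step looked like this (the exact Geofabrik URL here is an assumption; check their download page for the current extract name):

# download the Virginia extract (URL pattern assumed; verify on https://download.geofabrik.de)
wget https://download.geofabrik.de/north-america/us/virginia-latest.osm.pbf

# copy it to the path that gets mounted into the Valhalla container
mkdir -p data
cp virginia-latest.osm.pbf data/data.osm.pbf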

Setting up Valhalla and PostGIS

I used docker compose to manage Valhalla and PostGIS so that I didn't have to install them locally.

For Valhalla, this docker compose file just exposes port 8002 for the API in addition to some volume mounts for the run script and the OSM data.

version: "3.9"
services:
  valhalla:
    image: valhalla/valhalla:run-latest
    ports:
      - "8002:8002"
    volumes:
      - ./scripts:/scripts
      - ./data:/data
    entrypoint: /scripts/build_tiles.sh

Combined with a simple bash script to build the Valhalla tiles and run the server:

#!/bin/bash

# create the tile directory if it doesn't already exist
mkdir -p /data/valhalla_tiles

if ! test -f "/data/valhalla.json"; then
    echo "Valhalla JSON not found. Creating config." 
    valhalla_build_config --mjolnir-tile-dir /data/valhalla_tiles --mjolnir-tile-extract /data/valhalla_tiles.tar --mjolnir-timezone /data/valhalla_tiles/timezones.sqlite --mjolnir-admin /data/valhalla_tiles/admins.sqlite > /data/valhalla.json
fi

if ! test -f "/data/valhalla_tiles/timezones.sqlite"; then
    echo "Valhalla tiles not found. Building now."
    valhalla_build_timezones > /data/valhalla_tiles/timezones.sqlite
    valhalla_build_tiles -c /data/valhalla.json /data/data.osm.pbf
fi

if ! test -f "/data/valhalla_tiles.tar"; then
    echo "Tile extract not found. Extracting now."
    valhalla_build_extract -c /data/valhalla.json -v
fi

echo "Starting valhalla server."
valhalla_service /data/valhalla.json 1

The full code can be found in the source repositories linked in the appendix.
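Once the container is up, the tiles get built on the first run and the server starts listening on port 8002. A quick sanity check against the isochrone endpoint might look like this (the coordinates are just an arbitrary point in Virginia):

docker compose up valhalla

# in another terminal: request a 15 minute pedestrian isochrone around an arbitrary point
curl -G "http://localhost:8002/isochrone" \
  --data-urlencode 'json={"locations":[{"lat":38.03,"lon":-78.48}],"costing":"pedestrian","contours":[{"time":15}]}'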

Setting up PostGIS

PostGIS is a geospatial extension on top of PostgreSQL, which does some very cool optimizations to support efficient geospatial queries.

Similarly, I set up PostGIS using my osm2pgsql-docker repo and imported the US dataset with osm2pgsql, a tool used for ingesting the protobuf formatted data into PostGIS.

version: "3.9"  # optional since v1.27.0
services:
  postgis:
    container_name: postgis
    image: postgis/postgis
    ports:
      - "15432:5432"
    environment:
      - POSTGRES_PASSWORD=password
    volumes:
      - type: bind
        source: ./postgresql.conf
        target: /etc/postgresql.conf
    command: ["postgres", "-c", "config_file=/etc/postgresql.conf"]

After PostGIS was running, I imported the data. This was initially tested on the Virginia state osm.pbf data, but I later updated this to use the entire US dataset. After making this change, the script kept running out of memory, so I added the --slim --drop flags, which process the data in PostgreSQL (on my SSD) rather than keeping everything in memory. Although I was running this on a 16 GB machine, I did not have enough free memory to process the whole file.

PGPASSWORD=password osm2pgsql \
  --create \
  --verbose \
  -P 15432 -U postgres -H localhost \
  -S osm2pgsql.style \
  --slim --drop \
  /home/wcedmisten/Downloads/us-latest.osm.pbf

I also used a simplified version of osm2pgsql.style to reduce the number of features imported into the database.

These 3 lines were retained, but most other tags were set to be deleted, rather than imported.

node,way   amenity      text         polygon
node,way   shop         text         polygon
node,way   leisure      text         polygon

I intentionally limited my analysis to amenity, shop, and leisure tags.
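For reference, the unwanted entries in the style file are flagged with delete, which tells osm2pgsql to drop those tags at import time. A few hypothetical examples (my actual file deleted many more tags):

# hypothetical examples of tags dropped at import time
node,way   building     text         delete
node,way   highway      text         delete
node,way   natural      text         delete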

Due to using my SSD instead of memory for the import, the whole process took around 6 hours. I just let it run overnight, so this wasn't a huge issue.

Even with this reduced set of tags, the database still took up 25 GB of storage, much more than could fit in memory.

Writing the Code

Now that the backends were set up, I could begin writing the code to measure walkability. My code was intended to do several things for each city we want to measure:

  • Sample some points randomly inside a city
  • Find the isochrone map for each of these points within a 15 minute walk
  • Find the amenities within this isochrone boundary (e.g. restaurants, parks, grocery stores, etc.)

Getting the City Boundary Data

To find the boundaries of US cities, I downloaded a few different datasets:

  • GeoJSON files with the boundary of each major US city. These were based on the US Census TIGER dataset, which someone converted to GeoJSON
  • a list of the top 1,000 most populous US cities. I examined the top 100 cities in this dataset to compare walkability.

I then created a PostGIS table, city, to store the GeoJSON boundaries for these 100 cities using a python script:

upload_cities.py:

import psycopg2
import json

from util import get_cities

conn = psycopg2.connect(host="localhost", port="15432", dbname="postgres", user="postgres", password="password")
cur = conn.cursor()

create_city_table = """CREATE TABLE IF NOT EXISTS city (
    name text,
    state text,
    geom geometry,
    PRIMARY KEY (name, state)
);
"""

cur.execute(create_city_table)
conn.commit()


def import_city(city, state):
    with open(f"geojson-us-city-boundaries/cities/{state}/{city}.json") as f:
        data = json.load(f)

    geom = data['features'][0]['geometry']

    # convert the GeoJSON geometry into a PostGIS geometry on insert
    cur.execute("INSERT INTO city (name, state, geom) VALUES (%s, %s, ST_GeomFromGeoJSON(%s))", (city, state, json.dumps(geom)))
    conn.commit()


# get_cities() just munged the city population JSON data to get the top 100
for city, state in get_cities():
    # print(city,state)
    import_city(city, state)

This allowed me to query whether or not a point is inside a city with the SQL query:

SELECT ST_Contains(geom, ST_GeomFromText('POINT(%s %s)'))
    FROM ( SELECT geom FROM city WHERE name=%s AND state=%s ) as geom

Sampling

I wanted to generate an even distribution of points across a city and sample "walkability" at each of these points. Then, all samples could be averaged out.

To accomplish this, I calculated the bounding box for each city and generated a Poisson disk sampling3 of points. This distribution keeps neighboring samples from overlapping too much.

This bounding-box distribution for New York City looked like this:

sampling points for bounding box of NYC - map data copyright openstreetmap

After removing the points outside of the city's actual boundaries, the sample looked like this:

sampling points for NYC

Zoomed in further:

sampling points for NYC - zoomed in

Poisson disk distributions guarantee a minimum separation distance for all the points. I picked a distance somewhat arbitrarily that would have some overlap of walkable amenities for neighboring samples, but not too much.

The code to do this was also fairly simple, relying on the poisson_disc library4 to calculate the sample points and on PostGIS to determine whether each point was in the city.

import numpy as np
import poisson_disc
import shapely.wkt

get_bbox_query = """SELECT ST_AsText(ST_Envelope( geom ))
FROM ( SELECT geom FROM city WHERE name=%s AND state=%s ) as geom
"""

cur.execute(get_bbox_query, [city, state])
bbox = cur.fetchone()

# bounding box of the city in lon/lat degrees
minx, miny, maxx, maxy = shapely.wkt.loads(bbox[0]).bounds
height = (maxy - miny)
width = (maxx - minx)

# sample points with a minimum separation of 0.005 degrees
points = poisson_disc.Bridson_sampling(dims=np.array([width, height]), radius=.005)

is_in_query = """
SELECT ST_Contains(geom, ST_GeomFromText('POINT(%s %s)'))
FROM ( SELECT geom FROM city WHERE name=%s AND state=%s ) as geom
"""

# shift the samples into the city's bounding box; convert numpy floats
# to plain Python floats so psycopg2 can adapt them
scaled = list(map(lambda p: [float(p[0] + minx), float(p[1] + miny)], points))

def point_in_city(point):
    cur.execute(is_in_query, [point[0], point[1], city, state])
    return cur.fetchone()[0]

# keep only the points inside the city boundary
in_city = list(filter(point_in_city, scaled))

Creating a metric

Now that I had a sampling for the city, I needed a metric to calculate for each point.

I used a relatively simple approach: I examined a few categories of urban amenities and gave 1 point to each unique category of amenity within a 15 minute walking distance. The walking distance was determined by querying Valhalla for each sampled point.

For instance, a point with 3 restaurants, 2 bakeries, and 1 shoe store within walking distance would get a score of 3: 1 point for each category. I reasoned that most amenities are essentially a binary category. Either the amenity you want is within walking distance or it's not. For future work, I would probably change this to a logarithmic or logistic function to reflect diminishing returns on amenities of the same category.

The first bakery within walking distance provides much more value than the second one, which provides more value than the third, etc.

It was also simpler to examine these categories at a binary level because the OSM data model has many places where a single entity can be represented by multiple nodes (e.g. a building polygon), and this approach avoided that complexity.

15 minutes was chosen somewhat arbitrarily, but is about the maximum I would consider to be a "quick walk" to a destination. In the future, this could be explored using variable distances per amenity.
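As a minimal sketch of this scoring scheme (the names here are hypothetical, and it assumes get_amenities() below returns a list of dicts keyed by column name):

# hypothetical sketch: 1 point per unique category found within the isochrone
CATEGORIES = {
    ("amenity", "cafe"), ("amenity", "bar"), ("amenity", "pub"), ("amenity", "restaurant"),
    ("shop", "bakery"), ("shop", "convenience"), ("shop", "supermarket"),
    ("shop", "department_store"), ("shop", "clothes"), ("shop", "shoes"),
    ("shop", "books"), ("shop", "hairdresser"),
    ("leisure", "fitness_centre"), ("leisure", "park"),
}

def score_point(amenities):
    # amenities: list of dicts with 'amenity', 'shop', and 'leisure' keys
    found = set()
    for row in amenities:
        for key in ("amenity", "shop", "leisure"):
            if row.get(key) and (key, row[key]) in CATEGORIES:
                found.add((key, row[key]))
    return len(found)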

I created a function get_amenities(lat, lon) to retrieve the amenities within walking distance of a point.

import json

import requests

MAX_WALK_TIME = 15  # minutes

def get_amenities(lat, lon):
    # ask Valhalla for the polygon reachable in a 15 minute walk from this point
    payload = {
        "locations":[
            {"lat": lat,"lon": lon}
        ],
        "costing":"pedestrian",
        "denoise":"0.11",
        "generalize":"0",
        "contours":[{"time":MAX_WALK_TIME}],
        "polygons":True
    }

    request = f"http://localhost:8002/isochrone?json={json.dumps(payload)}"
    isochrone = requests.get(request).json()

    geom = json.dumps(isochrone['features'][0]['geometry'])

    # find every point of interest inside the isochrone polygon.
    # planet_osm_point stores geometries in EPSG:3857, so the GeoJSON
    # polygon (EPSG:4326) is transformed before the containment check.
    query = f"""
    SELECT * FROM planet_osm_point
    WHERE ST_Contains(ST_Transform(ST_GeomFromGeoJSON('{geom}'),3857), way) AND (amenity IS NOT NULL OR shop IS NOT NULL OR leisure IS NOT NULL);
    """

    cur.execute(query)
    rows = cur.fetchall()

    # fetch the column names so the rows can be turned into dicts
    cur.execute("SELECT * FROM planet_osm_point LIMIT 0")
    colnames = [desc[0] for desc in cur.description]

    return to_dict(colnames, rows)

This function:

  1. makes a request to the Valhalla Isochrone service to get the polygon representing a 15 minute walk surrounding the point
  2. uses that polygon to find all unique amenities within those bounds

The amenities considered here are represented by the following OSM tags, with their definitions from the OSM wiki.

  • amenity:cafe - a generally informal place with sit-down facilities selling beverages and light meals and/or snacks
  • amenity:bar - a purpose-built commercial establishment that sells alcoholic drinks to be consumed on the premises
  • amenity:pub - an establishment that sells alcoholic drinks that can be consumed on the premises
  • amenity:restaurant - formal eating places with sit-down facilities selling full meals served by waiters
  • shop:bakery - a shop selling bread
  • shop:convenience - a small local shop carrying a variety of everyday products, mostly including single-serving food items
  • shop:supermarket - a large shop for groceries and other goods, including meat and fresh produce
  • shop:department_store - a large store with multiple clothing and other general merchandise departments
  • shop:clothes - a shop which primarily sells clothing, such as underwear, shirts, jeans etc
  • shop:shoes - a shop that specialises in selling shoes
  • shop:books - A shop selling books (including antique books, chain bookstores, etc.)
  • shop:hairdresser - where you can get your hair cut
  • leisure:fitness_centre - a place with exercise machines and/or fitness/dance classes. They are places to go to for exercise
  • leisure:park - an area of open space for recreational use, usually designed and in semi-natural state with grassy areas, trees and bushes

Results

Top 10 most walkable cities

Rank | City | Amenity Score
1 | washington-dc dc | 5.54
2 | minneapolis mn | 4.31
3 | chicago il | 4.2
4 | new-york ny | 4.07
5 | seattle wa | 4.01
6 | cleveland oh | 3.84
7 | denver co | 3.37
8 | jersey-city nj | 3.31
9 | st-louis mo | 3.14
10 | pittsburgh pa | 3.07

10 least walkable cities:

Rank | City | Amenity Score
91 | louisville-jefferson ky | 0.39
92 | tucson az | 0.38
93 | north-las-vegas nv | 0.36
94 | jacksonville fl | 0.35
95 | lubbock tx | 0.32
96 | oklahoma-city ok | 0.31
97 | chesapeake va | 0.27
98 | bakersfield ca | 0.22
99 | corpus-christi tx | 0.17
100 | anchorage ak | 0.05

I also compared these rankings to Walk Score, and found that they seemed to be mostly in the ballpark.

walk score comparison

This plot shows the difference between the Walk Score ranking and my ranking. For example, I ranked San Francisco as #26, but Walk Score ranked it #1, so the first entry shows a bar at -25. The x axis is ordered by the Walk Score ranking.5

Limitations

Some significant limitations:

  • I didn't filter out samples that aren't on land
  • I didn't filter for residential areas (although arguably this could be intentional, to count tourism and travel)
  • openstreetmap data is crowdsourced from volunteers, so amenity coverage may not be evenly distributed (probably benefiting higher-population cities)
  • Bias from using official city boundaries rather than the full metropolitan sprawl, which would favor larger metropolitan areas

Performance

I profiled running this on a single city with flameprof6, and found that the majority of time is spent between querying Valhalla and querying PostGIS, which makes sense to me. Both Valhalla and PostGIS are most likely heavily optimized already, so the only way I could shave down runtime would be to sample fewer points.

flame graph
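For reference, the profiling workflow was roughly the following (the script name is hypothetical):

# profile a single city run, then render a flame graph SVG with flameprof;
# measure_city.py is a placeholder for the actual entry point
python -m cProfile -o walkability.prof measure_city.py
flameprof walkability.prof > flamegraph.svg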

Future Work

Custom walkability scores

Explore customized walkability scores. "Walkability" based on all amenities is painting with a broad brush. Most people probably don't care about every single amenity that can be categorized. Peter Johnson has an interesting article7 about reverse engineering the Walk Score, which found that it's highly correlated with the number of nearby restaurants.

Personally, I love restaurants, but I cook most of my meals at home, so this score may be less useful to me. I like the idea of assigning weights to each amenity to get a more customized score. You could even optimize for something like "most walkable Chinese restaurants" or "most parks".

We could also codify other factors related to walkability into the isochrone calculations themselves. For example, people with disabilities may have an especially hard time with poor sidewalk infrastructure. I don't think Valhalla supports custom sidewalk cost penalties related to width, sidewalk condition, or curb cuts, but I would be interested in exploring that more specifically.

One of the coolest things about openstreetmap is the level of detail available in the data. I don't know of anywhere else where curb cut data is available.

Amenity centered isochrones

Another idea I had: instead of sampling a bunch of points, which requires a lot of isochrone calculations, calculate isochrones starting from each amenity. Then we could combine all the isochrone polygons and calculate what portion of the city is within an X minute walk of that amenity. This could also be useful for finding amenity "deserts", like food deserts, but more general.

This would also allow me to explore varying the walking distance for each amenity category. E.g. I would be more willing to walk further for infrequent and important errands, like electronics repair or a hospital visit. But for more common tasks like getting groceries, I would prefer a closer walk.
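A rough sketch of that query in PostGIS, assuming a hypothetical amenity_isochrone table holding one isochrone polygon per amenity, with a category column and geometries in the same coordinate system as the city table:

-- amenity_isochrone is a hypothetical table for illustration
-- fraction of the city's area within a 15 minute walk of any supermarket
SELECT ST_Area(ST_Intersection(ST_Union(i.geom), c.geom)) / ST_Area(c.geom) AS covered_fraction
FROM amenity_isochrone i, city c
WHERE i.category = 'supermarket'
  AND c.name = 'new-york' AND c.state = 'ny'
GROUP BY c.geom;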

Appendix

Source Code:

https://github.com/wcedmisten/city-walkability

https://github.com/wcedmisten/osm2pgsql-docker

Final Rankings

Rank | City | Amenity Score
1 | washington-dc dc | 5.54
2 | minneapolis mn | 4.31
3 | chicago il | 4.2
4 | new-york ny | 4.07
5 | seattle wa | 4.01
6 | cleveland oh | 3.84
7 | denver co | 3.37
8 | jersey-city nj | 3.31
9 | st-louis mo | 3.14
10 | pittsburgh pa | 3.07
11 | philadelphia pa | 3.0
12 | portland or | 2.94
13 | st-paul mn | 2.87
14 | san-jose ca | 2.8
15 | buffalo ny | 2.67
16 | baltimore md | 2.63
17 | detroit mi | 2.57
18 | boston ma | 2.55
19 | oakland ca | 2.54
20 | miami fl | 2.44
21 | sacramento ca | 2.4
22 | omaha ne | 2.33
23 | chandler az | 2.26
24 | long-beach ca | 2.11
25 | santa-ana ca | 2.04
26 | madison wi | 2.01
27 | san-francisco ca | 2.0
28 | milwaukee wi | 1.91
29 | los-angeles ca | 1.9
30 | richmond va | 1.89
31 | cincinnati oh | 1.89
32 | san-diego ca | 1.83
33 | chula-vista ca | 1.8
34 | atlanta ga | 1.8
35 | gilbert az | 1.76
36 | honolulu hi | 1.76
37 | anaheim ca | 1.76
38 | mesa az | 1.74
39 | raleigh nc | 1.64
40 | newark nj | 1.64
41 | lincoln ne | 1.62
42 | albuquerque nm | 1.51
43 | las-vegas nv | 1.49
44 | austin tx | 1.43
45 | columbus oh | 1.4
46 | plano tx | 1.34
47 | colorado-springs co | 1.34
48 | aurora co | 1.33
49 | st.-petersburg fl | 1.3
50 | greensboro nc | 1.29
51 | riverside ca | 1.28
52 | houston tx | 1.25
53 | boise-city id | 1.24
54 | tampa fl | 1.24
55 | glendale az | 1.19
56 | fremont ca | 1.1
57 | toledo oh | 1.07
58 | irvine ca | 1.04
59 | stockton ca | 1.03
60 | orlando fl | 1.01
61 | charlotte nc | 1.0
62 | hialeah fl | 0.99
63 | indianapolis-city in | 0.99
64 | durham nc | 0.96
65 | norfolk va | 0.96
66 | phoenix az | 0.9
67 | baton-rouge la | 0.88
68 | dallas tx | 0.84
69 | reno nv | 0.83
70 | fort-wayne in | 0.82
71 | arlington tx | 0.8
72 | wichita ks | 0.77
73 | garland tx | 0.75
74 | irving tx | 0.75
75 | winston-salem nc | 0.71
76 | tulsa ok | 0.69
77 | fresno ca | 0.67
78 | kansas-city mo | 0.67
79 | new-orleans la | 0.66
80 | fort-worth tx | 0.65
81 | san-antonio tx | 0.58
82 | henderson nv | 0.58
83 | san-bernardino ca | 0.56
84 | nashville-davidson-metropolitan-government tn | 0.55
85 | memphis tn | 0.54
86 | scottsdale az | 0.53
87 | lexington-fayette ky | 0.51
88 | el-paso tx | 0.48
89 | laredo tx | 0.44
90 | virginia-beach va | 0.4
91 | louisville-jefferson ky | 0.39
92 | tucson az | 0.38
93 | north-las-vegas nv | 0.36
94 | jacksonville fl | 0.35
95 | lubbock tx | 0.32
96 | oklahoma-city ok | 0.31
97 | chesapeake va | 0.27
98 | bakersfield ca | 0.22
99 | corpus-christi tx | 0.17
100 | anchorage ak | 0.05

Footnotes

  1. https://www.walkscore.com/methodology.shtml

  2. https://www.geofabrik.de/data/download.html

  3. PDF - Fast Poisson Disk Sampling in Arbitrary Dimensions

  4. https://pypi.org/project/poisson-disc

  5. https://www.walkscore.com/cities-and-neighborhoods

  6. https://pypi.org/project/flameprof

  7. Reverse Engineering the Walk Score Algorithm