County Level Election Mapping
Nov 3, 2020: Northampton County, PA:
THIS ANALYSIS USES UNOFFICIAL RESULTS
Background
I think everyone has been following the 2020 Presidential election this week. Pennsylvania has been a pivotal state in the election, so much so that there is even a satire article about how much Pennsylvania geography people have learned in the last week.
I came across a PDF of the Unofficial Election Results on the County website and really wanted to see it on a map. I started parsing out the PDF data and writing code for it and I decided it would be good to write about it and share it!
Swing County
Northampton County, PA is an interesting county for this type of analysis because it is almost evenly split between Democratic and Republican voters, making it a “swing county”.
Using the unofficial data from the county for the Presidential race, I found that there were 84,145 votes for the Democratic Candidate (DEM) and 82,830 for the Republican Candidate (REP). There were 170,048 votes total, so that means 49.5% of the votes went DEM and 48.7% went REP. This is what makes Northampton County, PA an important swing county in an important swing state.
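As a quick check of the arithmetic above, here is a minimal sketch that recomputes the shares from the three unofficial totals. The helper name `share` is just something I made up for illustration.

```python
# Unofficial totals quoted above for the Presidential race.
dem_votes = 84_145
rep_votes = 82_830
total_votes = 170_048


def share(votes, total):
    """Return votes as a percentage of total, rounded to one decimal place."""
    return round(100 * votes / total, 1)


dem_pct = share(dem_votes, total_votes)
rep_pct = share(rep_votes, total_votes)
margin = dem_votes - rep_votes  # raw vote gap between the two candidates

print(f"DEM {dem_pct}%  REP {rep_pct}%  margin {margin:,} votes")
```

The two shares do not add up to 100% because the total also includes votes for other candidates.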
The Presidential Candidates were aware of the pivotal role that Northampton County plays in the election and, in the week preceding the election, the county was visited by both the Republican Presidential Candidate and the Democratic Vice Presidential Candidate.
Economy
Northampton County contains two cities, Easton and Bethlehem, as well as a few other urban areas such as Nazareth, Freemansburg, and Hellertown. The county also has a diverse economy spanning food service, manufacturing, warehousing, and healthcare.
It is also notable that many of the county’s residents commute outside the county for work. As of 2005, 53% of the working population in the county worked in the county and 23% worked in Lehigh County, which is strongly economically linked to Northampton County. Only 7% worked in counties in PA other than Lehigh and Northampton, and 13% worked in New Jersey. This would be a fun dataset to parse and map once the 2020 census is released.
Goals
- Create a map of the Presidential election, and possibly other elections in Northampton County for Nov 3, 2020 using the Unofficial Election Results.
- Create it in a repeatable fashion so that I can run it on the official results when they are released.
- Document it in a Jupyter notebook so the code is easily sharable.
Tools
- Python 3
- urllib
- subprocess
- json
- re
- os
- pdfminer
- StringIO
- pandas
- IPython.core.display
- string
- uuid
- ogr2ogr
- Jupyter Notebook
- Mapbox GL JS
Process
You can follow along in the Jupyter Notebook.
1. Download the Precinct Geospatial Data
I just ran a quick Google Search for northampton county pa gis data precincts and the second link took me right to the county dataset for voting data. That was a lucky break!
I then used the Python subprocess library to run ogr2ogr, which can download the data directly from the server, reproject it, and save it as a GeoJSON file.
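Here is a rough sketch of what that call can look like. The source URL is a placeholder, not the county's real endpoint, and I'm assuming EPSG:4326 as the target projection since that is what GeoJSON and web maps generally expect. The function name is made up for this example.

```python
import subprocess


def build_ogr2ogr_command(source, output_path, target_srs="EPSG:4326"):
    """Build an ogr2ogr call that reprojects `source` and writes GeoJSON."""
    return [
        "ogr2ogr",
        "-f", "GeoJSON",       # output format driver
        "-t_srs", target_srs,  # reproject to this coordinate system
        output_path,           # ogr2ogr takes the destination first...
        source,                # ...and the source second
    ]


# Hypothetical placeholder, NOT the county's real data endpoint.
PRECINCTS_SOURCE = "https://example.com/precincts.geojson"

command = build_ogr2ogr_command(PRECINCTS_SOURCE, "precincts.geojson")
print(" ".join(command))

# Uncomment to actually run it (requires GDAL's ogr2ogr on the PATH):
# subprocess.run(command, check=True)
```

Passing the command as a list instead of a single string avoids shell quoting problems, and `check=True` raises an error if ogr2ogr fails.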
2. Download the Election Results PDF
I used the urllib library to download this file. I also added some code to check if the file has already been downloaded so I don’t have to re-download the data each time I run the notebook.
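A minimal sketch of that download-if-missing check is below. The function name `download_once` and the URL in the usage comment are placeholders of mine, not what the notebook uses.

```python
import os
import urllib.request


def download_once(url, path):
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True


# Example usage (placeholder URL, swap in the real results PDF):
# download_once("https://example.com/results.pdf", "results.pdf")
```

The check only looks at whether the file exists, so a partial or stale download would need to be deleted by hand before re-running.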
3. Convert the PDF to text
The PDFMiner library does a good job of parsing the text of a PDF file. This step takes a while. I also replaced double newlines with a single newline, just to make parsing it a little easier.
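The conversion can be sketched roughly like this. I'm assuming the `extract_text` helper from pdfminer.six here, which may not be the exact PDFMiner API the notebook uses, and both function names are made up for illustration. The sample string is invented too.

```python
import re


def collapse_blank_lines(text):
    """Replace runs of two or more newlines with a single newline."""
    return re.sub(r"\n{2,}", "\n", text)


def pdf_to_text(pdf_path):
    """Extract text from a PDF with pdfminer.six, then tidy the newlines."""
    # Imported here so the helper above still works without pdfminer installed.
    from pdfminer.high_level import extract_text

    return collapse_blank_lines(extract_text(pdf_path))


# Invented sample text, just to show the newline cleanup.
cleaned = collapse_blank_lines("Precinct 1\n\nDEM 10\n\n\nREP 9\n")
print(repr(cleaned))
```

Note that this collapses any run of blank lines, not only exact doubles, which is a slightly more aggressive cleanup than a plain string replace.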
4. Parse the text output
This was definitely the hardest and most time-consuming part of this process. I originally thought I could get the data out with a Regular Expression, but it became very difficult very quickly.
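To give a feel for the alternative to one giant Regular Expression, here is a sketch of a line-by-line parser that remembers which precinct it is in. The sample text and its layout are entirely made up, and the real PDF output is messier, so treat this as an illustration of the idea and not the parser from the notebook.

```python
import re

# Entirely made-up sample; the real PDF text is laid out differently.
SAMPLE = """\
PRECINCT ALPHA
DEM 120
REP 95
PRECINCT BETA
DEM 80
REP 101
"""

# A small per-line pattern is easier to debug than one whole-document regex.
VOTE_LINE = re.compile(r"^(DEM|REP)\s+([\d,]+)$")


def parse_results(text):
    """Walk the text line by line, tracking the current precinct."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("PRECINCT "):
            current = line[len("PRECINCT "):]
            results[current] = {}
            continue
        match = VOTE_LINE.match(line)
        if match and current is not None:
            party, votes = match.groups()
            results[current][party] = int(votes.replace(",", ""))
    return results


parsed = parse_results(SAMPLE)
print(parsed)
```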
5. Bring it into a Pandas dataframe
This step probably isn’t needed, but Pandas is a lot of fun to work with and it does make doing statistics really easy, even if we’re just doing some ratios.
As a note, I needed to use .astype(float) for each new column because they defaulted to int64, which was not supported by the json library. This prevented me from exporting the data to GeoJSON.
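Here is a small sketch of that int64 problem and the .astype(float) workaround. The precinct names and counts are made up for illustration.

```python
import json

import pandas as pd

# Made-up counts for two hypothetical precincts.
df = pd.DataFrame({"dem": [120, 80], "rep": [95, 101]}, index=["A", "B"])

int_value = df.loc["A", "dem"]  # a NumPy integer, not a plain Python int
try:
    json.dumps({"dem": int_value})
    int_ok = True
except TypeError:
    # json does not know how to serialize NumPy integer types.
    int_ok = False

# NumPy floats subclass Python's float, so json accepts them.
df["dem"] = df["dem"].astype(float)
float_json = json.dumps({"dem": df.loc["A", "dem"]})

print(int_ok, float_json)
```

Another option would be converting each value to a native Python int before dumping it, which would keep the vote counts as whole numbers in the output.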
6. Join the GeoJSON data to the dataframe
I’m a big fan of the dictionary unpacking operator (**) in Python, and it makes the code for this section really concise.
Basically, it loops through every precinct in the GeoJSON file, pulls that precinct's row from the Pandas dataframe, and merges it into the feature's properties.
It’s small enough that I can paste it here:
for feature in precincts_geojson["features"]:
    if feature["properties"]["PRECINCTID"] in df.index:
        feature["properties"] = {
            **feature["properties"],
            **dict(df.loc[feature["properties"]["PRECINCTID"]])
        }
7. Make the map
This section of code just makes a big string containing the HTML needed to add a mapbox-gl-js object to the Jupyter notebook. It is a template that takes five parameters: the map div ID, the GeoJSON data, the style for the map, the popup code, and the layer name. The style and the popups are explained a little bit more in the Jupyter Notebook.
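A stripped-down sketch of such a template is below. It assumes the mapboxgl library is already loaded and configured on the page, uses an empty basemap style, and guesses a rough center point for the county. The names `MAP_TEMPLATE` and `render_map` are mine, and the real template in the notebook is more involved.

```python
import json
import uuid
from string import Template

# Assumes mapboxgl is already loaded on the page. The empty style and the
# center coordinates are placeholders, not the notebook's real settings.
MAP_TEMPLATE = Template("""
<div id="$div_id" style="height: 400px;"></div>
<script>
(function () {
  var map = new mapboxgl.Map({
    container: "$div_id",
    style: {version: 8, sources: {}, layers: []},
    center: [-75.3, 40.75],
    zoom: 9
  });
  map.on("load", function () {
    map.addSource("$layer_name", {type: "geojson", data: $geojson});
    map.addLayer({
      id: "$layer_name",
      type: "fill",
      source: "$layer_name",
      paint: $paint
    });
    map.on("click", "$layer_name", function (e) {
      var props = e.features[0].properties;
      new mapboxgl.Popup().setLngLat(e.lngLat).setHTML($popup_js).addTo(map);
    });
  });
})();
</script>
""")


def render_map(div_id, geojson, paint, popup_js, layer_name):
    """Fill in the five template parameters and return the HTML string."""
    return MAP_TEMPLATE.substitute(
        div_id=div_id,
        geojson=json.dumps(geojson),
        paint=json.dumps(paint),
        popup_js=popup_js,  # a JavaScript expression that yields popup HTML
        layer_name=layer_name,
    )


html = render_map(
    div_id="map-" + uuid.uuid4().hex,  # unique ID so several maps can coexist
    geojson={"type": "FeatureCollection", "features": []},
    paint={"fill-color": "#888888", "fill-opacity": 0.6},
    popup_js="props.PRECINCTID",
    layer_name="president",
)
print(html[:60])
```

In a notebook, the resulting string can be passed to IPython's HTML display class to render the map in the output cell.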
8. Results
Note: The maps are clickable
Presidential Election
PA Attorney General
PA Representative in Congress 7th Congressional District
9. Conclusion
The maps don’t change very much based on the office being voted for, although there are some small differences. It would be interesting to see how people voted for other offices based on their presidential vote, but that’s beyond the scope of this post for now.
If you are familiar with Northampton County, PA, you will recognize the urban/rural divide, with the cities of Bethlehem and Easton being predominantly blue and the more rural townships (and some boroughs) being red. You can also see the effect of the small urban area in Nazareth pulling the area more toward the center, but not quite making it blue.
There are a lot of cartographic changes I’d like to make. It might be good to get the labels on top of the precincts. I would also like to use vector tiles as a basemap (I’m using raster tiles from CARTO right now).
Thanks for reading!