## Madison Feb 2019 primary: quick and dirty clustering

Madison had a primary election last night for mayor and for several seats on the City Council and School Board. Turnout was high, as it seems to always be in Dane County lately. The Dane County Clerk has all the results in handy csv form, so you can just download things and start having some fun! There were four major candidates for mayor, so each ward in the city can be mapped to a point in R^4 by the vote share it gave to each of those; except of course this is really R^3 because the vote shares sum to 1. It’s easier to see R^2 than R^3 so you can use PCA to project yourself down to a nice map of wards:

This works pretty well! The main axis of variation (horizontal here) is Soglin vote, which is higher on the left and lower on the right; this vector is negatively weighted on Rhodes-Conway and Shukla but doesn’t pay much attention to Cheeks. The vertical axis mostly ignores Shukla and represents Cheeks taking votes from Rhodes-Conway at the top, and losing votes to Rhodes-Conway at the bottom. You can see a nice cluster of Isthmus and Near West wards in the lower right; Rhodes-Conway did really well there. 57 and 48 are off by themselves in the upper right corner; those are student wards, distinguished in the vote count by Grumpy Old Incumbent Paul Soglin getting next to no votes. And I mean “next to no” in the literal sense; he got one vote in each of those wards!

You can also do some off-the-shelf k-means clustering of those vectors in R^4 and you get meaningful results there. Essentially arbitrarily I broke the wards into 5 clusters and got:

[28, 29, 30, 32, 39, 40, 41, 42, 44, 45, 51, 52, 53, 62, 63, 64, 65, 81, 82, 105]

(east side Isthmus and near West)
[3, 4, 5, 7, 9, 10, 11, 17, 18, 22, 23, 24, 26, 38, 75, 88, 89, 90, 94, 96, 106, 107, 110, 111]

(far east and far west)

[15, 43, 46, 47, 48, 49, 50, 55, 56, 57, 58, 59, 60, 61, 66, 68, 69]

(campus and south Park)

[2, 12, 13, 14, 16, 21, 31, 33, 34, 35, 36, 37, 67, 80, 83, 84, 85, 86, 87, 93, 108, 109]

(west side, Hill Farms, north side, east of Monona)

[1, 6, 8, 19, 20, 25, 70, 71, 72, 73, 74, 76, 77, 78, 79, 91, 92, 95, 97, 98, 99, 100, 101, 102, 103, 104]

(southwest Madison and south of Monona)

Now what would be interesting is to go back and compare this with the ward-by-ward results of the gubernatorial primary last August! But I have other stuff to do today. Here’s some code so I remember it; this stuff is all simple and I have made no attempt to make the analysis robust.

Update:  I did the comparison with the August primary; interestingly, I didn’t see very many strong relationships. Soglin-for-mayor wards were typically also Soglin-for-governor wards. Wards that were strong for Kelda Helen Roys were also strong for Raj Shukla and weak for Soglin, but there wasn’t a strong relationship between Roys vote and Rhodes-Conway vote. On the other hand, Rhodes-Conway’s good wards also tended to be good ones for… Mike McCabe??

import csv
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

Wards = [s for s in S]

#votes for Rhodes-Conway, Soglin, Shukla,Cheeks in that order

Alabel = np.array([[int(s[1]),int(s[2]),int(s[4]),int(s[5]),i+1] for i,s in enumerate(Wards[11:122])])

# strip out small districts

indices = [i for (i,row) in enumerate(Alabel) if sum(row[0:3]) < 20]

Astripped = np.delete(Alabel,indices,0)

Anorm = np.array([row/(sum(row2)(0.5)) for row in A])

# here’s the PCA

wards2d = np.transpose(PCA(n_components=2).fit_transform(Anorm))

plt.scatter(wards2d[0],wards2d[1],s=0)
for i in range(len(Astripped)):
plt.annotate(Astripped[i][4],(wards2d[0][i],wards2d[1][i]))

plt.show()

# and here’s the kmeans stuff

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5).fit(Anorm)
kmeans.labels_

Tagged , ,