People love to make fun of George Foreman because he named all his sons George Foreman. But former Secretary of State Lawrence Eagleburger named all his sons Lawrence Eagleburger and nobody raises a peep! There’s no justice.
By the way, here’s another fun word2vec trick. Following Ben Schmidt, you can try to find “gender-neutralized synonyms” — words which are close to each other except for the fact that they have different gender connotations. A quick and dirty way to “mascify” a word is to find its nearest neighbor which is closer to “he” than “she”:
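# most_similar returns (word, score) pairs, nearest first; keep the words that sit closer to 'he' than to 'she'
def mascify(y): return [x for x, _ in model.most_similar(y, topn=200) if model.similarity(x, 'she') < model.similarity(x, 'he')]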
“femify” is defined similarly. We could put a threshold away from 0 in there, if we wanted to restrict to more strongly gender-coded words.
Anyway, if you start with a word and run mascify and femify alternately, you can ask whether you eventually wind up in a 2-cycle: a pair of words which are each other’s gender counterparts in this loose sense.
gentle -> easygoing -> chatty -> talkative -> chatty -> …..
So “chatty” and “talkative” are a pair, with “chatty” female-coded and “talkative” male-coded.
beautiful -> magnificent -> wonderful -> marvelous -> wonderful -> …
So far, I keep hitting 2-cycles, and pretty quickly, though I don’t see why a longer cycle wouldn’t be possible or likely. Update: Kevin in comments explains very nicely why it has to terminate in a 2-cycle!
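The alternating walk described above can be sketched in a few lines. This is only an illustration, not the code behind the examples in this post: `shift` and `alternate` are hypothetical helper names, and `ToyModel` is a made-up stand-in with hand-picked numbers that mimics just the two word2vec calls used here (`most_similar`, returning (word, score) pairs, and `similarity`). With a real model loaded, you would pass it in place of the toy.

```python
class ToyModel:
    """Hypothetical stand-in for a word2vec model; all numbers are invented."""

    def __init__(self, neighbors, gender):
        self.neighbors = neighbors  # word -> list of neighbors, nearest first
        self.gender = gender        # word -> score; > 0 leans "he", < 0 leans "she"

    def most_similar(self, word, topn=10):
        # Mimic the (word, score) pairs a real model returns; scores are placeholders.
        return [(w, 1.0) for w in self.neighbors[word][:topn]]

    def similarity(self, a, b):
        # In this sketch b is always "he" or "she".
        return self.gender[a] if b == "he" else -self.gender[a]


def shift(model, word, toward, away, topn=200, margin=0.0):
    """Nearest neighbor of `word` that is closer to `toward` than to `away`."""
    for w, _ in model.most_similar(word, topn=topn):
        if model.similarity(w, away) < model.similarity(w, toward) - margin:
            return w
    return None  # nothing suitable among the topn neighbors


def alternate(model, start, max_steps=50):
    """Mascify, femify, mascify, ... until a word repeats two steps later."""
    chain = [start]
    for step in range(max_steps):
        toward, away = ("he", "she") if step % 2 == 0 else ("she", "he")
        nxt = shift(model, chain[-1], toward, away)
        if nxt is None:
            break
        chain.append(nxt)
        if len(chain) >= 3 and chain[-1] == chain[-3]:
            break  # the last two words now map to each other: a 2-cycle
    return chain


toy = ToyModel(
    neighbors={
        "gentle": ["kind", "easygoing"],
        "easygoing": ["talkative", "chatty"],
        "chatty": ["talkative", "easygoing"],
        "talkative": ["chatty", "easygoing"],
    },
    gender={"gentle": -0.1, "kind": -0.2, "easygoing": 0.2,
            "chatty": -0.3, "talkative": 0.1},
)
chain = alternate(toy, "gentle")
print(" -> ".join(chain))  # gentle -> easygoing -> chatty -> talkative -> chatty
```

The `margin` argument plays the role of the threshold mentioned earlier: raising it above 0 keeps only neighbors that are more strongly gender-coded.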
Some other pairs, female-coded word first:
overjoyed / elated
strident / vehement
fearful / worried
furious / livid
distraught / despondent
hilarious / funny
exquisite / sumptuous
thought_provoking / insightful
kick_ass / badass
Sometimes it’s basically giving the same word, in two different forms or with one word misspelled:
intuitive / intuitively
buoyant / bouyant
sad / Sad
You can do this for names, too, though you have to make the “topn” a little longer to find matches. I found:
Jamie / Chris
Deborah / Jeffrey
Fran / Pat
Mary / Joseph
Pretty good pairs! Note that you hit a lot of gender-mixed names (Jamie, Chris, Pat), just as you might expect: the male-biased name word2vec-closest to a female name is likely to be a name at least some women have! You can deal with this by putting in a threshold:
def mascify(y): return [x for x, _ in model.most_similar(y, topn=1000) if model.similarity(x, 'she') < model.similarity(x, 'he') - 0.1]
This eliminates “Jamie” and “Pat” (though “Chris” still reads as male.)
Now we get some new pairs:
Jody / Steve (this one seems to have a big basin of attraction; it shows up from a lot of initial conditions)
Kasey / Zach
Peter / Catherine (is this a Russia thing?)
Nicola / Dominic
Alison / Ian
When you get the copy-edited manuscript of a book back, it comes with a document called “Names and Words.” This is a list of proper names or unusual words in the book which might admit variant spelling or typography, and the list is there to keep everybody on the production team uniform.
Here’s the A-B section of my list. I think it gives a pretty good sense of what the book is about.
Niels Henrik Abel
Alcmaeon of Croton
Alhazen (Abu ‘Ali al-Hasan ibn al-Haytham)
Akhil and Vikram Amar
Apollonius of Perga
Daryl Renard Atkins
Leroy E. Burney
R. H. Bing
Jean-Charles de Borda
Georges-Louis LeClerc, Comte de Buffon
My 20th Harvard reunion book is in hand, offering a social snapshot of a certain educationally (and mostly financially) elite slice of the US population.
Here is what Harvard alums name their kids. These are chosen by alphabetical order of surname from one segment of the book. Most of these children were born between 2003 and the present. They are grouped by family.
Zachary, Zoe, Alex
Elias, Ella, Irena
Grayson, Parker, Saya
Io, Pico, Daphne
Richard, Annalise, Ryan
Christopher, Sarah, Zachary, Claire
Alexandra, Victoria, Arianna, Madeline
Grace, Luke, Anna
William, Cecilia, Maya
Nathan, Spencer, Benjamin
Matthew, Lauren, Ava, Nathan
Samuel, Katherine, Peter, Sophia
Andrew, Zachary, Nathan
Shania, Tell, Delia
Nathan, Lukas, Jacob
Noah, Haydn, Ellyson
Asha, Sophie, Tejas
Gabriela, Carlos, Sebastian
James, Seeger, Arden
If you saw these names, would you be able to guess roughly what part of the culture they were drawn from? Are there ways in which the distribution is plainly different from “standard” US naming practice?
Had a great time today talking graph theory with a roomful of students and faculty in the humanities at the Humanities Hackathon. Here’s a (big .ppt file) link to my slides. One popular visualization was this graph of baby boys’ names from 2011, where two names are adjacent if their popularity profile across 12 representative states is very similar. (For example, names similar to “Malachi” on this measure include “Ashton” and “Kaden,” while names similar to “Patrick” include “Thomas,” “John,” “Sean,” and “Ryan.”)
The visualization was made with the open-source graph-visualization tool Gephi.
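For readers who want to try something like the name graph themselves, here is a minimal sketch of the construction: give each name a vector of popularity across states, and join two names when their vectors point in nearly the same direction. The similarity measure (cosine), the 0.99 cutoff, and the helper name `similarity_graph` are all assumptions made for illustration; the post does not say which measure the slides used. The three-state profiles below are invented numbers, not Social Security data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length popularity profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_graph(profiles, threshold=0.99):
    """Adjacency sets: two names are joined when their profiles are very similar."""
    edges = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if cosine(profiles[a], profiles[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


# Invented popularity per 1,000 births in three hypothetical states.
profiles = {
    "Patrick": [4, 1, 1],
    "Thomas": [8, 2, 2],
    "Malachi": [1, 4, 1],
    "Kaden": [1, 5, 1],
}
graph = similarity_graph(profiles)
for name in sorted(graph):
    print(name, sorted(graph[name]))
```

Cosine similarity ignores overall popularity, so a very common name and a rare one can still be adjacent if they are popular in the same places. An edge list built this way can then be loaded into a graph tool for layout.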
I came home only to encounter this breathless post from the Science blog about a claim that you can use network invariants (e.g. clustering coefficient, degree distribution, correlation of degree between adjacent nodes) to distinguish factually grounded narratives like the Iliad from entirely fictional ones like Harry Potter. The paper itself is not so convincing. For instance, its argument on “assortativity,” the property that high-degree nodes tend to be adjacent to one another, goes something like this:
Real-life social networks tend to be assortative, in the sense that the number of friends I have is positively correlated with the number of friends my friends have.
The social network they write down for the Iliad isn’t assortative, so they remove all the interactions classified as “hostile,” and then it is.
The social network for Beowulf isn’t assortative, so they remove all the interactions classified as “hostile,” and then it still isn’t, so they take out Beowulf himself, and then it is, but just barely.
Conclusion: The social networks of Beowulf and the Iliad are assortative, just like real social networks.
Digital humanities can be better than this!
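To make “assortative” concrete: one common way to measure it is the correlation between the degrees at the two ends of an edge, taken over all edges. The sketch below computes that from a plain edge list. It is a generic illustration, not a reconstruction of the paper's procedure; the function name `degree_assortativity` and the tiny example networks are made up for this purpose.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    `edges` is a list of (u, v) pairs for an undirected graph. Positive
    values mean well-connected nodes tend to link to each other.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean serves both
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")  # undefined when all degrees are equal


# A star: one hub tied to three otherwise isolated characters.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(degree_assortativity(star))  # -1.0

# A path of four characters in a row.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(degree_assortativity(path))
```

A star is as disassortative as a network can be, which is roughly what a story built around one hero looks like. Dropping a class of interactions, or a single character, amounts to filtering the edge list before calling the function, so it is easy to see how much the answer depends on such choices.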
I’m messing around with some Social Security baby name data, preparing examples for the upcoming Humanities Hackathon at the Wisconsin Institute for Discovery, where I’ll be teaching a one-day course on networks and graphs.
So here’s a quiz. I give a list of baby names which were strongly overrepresented among babies born in state X in 2011. You name the state.
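One simple way to put a number on “overrepresented” is to compare a name's share of births in the state with its share nationally. The sketch below does that; it is an assumption about a reasonable measure, not necessarily the one used to build the quiz lists. The function name, the `min_count` cutoff, and the counts are all invented for illustration.

```python
def overrepresentation(state_counts, national_counts, min_count=5):
    """Rank names by (share of state births) / (share of national births)."""
    state_total = sum(state_counts.values())
    national_total = sum(national_counts.values())
    ratios = {}
    for name, count in state_counts.items():
        if count < min_count:
            continue  # very rare names give noisy ratios
        state_share = count / state_total
        national_share = national_counts[name] / national_total
        ratios[name] = state_share / national_share
    # Most overrepresented names first.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Invented counts, purely to show the arithmetic.
state = {"Name A": 10, "Name B": 10}
national = {"Name A": 20, "Name B": 80}
ranked = overrepresentation(state, national)
print(ranked)
```

A ratio above 1 means the name is more common in the state than in the country as a whole. In the toy data, “Name A” makes up half the state's births but only a fifth nationally, so it comes out on top.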
Hint: most of these are pretty big states, and one of them is Wisconsin.
Hint 2: Knowing about football will help you with one of these and knowing about baseball with another.
I think these are pretty hard.
Answers below the fold: