By the way, here’s another fun word2vec trick. Following Ben Schmidt, you can try to find “gender-neutralized synonyms” — words which are close to each other except for the fact that they have different gender connotations. A quick and dirty way to “mascify” a word is to find its nearest neighbor which is closer to “he” than “she”:
def mascify(y): return [x for x, _ in model.most_similar(y, topn=200) if model.similarity(x, 'she') < model.similarity(x, 'he')]
“femify” is defined similarly. We could put a threshold away from 0 in there, if we wanted to restrict to more strongly gender-coded words.
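Here is a minimal sketch of both directions with the threshold made explicit. It is not the exact code I ran: the helper name gender_shift and the margin parameter are made up for illustration, and kv is assumed to behave like gensim's KeyedVectors (most_similar gives (word, score) pairs, nearest first, and similarity gives a cosine similarity).

```python
def gender_shift(kv, word, toward="he", away="she", margin=0.0, topn=200):
    """Neighbors of `word` that are closer to `toward` than to `away` by more than `margin`.

    Assumption: `kv` exposes most_similar(word, topn=...) returning (neighbor, score)
    pairs sorted nearest first, and similarity(a, b), as gensim's KeyedVectors does.
    """
    return [
        w
        for w, _ in kv.most_similar(word, topn=topn)
        if kv.similarity(w, toward) - kv.similarity(w, away) > margin
    ]


def mascify(kv, word, **kwargs):
    # Keep neighbors that lean toward "he".
    return gender_shift(kv, word, toward="he", away="she", **kwargs)


def femify(kv, word, **kwargs):
    # The mirror image: keep neighbors that lean toward "she".
    return gender_shift(kv, word, toward="she", away="he", **kwargs)
```

With margin=0.0 this is the same test as the one-liner above; something like mascify(kv, 'chatty', margin=0.1) keeps only the more strongly male-coded neighbors. If you trained a full gensim Word2Vec model, the kv to pass in is probably model.wv.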
Anyway, if you start with a word and run mascify and femify alternately, you can ask whether you eventually wind up in a 2-cycle: a pair of words which are each other's gender counterparts in this loose sense.
gentle -> easygoing -> chatty -> talkative -> chatty -> …
So “chatty” and “talkative” are a pair, with “chatty” female-coded and “talkative” male-coded.
beautiful -> magnificent -> wonderful -> marvelous -> wonderful -> …
So far, I keep hitting 2-cycles, and pretty quickly, though I don’t see why a longer cycle wouldn’t be possible or likely. Update: Kevin in comments explains very nicely why it has to terminate in a 2-cycle!
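The alternating walk can be sketched roughly like this. The function name alternate and the max_steps cutoff are my own illustrative choices, and masc_step / fem_step are assumed to be functions that take a word and return a single word (say, the first entry of the mascify list) or None when nothing qualifies.

```python
def alternate(word, masc_step, fem_step, max_steps=50):
    """Apply masc_step and fem_step alternately, starting with masc_step.

    Returns (path, pair). `pair` holds the two words of a 2-cycle once the walk
    lands on the word it visited two steps earlier; it is None if a step finds
    no neighbor or max_steps runs out first.
    """
    path = [word]
    steps = (masc_step, fem_step)
    for i in range(max_steps):
        nxt = steps[i % 2](path[-1])
        if nxt is None:
            return path, None  # dead end: no qualifying neighbor
        path.append(nxt)
        # Same word as two steps back, and the same step is due next,
        # so from here on the walk just bounces between these two words.
        if len(path) >= 3 and path[-1] == path[-3]:
            return path, (path[-2], path[-1])
    return path, None
```

Something like alternate('gentle', lambda w: (mascify(w) or [None])[0], lambda w: (femify(w) or [None])[0]) would then return the chain together with the pair it settles on.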
Some other pairs, female-coded word first:
overjoyed / elated
strident / vehement
fearful / worried
furious / livid
distraught / despondent
hilarious / funny
exquisite / sumptuous
thought_provoking / insightful
kick_ass / badass
Sometimes it’s basically giving the same word, in two different forms or with one word misspelled:
intuitive / intuitively
buoyant / bouyant
sad / Sad
You can do this for names, too, though you have to make “topn” a little larger to find matches. I found:
Jamie / Chris
Deborah / Jeffrey
Fran / Pat
Mary / Joseph
Pretty good pairs! Note that you hit a lot of gender-mixed names (Jamie, Chris, Pat), just as you might expect: the male-biased name word2vec-closest to a female name is likely to be a name at least some women have! You can deal with this by putting in a threshold:
def mascify(y): return [x for x, _ in model.most_similar(y, topn=1000) if model.similarity(x, 'she') < model.similarity(x, 'he') - 0.1]
This eliminates “Jamie” and “Pat” (though “Chris” still reads as male).
Now we get some new pairs:
Jody / Steve (this one seems to have a big basin of attraction; it shows up from a lot of initial conditions)
Kasey / Zach
Peter / Catherine (is this a Russia thing?)
Nicola / Dominic
Alison / Ian