By the way, here’s another fun word2vec trick. Following Ben Schmidt, you can try to find “gender-neutralized synonyms” — words which are close to each other except for the fact that they have different gender connotations. A quick and dirty way to “mascify” a word is to find its nearest neighbor which is closer to “he” than “she”:
def mascify(y): return [x[0] for x in model.most_similar(y, topn=200) if model.similarity(x[0], 'she') < model.similarity(x[0], 'he')][0]
“femify” is defined similarly. We could put a threshold away from 0 in there, if we wanted to restrict to more strongly gender-coded words.
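Here is a minimal sketch of both directions with the threshold exposed as a parameter. It is an illustration, not the exact code behind the results in this post. The names `shift`, `margin`, and `ToyVectors` are hypothetical, and the vectors are made up so that the example runs without a trained model. The sketch assumes only that the model offers `most_similar(word, topn=...)` returning (word, score) pairs and `similarity(a, b)`, as in the one-liner above; in recent gensim releases those methods live on the `KeyedVectors` object (for example `model.wv`).

```python
import math


class ToyVectors:
    """Hypothetical stand-in exposing the two methods the one-liner uses."""

    def __init__(self, vectors):
        self.vectors = vectors

    def similarity(self, a, b):
        # Cosine similarity between the vectors of two words.
        u, v = self.vectors[a], self.vectors[b]
        dot = sum(p * q for p, q in zip(u, v))
        norm_u = math.sqrt(sum(p * p for p in u))
        norm_v = math.sqrt(sum(q * q for q in v))
        return dot / (norm_u * norm_v)

    def most_similar(self, word, topn=10):
        # All other words, ranked by decreasing similarity to `word`.
        scored = [(w, self.similarity(word, w)) for w in self.vectors if w != word]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:topn]


def shift(model, word, toward, away, topn=200, margin=0.0):
    """Nearest neighbor of `word` that is closer to `toward` than to `away`
    by more than `margin`, or None if no neighbor in the top `topn` qualifies."""
    for candidate, _ in model.most_similar(word, topn=topn):
        if candidate in (toward, away):
            continue  # assumption of this sketch: never return the pronouns themselves
        gap = model.similarity(candidate, toward) - model.similarity(candidate, away)
        if gap > margin:
            return candidate
    return None


def mascify(model, word, **kwargs):
    return shift(model, word, "he", "she", **kwargs)


def femify(model, word, **kwargs):
    return shift(model, word, "she", "he", **kwargs)


# Made-up vectors: the first coordinate is a crude "gender" axis,
# the other two stand in for meaning. These are not real word2vec values.
toy = ToyVectors({
    "he": (1.0, 0.0, 0.0),
    "she": (-1.0, 0.0, 0.0),
    "gentle": (-0.1, 0.5, 0.9),
    "easygoing": (0.1, 0.8, 0.6),
    "chatty": (-0.1, 1.0, 0.3),
    "talkative": (0.1, 1.0, 0.1),
})

print(mascify(toy, "gentle"))
print(femify(toy, "easygoing"))
print(mascify(toy, "gentle", margin=0.5))  # a strict threshold can leave no match
```

Unlike the one-liner, which raises an IndexError when nothing in the top `topn` qualifies, this version returns None, which makes it easier to tell "no match" apart from a bug.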
Anyway, if you start with a word and run mascify and femify alternately, you can ask whether you eventually wind up in a 2-cycle: a pair of words which are each other's gender counterparts in this loose sense.
e.g.
gentle -> easygoing -> chatty -> talkative -> chatty -> …
So “chatty” and “talkative” are a pair, with “chatty” female-coded and “talkative” male-coded.
beautiful -> magnificent -> wonderful -> marvelous -> wonderful -> …
So far, I keep hitting 2-cycles, and pretty quickly, though I don’t see why a longer cycle wouldn’t be possible or likely. Update: Kevin in comments explains very nicely why it has to terminate in a 2-cycle!
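The alternating procedure can be sketched as a small loop. This is a rough illustration under stated assumptions: `alternate` and `max_steps` are hypothetical names, and the two lookup tables simply hard-code the chains quoted above so the example runs without a model. With a real model you would pass in actual mascify and femify functions instead.

```python
def alternate(word, mascify, femify, max_steps=50):
    """Apply mascify, femify, mascify, ... starting from `word`.

    Returns (path, pair). `pair` is (female-coded, male-coded) once the walk
    returns to the word it visited two steps earlier, and None if a step
    finds no match or max_steps runs out first.
    """
    path = [word]
    steps = [mascify, femify]
    for i in range(max_steps):
        nxt = steps[i % 2](path[-1])
        if nxt is None:
            return path, None
        path.append(nxt)
        # Both maps are deterministic, so revisiting the word from two
        # steps back means the walk alternates between two words forever.
        if len(path) >= 3 and path[-1] == path[-3]:
            if i % 2 == 1:  # the last step was femify
                return path, (path[-1], path[-2])
            return path, (path[-2], path[-1])
    return path, None


# Hard-coded stand-ins for the model, transcribed from the chains above.
masc_table = {"gentle": "easygoing", "chatty": "talkative",
              "beautiful": "magnificent", "wonderful": "marvelous"}
fem_table = {"easygoing": "chatty", "talkative": "chatty",
             "magnificent": "wonderful", "marvelous": "wonderful"}

path, pair = alternate("gentle", masc_table.get, fem_table.get)
print(" -> ".join(path))
print(pair)
```

The `max_steps` cap is only a safety net for the sketch; it is not needed if the walk always ends in a 2-cycle.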
Some other pairs, female-coded word first:
overjoyed / elated
strident / vehement
fearful / worried
furious / livid
distraught / despondent
hilarious / funny
exquisite / sumptuous
thought_provoking / insightful
kick_ass / badass
Sometimes it’s basically giving the same word, in two different forms or with one word misspelled:
intuitive / intuitively
buoyant / bouyant
sad / Sad
You can do this for names, too, though you have to make “topn” a little larger to find matches. I found:
Jamie / Chris
Deborah / Jeffrey
Fran / Pat
Mary / Joseph
Pretty good pairs! Note that you hit a lot of gender-mixed names (Jamie, Chris, Pat), just as you might expect: the male-biased name word2vec-closest to a female name is likely to be a name at least some women have! You can deal with this by putting in a threshold:
def mascify(y): return [x[0] for x in model.most_similar(y, topn=1000) if model.similarity(x[0], 'she') < model.similarity(x[0], 'he') - 0.1][0]
This eliminates “Jamie” and “Pat” (though “Chris” still reads as male).
Now we get some new pairs:
Jody / Steve (this one seems to have a big basin of attraction; it shows up from a lot of initial conditions)
Kasey / Zach
Peter / Catherine (is this a Russia thing?)
Nicola / Dominic
Alison / Ian
“…though I don’t see why a longer cycle wouldn’t be possible or likely.”
It seems to me if I start with a word X which is closer to “she” than “he”, and I pick its nearest neighbor that is closer to “he” than “she”, call it Y, then the nearest neighbor to Y which is closer to “she” than “he” can be no farther away than X, since X itself is a candidate. Therefore, the distance between consecutive elements of your sequence never increases. There are only finitely many words, so that distance eventually stops changing, and at that point (barring ties) the sequence just bounces between the same two words, so a 2-cycle is inevitable I think.
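One way to write the argument out, as a sketch rather than a proof of what the code does in every case. The notation is introduced here for illustration: s(u, v) is the cosine similarity, x_k is the k-th word in the sequence, and C_k is the set of admissible words coded with the same gender as x_k. The sketch assumes each step actually finds a match and that there are no exact ties in similarity.

```latex
% Each step picks the most similar admissible word, and x_k is itself
% admissible when stepping away from x_{k+1}:
\begin{align*}
  x_{k+2} &= \operatorname*{arg\,max}_{w \in C_k} s(x_{k+1}, w),
  \qquad x_k \in C_k \\
  \Longrightarrow\quad
  s(x_{k+1}, x_{k+2}) &\ge s(x_{k+1}, x_k) = s(x_k, x_{k+1}).
\end{align*}
% So the similarity of consecutive words is non-decreasing. It takes only
% finitely many values on a finite vocabulary, hence is eventually constant:
\begin{align*}
  s(x_{k+1}, x_{k+2}) = s(x_k, x_{k+1})
  \;\Longrightarrow\; x_{k+2} = x_k \quad \text{(no ties)}.
\end{align*}
```

The “topn” cutoff does not break the inequality: if x_k falls outside the top neighbors of x_{k+1}, then every word inside the cutoff is at least as similar to x_{k+1} as x_k is.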