Gendercycle: a dynamical system on words

By the way, here’s another fun word2vec trick.  Following Ben Schmidt, you can try to find “gender-neutralized synonyms” — words which are close to each other except for the fact that they have different gender connotations.   A quick and dirty way to “mascify” a word is to find its nearest neighbor which is closer to “he” than “she”:

def mascify(y):
    # nearest neighbor of y that is closer to 'he' than to 'she'
    return [x[0] for x in model.most_similar(y, topn=200) if model.similarity(x[0], 'she') < model.similarity(x[0], 'he')][0]

“femify” is defined similarly, with “he” and “she” swapped.  We could also require the gap between the two similarities to exceed some threshold above 0, if we wanted to restrict to more strongly gender-coded words.
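Here is a rough sketch of both directions with the threshold as a parameter.  The names shift, margin and ToyVectors are mine, not from the original one-liner, and the toy vectors are invented purely so the example runs without a trained model; the only things assumed of the model are the two methods used above, most_similar and similarity.  Unlike the one-liner, this version returns None rather than raising an error when nothing in the top topn qualifies.

```python
import math


class ToyVectors:
    """Hypothetical stand-in for a word2vec model. It offers only the two
    methods the one-liner relies on; the vectors below are made up."""

    def __init__(self, vectors):
        self.vectors = vectors

    def similarity(self, a, b):
        # cosine similarity between the two word vectors
        u, v = self.vectors[a], self.vectors[b]
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(y * y for y in v))
        return dot / (norm_u * norm_v)

    def most_similar(self, word, topn=10):
        # (word, similarity) pairs, nearest first, excluding the query word
        others = [(w, self.similarity(word, w)) for w in self.vectors if w != word]
        return sorted(others, key=lambda pair: pair[1], reverse=True)[:topn]


def shift(model, word, toward, away, topn=200, margin=0.0):
    """Nearest neighbor of `word` that is closer to `toward` than to `away`
    by more than `margin`; None if no neighbor in the top `topn` qualifies."""
    for cand, _ in model.most_similar(word, topn=topn):
        if model.similarity(cand, toward) - model.similarity(cand, away) > margin:
            return cand
    return None


def mascify(model, word, **kw):
    return shift(model, word, toward="he", away="she", **kw)


def femify(model, word, **kw):
    return shift(model, word, toward="she", away="he", **kw)


# Invented vectors: first coordinate is a "gender" axis, second is "meaning".
toy = ToyVectors({
    "he": (1.0, 0.0),
    "she": (-1.0, 0.0),
    "chatty": (-0.3, 1.0),
    "talkative": (0.3, 1.0),
    "verbose": (0.02, 1.0),   # only barely on the "he" side
})

print(mascify(toy, "chatty"))              # verbose
print(mascify(toy, "chatty", margin=0.1))  # talkative
print(femify(toy, "talkative"))            # chatty
```

With margin=0 the barely-male-leaning neighbor wins; raising the margin skips it in favor of the more strongly coded word.  The same functions should work with a real model passed in place of the toy, since they only call most_similar and similarity.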

Anyway, if you start with a word and run mascify and femify alternately, you can ask whether you eventually wind up in a 2-cycle:  a pair of words which are each other’s gender counterparts in this loose sense.

e.g.

gentle -> easygoing -> chatty -> talkative -> chatty -> …

So “chatty” and “talkative” are a pair, with “chatty” female-coded and “talkative” male-coded.

beautiful -> magnificent -> wonderful -> marvelous -> wonderful -> …

So far, I keep hitting 2-cycles, and pretty quickly, though I don’t see why a longer cycle wouldn’t be possible or likely.  Update:  Kevin in comments explains very nicely why it has to terminate in a 2-cycle!
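The alternation itself can be sketched like this.  The function name gendercycle, the max_steps cap, and the choice to start with mascify are my assumptions; the two step functions are passed in as plain callables, and the stand-in lookups below just hard-code the “gentle” chain from above instead of querying a model.

```python
def gendercycle(word, mascify, femify, max_steps=50):
    """Alternate mascify and femify starting from `word` (mascify first).

    Returns (path, cycle). `path` lists every word visited, in order.
    `cycle` is the repeating block once a (word, next move) state recurs,
    or None if max_steps runs out or a step returns None.
    """
    steps = (mascify, femify)
    path = [word]
    seen = {(word, 0): 0}  # (word, index of next move) -> position in path
    for i in range(max_steps):
        nxt = steps[i % 2](path[-1])
        if nxt is None:
            return path, None
        path.append(nxt)
        state = (nxt, (i + 1) % 2)
        if state in seen:
            # same word with the same move coming up: everything repeats
            return path, path[seen[state]:-1]
        seen[state] = len(path) - 1
    return path, None


# Stand-in lookups reproducing the chain shown in the post.
masc = {"gentle": "easygoing", "chatty": "talkative"}
fem = {"easygoing": "chatty", "talkative": "chatty"}

path, cycle = gendercycle("gentle", masc.get, fem.get)
print(" -> ".join(path))  # gentle -> easygoing -> chatty -> talkative -> chatty
print(cycle)              # ['chatty', 'talkative']
```

Tracking the word together with which move comes next means a cycle of any length would be reported, not just a 2-cycle, so the same loop can be used to check whether longer cycles ever show up.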

Some other pairs, female-coded word first:

overjoyed / elated

strident / vehement

fearful / worried

furious / livid

distraught / despondent

hilarious / funny

exquisite / sumptuous

thought_provoking / insightful

kick_ass / badass

Sometimes it’s basically giving the same word, in two different forms or with one word misspelled:

intuitive / intuitively

buoyant / bouyant

sad / Sad

You can do this for names, too, though you have to make “topn” a little larger to find matches.  I found:

Jamie / Chris

Deborah / Jeffrey

Fran / Pat

Mary / Joseph

Pretty good pairs!  Note that you hit a lot of gender-mixed names (Jamie, Chris, Pat), just as you might expect:  the male-biased name word2vec-closest to a female name is likely to be a name at least some women have!  You can deal with this by putting in a threshold:

def mascify(y):
    # as before, but the neighbor must be closer to 'he' than to 'she' by more than 0.1
    return [x[0] for x in model.most_similar(y, topn=1000) if model.similarity(x[0], 'she') < model.similarity(x[0], 'he') - 0.1][0]

This eliminates “Jamie” and “Pat” (though “Chris” still reads as male).

Now we get some new pairs:

Jody / Steve (this one seems to have a big basin of attraction; it shows up from a lot of initial conditions)

Kasey / Zach

Peter / Catherine (is this a Russia thing?)

Nicola / Dominic

Alison / Ian

 

 

 

 

 


2 thoughts on “Gendercycle: a dynamical system on words”

  1. […] Update:  Even a little more messing around with “changing the gender of words” in a followup post. […]

  2. Kevin says:

    “…though I don’t see why a longer cycle wouldn’t be possible or likely.”

    It seems to me if I start with a word X which is closer to “she” than “he”, and I pick its nearest neighbor that is closer to “he” than “she”, call it Y, then the nearest neighbor to Y which is closer to “she” than “he” can be no farther away than X. Therefore, the distance between consecutive elements of your sequence is decreasing, so it converges, so a 2-cycle is inevitable I think.
