## Peter Norvig, the meaning of polynomials, debugging as psychotherapy

I saw Peter Norvig give a great general-audience talk on AI at Berkeley when I was there last month.  A few notes from his talk.

• “We have always prioritized fast and cheap over safety and privacy — maybe this time we can make better choices.”
• He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial.  But the code looks weird to a human eye.  To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output.  So has it really ”learned” the polynomial?  I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input.  For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp.  I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of.  Relevant:  Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
• Norvig made the point that traditional computer programs are very modular, leading to a highly successful debugging tradition of zeroing in on the precise part of the program that is doing something wrong, then fixing that part.  An algorithm or process developed by a machine, by contrast, may not have legible “parts”!  If a neural net is screwing up when classifying something, there’s no meaningful way to say “this neuron is the problem, let’s fix it.”  We’re dealing with highly non-modular complex systems which have evolved into a suboptimally functioning state, and you have to find a way to improve function which doesn’t involve taking the thing apart and replacing the broken component.  Of course, we already have a large professional community that works on exactly this problem.  They’re called therapists.  And I wonder whether the future of debugging will look a lot more like clinical psychology than it does like contemporary software engineering.
Tagged , , ,

## Ellenbergs

I got a message last week from the husband of my first cousin once removed;  his father-in-law, Leonard Ellenberg, was my grandfather Julius Ellenberg’s brother.  I never knew my grandfather; he died before I was born, and I was named for him.

The message contained a huge amount of information about a side of my family I’ve never known well.  I’m still going through it all.  But I wanted to share some of it while it was on my mind.

Here’s the manifest for the voyage of the S.S. Polonia, which left Danzig on September 17, 1923 and arrived in New York on October 1.

Owadje Ellenberg (always known as Owadia in my family) was my great-grandfather.  He came to New York with his wife Sura-Fejga (known to us as Sara), Markus (Max), Etia-Race (Ethel), Leon (Leonard), Samuel and Bernard.  Sara was seven months pregnant with my uncle Morris Ellenberg, the youngest child.

Owadje gives his occupation as “mason”; his son Max, only 17, was listed as “tailor.”  They came from Stanislawow, Poland, which is now the city of Ivano-Frankivsk in Ukraine.  On the immigration form you had to list a relative in your country of origin; Owadje listed his brother, Zacharja, who lived on Zosina Wola 6 in Stanislawow.  None of the old street names have survived to the present, but looking at this old map of Stanislawow

it seems pretty clear Zosina Wola is the present day Yevhena Konoval’tsya Street.  I have no way of knowing whether the numbering changed, but #6 Yevhena Konoval’tsya St. seems to be the setback building here:

So this is the best guess I have as to where my ancestors lived in the old country.  The name Zosina Wola lives on only in the name of a bar a few blocks down Yevhena Konoval’tsya:

Owadje, now Owadia, files a declaration of intention to naturalize in 1934:

His signature is almost as bad as mine!  By 1934 he’s living in Borough Park, Brooklyn, a plasterer.  5 foot 7 and 160lb; I think every subsequent Ellenberg man has been that size by the age of 15.  Shtetl nutrition.  There are two separate questions on this form, “color” and “race”:  for color he puts white, for race he puts “Hebrew.”  What did other Europeans put for race?  He puts his hometown as Sopoff, which I think must be the modern Sopiv; my great-grandmother Sara was from Obertyn, quite close by.  I guess they moved to the big city, Stanislowow, about 40 miles away, when they were pretty young; they got married there in 1902, when they were 21.  The form says he previously filed a declaration of intention in 1926.  What happened?  Did he just not follow through, or was his naturalization rejected?  Did he ever become a citizen?  I don’t know.

Here’s what his house in Brooklyn looks like now:

Did you notice whose name was missing from the Polonia’s manifest?  Ovadje’s oldest son, my grandfather, Julius.  Except one thing I’ve learned from all this is that I don’t actually know what my grandfather’s name was.  Julius is what we called him.  But my dad says his passport says “Israel Ellenberg.”  And his naturalization papers

have him as “Juda Ellenberg”  (Juda being the Anglicization of Yehuda, his and my Hebrew name.)  So didn’t that have to be his legal name?  But how could that not be on his passport?

Update:  Cousin Phyllis came through for me!  My grandfather legally changed his name to Julius on June 13, 1927, four months after he filed for naturalization.

My grandfather was the first to come to America, in December 1920, and he came alone.  He was 16.  He managed to make enough money to bring the whole rest of the family in late 1923, which was a good thing because in May 1924 Calvin Coolidge signed the Johnson-Reed Act which clamped down on immigration by people thought to be debasing the American racial stock:  among these were Italians, Chinese, Czechs, Spaniards, and Jews, definitely Jews.

Another thing I didn’t know:  my grandfather lists his port of entry as Vanceboro, Maine.  That’s not a seaport; it’s a small town on the Canadian border.  So Julius/Juda/Israel must have sailed to Canada; this I never knew.  Where would he have landed? Sounds like most Canadian immigrants landed at Quebec or Halifax, and Halifax makes much more sense if he entered the US at Vanceboro.  But why did he sail to Canada instead of the US?  And why did he leave from France (the form says “Montrese, France,” a place I can’t find) instead of Poland?  (Update:  My cousin comes through again:  another record shows that Julius arrived on Dec 7, 1920 in St. John, New Brunswick, conveyed in 3rd class by the S.S. Corsican.  Looks like this ship would have been coming from England, not France;  I don’t know how to reconcile that.)

In 1927, when he naturalized, Julius lived at 83 2nd Avenue, a building built in 1900 at the boundary of the Bowery and the East Village.  Here’s what it looks like now:

Not a lot of new immigrants able to afford rent there these days, I’m betting.  Later he’d move to Long Beach, Long Island, where my father and his sisters grew up.

My first-cousin-once-removed-in-law went farther back, too, all the way back to Mojżesz Ellenberg, who was born sometime in the middle of the 18th century.  The Hapsburg Empire required Jews to adopt surnames only in 1787; so Mojżesz could very well have been the first Ellenberg.  You may be thinking he’s Owadia’s father’s father’s father, but no — Ellenberg was Owadia’s mother’s name.  I was puzzled by this but actually it was common.  What it meant is that Mordko Kasirer, Owadia’s father, didn’t want to pay the fee for a civil marriage — why should he, when he was already married to Rivka Ellenberg in the synagogue?  But if you weren’t legally married, your children weren’t allowed to take their father’s surname.  So be it.  Mordko wasn’t gonna get ripped off by the system.  Definitely my relative.

Update:  Cousin Phyllis Rosner sends me my grandfather’s birth record.  At birth in Poland he’s Izrael Juda Ellenberg.  This still doesn’t answer what his legal name in the US was, but it explains the passport!

## Tweet, repeat

Messing around a bit with lexical analysis of my tweets (of which there are about 10,000).  It’s interesting to see which tweets I’ve essentially duplicated (I mean, after @-mentions, etc. are removed.)  Some of the top duplicates:

• Thanks (8 times)
• thanks (6 times)
• Yep (6 times)
• Yes (5 times)
• yep (5 times)
• Thanks so much (5 times)
• RT (5 times)
• I know right (4 times)

More detailed tweet analysis later.

Tagged

## Mathematicians becoming data scientists: Should you? How to?

I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter.  I asked her:  so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry?  How would you know whether you might find that kind of work enjoyable?  And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share.  Take it away, Sarah!

Tagged , , ,

## AB for President

AB was talking about being President this morning.

Me:  I think you could be a really good candidate; you’re funny, and you get along with almost everybody.

AB:  And I have great hair!

She gets it.

## Braid monodromy and the dual curve

Nick Salter gave a great seminar here about this paper; hmm, maybe I should blog about that paper, which is really interesting, but I wanted to make a smaller point here.  Let C be a smooth curve in P^2 of degree n. The lines in P^2 are parametrized by the dual P^2; let U be the open subscheme of the dual P^2 parametrizing those lines which are not tangent to C; in other words, U is the complement of the dual curve C*.  For each point u of U, write L_u for the corresponding line in P^2.

This gives you a fibration X -> U where the fiber over a point u in U is L_u – (L_u intersect C).  Since L_u isn’t tangent to C, this fiber is a line with n distinct points removed.  So the fibration gives you an (outer) action of pi_1(U) on the fundamental group of the fiber preserving the puncture classes; in other words, we have a homomorphism

$\pi_1(U) \rightarrow B_n$

where B_n is the n-strand braid group.

When you restrict to a line L* in U (i.e. a pencil of lines through a point in the original P^2) you get a map from a free group to B_n; this is the braid monodromy of the curve C, as defined by Moishezon.  But somehow it feels more canonical to consider the whole representation of pi_1(U).  Here’s one place I see it:  Proposition 2.4 of this survey by Libgober shows that if C is a rational nodal curve, then pi_1(U) maps isomorphically to B_n.  (OK, C isn’t smooth, so I’d have to be slightly more careful about what I mean by U.)

## Mark Metcalf

Have you ever heard of this guy?  I hadn’t.  Or thought I hadn’t.  But: he was Niedermeyer in Animal House

and the dad in Twisted Sister’s “We’re Not Gonna Take It” video

and the Master, the Big Bad of Buffy the Vampire Slayer season 1.

That’s a hell of a career! Plus: he lived in suburban Milwaukee until three years ago! And he used to go out with Glenn Close and Carrie Fisher! OK. Now I’ve heard of Mark Metcalf and so have you.

## Curry apple chicken

I didn’t have time to do a real shop and had nothing for Friday night dinner so I bought some chicken and a bag of apples and made something that came out surprisingly well; I hereby record it.

Ingredients:

2-3 lb boneless chicken breasts

5 apples, cubed

some scallions

some vegetable oil, whatever kind, doesn’t matter, I used olive

1 tbsp ground coriander

1 tbsp ground cumin

1/2 tsp turmeric

1 tsp salt

however much minced garlic you’re into

1/2-1 tsp garam masala

some crushed tomatoes but you could use actual tomatoes if it weren’t the middle of winter

Recipe:

Get oil hot.  Throw apples and scallions in.  Stir and cook 5 mins until apples soft.  Clear off some pan space and put coriander, cumin, turmeric, salt in the oil, let it cook 30 sec – 1 min, then throw in all the chicken, which by the way you cut into chunks, saute it all up until it’s cooked through.  Put the minced garlic in and let that cook for a minute.  Then put in however much tomato you need to combine with everything else in the pan and make a sauce.  (Probably less than you think, you don’t want soup.)  Turn heat down to warm and mix in garam masala.  You could just eat it like this or you could have been making some kind of starch in parallel.  I made quinoa.  CJ liked this, AB did not.

I took the spice proportions from a Madhur Jaffrey recipe but this is in no way meant as actual Indian food, obviously.  I guess I was just thinking about how when I was a kid you would totally get a “curry chicken salad” which was shredded chicken with curry powder, mayonnaise, and chunked up apple, and I sort of wanted a hot mayonnaiseless version of that.  Also, when I was in grad school learning to cook from Usenet with David Carlton, we used to make a salad with broiled chicken and curry mayonnaise and grapes.  I think it was this.  Does that sound right, David?   Yes, that recipe calls for 2 cups of mayonnaise.  It was a different time.  I feel like we would make this and then put it on top of like 2 pounds of rotini and have food for days.

Tagged , ,

## Difference sets missing a Hamming sphere

I tend to think of the Croot-Lev-Pach method (as used, for instance, in the cap set problem) as having to do with n-tensors, where n is bigger than 2.  But you can actually also use it in the case of 2-tensors, i.e. matrices, to say things that (as far as I can see) are not totally trivial.

Write m_d for the number of squarefree monomials in x_1, .. x_n of degree at most d; that is,

$m_d = 1 + {n \choose 1} + {n \choose 2} + \ldots + {n \choose d}.$

Claim:  Let P be a polynomial of degree d in F_2[x_1, .. x_n] such that P(0) = 1.  Write S for the set of nonzero vectors x such that P(x) = 1.  Let A be a subset of F_2^n such that no two elements of A have difference lying in S.  Then |A| < 2m_{d/2}.

Proof:  Write M for the A x A matrix whose (a,b) entry is P(a-b).  By the Croot-Lev-Pach lemma, this matrix has rank at most 2m_{d/2}.  By hypothesis on A, M is the identity matrix, so its rank is |A|.

Remark: I could have said “sum” instead of “difference” since we’re in F_2 but for larger finite fields you really want difference.

The most standard context in which you look for large subsets of F_2^n with restricted difference sets is that of error correcting codes, where you ask that no two distinct elements of A have difference with Hamming weight (that is, number of 1 entries) at most k.

It would be cool if the Croot-Lev-Pach lemma gave great new bounds on error-correcting codes, but I don’t think it’s to be.  You would need to find a polynomial P which vanishes on all nonzero vectors of weight larger than k, but which doesn’t vanish at 0. Moreover, you already know that the balls of size k/2 around the points of A are disjoint, which gives you the “volume bound”

|A| < 2^n / m_{k/2}.

I think that’ll be hard to beat.

If you just take a random polynomial P, the support of P will take up about half of F_2^n; so it’s not very surprising that a set whose difference misses that support has to be small!

Here’s something fun you can do, though.  Let s_i be the i-th symmetric function on x_1, … x_n.  Then

$s_i(x) = {wt(x) \choose i}$

where wt(x) denotes Hamming weight.  Recall also that the binomial coefficient

${k \choose 2^a}$

is odd precisely when the a’th binary digit of k is 1.

Thus,

$(1-s_1(x))(1-s_2(x))(1-s_4(x))\ldots(1-s_{2^{b-1}}(x))$

is a polynomial of degree 2^b-1 which vanishes on x unless the last b digits of wt(x) are 0; that is, it vanishes unless wt(x) is a multiple of 2^b.  Thus we get:

Fact:  Let A be a subset of F_2^n such that the difference of two nonzero elements in A never has weight a multiple of 2^b.  Then

$|A| \leq 2m_{2^{b-1} - 1}$.

Note that this is pretty close to sharp!  Because if we take A to be the set of vectors of  weight at most 2^{b-1} – 1, then A clearly has the desired property, and already that’s half as big as the upper bound above.  (What’s more, you can throw in all the vectors of weight 2^{b-1} whose first coordinate is 1; no two of these sum to something of weight 2^b.  The Erdös-Ko-Rado theorem says you can do no better with those weight 2^{b-1} vectors.)

Is there an easier way to prove this?

When b=1, this just says that a set with no differences of even Hamming weight has size at most 2; that’s clear, because two vectors whose Hamming weight has the same parity differ by a vector of even weight.  Even for b=2 this isn’t totally obvious to me.  The result says that a subset of F_2^n with no differences of weight divisible by 4 has size at most 2+2n.  On the other hand, you can get 1+2n by taking 0, all weight-1 vectors, and all weight-2 vectors with first coordinate 1.  So what’s the real answer, is it 1+2n or 2+2n?

Write H(n,k) for the size of the largest subset of F_2^n having no two vectors differing by a vector of Hamming weight exactly k.  Then if 2^b is the largest power of 2 less than n, we have shown above that

$m_{2^{b-1} - 1 } \leq H(n,2^b) \leq 2m_{2^{b-1} - 1}$.

On the other hand, if k is odd, then H(n,k) = 2^{n-1}; we can just take A to be the set of all even-weight vectors!  So perhaps H(n,k) actually depends on k in some modestly interesting 2-adic way.

The sharpness argument above can be used to show that H(4m,2m) is as least

$2(1 + 4m + {4m \choose 2} + \ldots + {4m \choose m-1} + {4m-1 \choose m-1}). (*)$

I was talking to Nigel Boston about this — he did some computations which make it looks like H(4m,2m) is exactly equal to (*) for m=1,2,3.  Could that be true for general m?

(You could also ask about sets with no difference of weight a multiple of k; not sure which is the more interesting question…)

Update:  Gil Kalai points out to me that much of this is very close to and indeed in some parts a special case of the Frankl-Wilson theorem…  I will investigate further and report back!

## Vacuum cleaner on sale

I was explaining the “regular price” scam to CJ the other day. A store sells a vacuum cleaner for $79.95. One day, they put up a sign saying “SALE! Regular price,$109.95; now MARKED DOWN to $79.95.” The point is to create an imaginary past that never was, a past where vacuum cleaners cost$109.95, a difficult past from which the store has generously granted you respite.
This is what Trump’s team is doing. They’re trying to create an imaginary past in which the last 5 years of life in America was characterized by ubiquitous street crime, unchecked terrorism, and mass unemployment. So that life in America in 2017 and 2018 will seem comparatively placid, safe, and prosperous. Look how much I saved you on this goddamn vacuum cleaner. You’re welcome.

Tagged