]]>

- “We have always prioritized fast and cheap over safety and privacy — maybe this time we can make better choices.”
- He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial. But the code looks *weird* to a human eye. To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output. So has it really “learned” the polynomial? I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input. For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp. I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of. Relevant: Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
- Norvig made the point that traditional computer programs are very modular, leading to a highly successful debugging tradition of zeroing in on the precise part of the program that is doing something wrong, then fixing that part. An algorithm or process developed by a machine, by contrast, may not have legible “parts”! If a neural net is screwing up when classifying something, there’s no meaningful way to say “this neuron is the problem, let’s fix it.” We’re dealing with highly non-modular complex systems which have evolved into a suboptimally functioning state, and you have to find a way to improve function which doesn’t involve taking the thing apart and replacing the broken component. Of course, we already have a large professional community that works on exactly this problem. They’re called therapists. And I wonder whether the future of debugging will look a lot more like clinical psychology than it does like contemporary software engineering.
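To make the flavor of that demo concrete, here is a hand-written toy in the same spirit (the polynomial and the code are my invention, not the demo's actual output): a program that computes a quadratic correctly while looking nothing like one.

```python
# A hand-written analogue of the "weird but correct" machine-made program:
# compute the (hypothetical) quadratic p(n) = n^2 + n + 1 by counting with
# nested loops rather than by doing arithmetic on coefficients.
def weird_quadratic(n):
    total = 1                 # the constant term
    for i in range(n):
        for j in range(n):
            total += 1        # n*n increments supply the n^2 term
    for i in range(n):
        total += 1            # n more increments supply the linear term
    return total

# It predicts values perfectly...
assert all(weird_quadratic(n) == n * n + n + 1 for n in range(20))
# ...but nothing in the code names a coefficient or a degree.
```

Whether that program has "learned" p is exactly the cultural question at issue.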

]]>

The message contained a huge amount of information about a side of my family I’ve never known well. I’m still going through it all. But I wanted to share some of it while it was on my mind.

Here’s the manifest for the voyage of the S.S. Polonia, which left Danzig on September 17, 1923 and arrived in New York on October 1.

Owadje Ellenberg (always known as Owadia in my family) was my great-grandfather. He came to New York with his wife Sura-Fejga (known to us as Sara), Markus (Max), Etia-Race (Ethel), Leon (Leonard), Samuel and Bernard. Sara was seven months pregnant with my uncle Morris Ellenberg, the youngest child.

Owadje gives his occupation as “mason”; his son Max, only 17, was listed as “tailor.” They came from Stanislawow, Poland, which is now the city of Ivano-Frankivsk in Ukraine. On the immigration form you had to list a relative in your country of origin; Owadje listed his brother, Zacharja, who lived on Zosina Wola 6 in Stanislawow. None of the old street names have survived to the present, but looking at this old map of Stanislawow

it seems pretty clear Zosina Wola is the present day Yevhena Konoval’tsya Street. I have no way of knowing whether the numbering changed, but #6 Yevhena Konoval’tsya St. seems to be the setback building here:

So this is the best guess I have as to where my ancestors lived in the old country. The name Zosina Wola lives on only in the name of a bar a few blocks down Yevhena Konoval’tsya:

Owadje, now Owadia, files a declaration of intention to naturalize in 1934:

His signature is almost as bad as mine! By 1934 he’s living in Borough Park, Brooklyn, a plasterer. 5 foot 7 and 160 lb; I think every subsequent Ellenberg man has been that size by the age of 15. Shtetl nutrition. There are two separate questions on this form, “color” and “race”: for color he puts white, for race he puts “Hebrew.” What did other Europeans put for race? He puts his hometown as Sopoff, which I think must be the modern Sopiv; my grandmother Sara was from Obertyn, quite close by. I guess they moved to the big city, Stanislawow, about 40 miles away, when they were pretty young; they got married there in 1902, when they were 21. The form says he previously filed a declaration of intention in 1926. What happened? Did he just not follow through, or was his naturalization rejected? Did he ever become a citizen? I don’t know.

Here’s what his house in Brooklyn looks like now:

Did you notice whose name was missing from the Polonia’s manifest? Owadje’s oldest son, my grandfather, Julius. Except one thing I’ve learned from all this is that *I don’t actually know what my grandfather’s name was.* Julius is what we called him. But my dad says his passport says “Israel Ellenberg.” And his naturalization papers

have him as “Juda Ellenberg” (Juda being the Anglicization of Yehuda, his and my Hebrew name.) So didn’t that have to be his legal name? But how could that not be on his passport?

**Update:** Cousin Phyllis came through for me! My grandfather legally changed his name to Julius on June 13, 1927, four months after he filed for naturalization.

My grandfather was the first to come to America, in December 1920, and he came alone. He was 16. He managed to make enough money to bring the whole rest of the family in late 1923, which was a good thing because in May 1924 Calvin Coolidge signed the Johnson-Reed Act which clamped down on immigration by people thought to be debasing the American racial stock: among these were Italians, Chinese, Czechs, Spaniards, and Jews, definitely Jews.

Another thing I didn’t know: my grandfather lists his port of entry as Vanceboro, Maine. That’s not a seaport; it’s a small town on the Canadian border. So Julius/Juda/Israel must have sailed to Canada; this I never knew. Where would he have landed? Sounds like most Canadian immigrants landed at Quebec or Halifax, and Halifax makes much more sense if he entered the US at Vanceboro. But why did he sail to Canada instead of the US? And why did he leave from France (the form says “Montrese, France,” a place I can’t find) instead of Poland?

In 1927, when he naturalized, Julius lived at 83 2nd Avenue, a building built in 1900 at the boundary of the Bowery and the East Village. Here’s what it looks like now:

Not a lot of new immigrants able to afford rent there these days, I’m betting. Later he’d move to Long Beach, Long Island, where my father and his sisters grew up.

My first-cousin-once-removed-in-law went farther back, too, all the way back to Mojżesz Ellenberg, who was born sometime in the middle of the 18th century. The Hapsburg Empire required Jews to adopt surnames only in 1787; so Mojżesz could very well have been the first Ellenberg. You may be thinking he’s Owadia’s father’s father’s father, but no — Ellenberg was Owadia’s *mother’s* name. I was puzzled by this but actually it was common. What it meant is that Mordko Kasirer, Owadia’s father, didn’t want to pay the fee for a civil marriage — why should he, when he was already married to Rivka Ellenberg in the synagogue? But if you weren’t legally married, your children weren’t allowed to take their father’s surname. So be it. Mordko wasn’t gonna get ripped off by the system. Definitely my relative.

**Update:** Cousin Phyllis Rosner sends me my grandfather’s birth record. At birth in Poland he’s Izrael Juda Ellenberg. This still doesn’t answer what his legal name in the US was, but it explains the passport!

]]>

- Thanks (8 times)
- thanks (6 times)
- Yep (6 times)
- Yes (5 times)
- yep (5 times)
- Thanks so much (5 times)
- RT (5 times)
- I know right (4 times)

More detailed tweet analysis later.
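For what it’s worth, a tally like the one above is a few lines of Python (the reply list here is a made-up stand-in, not my actual tweets):

```python
from collections import Counter

# Hypothetical stand-in for the list of reply texts being tallied
replies = ["Thanks", "thanks", "Yep", "Thanks", "I know right", "RT", "Yep"]

tally = Counter(replies)
for text, count in tally.most_common():
    print(f"- {text} ({count} times)")
```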

]]>

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share. Take it away, Sarah!

As I went back and edited my initial bullet points to Jordan, I realized I had a good deal to say on this topic. My bullet points started overflowing. So, the TL;DR (which I acknowledge is still pretty long) is in the bullet points. However, if you’re looking for a more Foster-Wallace-esque adventure, and also perhaps the most unique insights I have to offer on these questions, the footnotes are where it’s at.


**Some bullet points to determine if you (a Math PhD student) would like this kind of work**:

- Do you like modeling complex problems mathematically? Do you enjoy taking a real-world problem and determining its salient features in the context of real-world constraints?
- Do you like communicating with diverse audiences, many of whom are not mathematical experts? Are you good at translating mathematical insights into non- or less mathematical language?
- Can you walk away from a problem when the solution is “good enough”? Are you able to switch between tasks or problems with relative ease? Are you OK with simple solutions to problems that could have more complicated solutions, but only with rapidly diminishing returns?
- Are you OK with abandoning the work of your dissertation and potentially not publishing in academic journals anymore? (Unless you are at Google or Microsoft Research the likelihood is that you will not have this opportunity. You may, however, be able to publish in Industry journals or the Industry tracks of interdisciplinary academic conferences). Are you OK with becoming more of a practitioner than a theorist?
- Do you like programming in an object-oriented, statically typed language, and are you at least decent at it? This one really depends on the company and the role; at Twitter, to have any impact at all you had to build it yourself, and that meant writing code in Scala; if you wanted to do anything REALLY interesting you had to be REALLY good at Scala, which is a serious investment. At other companies things can be different, but IMO the really interesting work requires knowing how to program at a level where you could pass as a Software Engineer. However, there is this whole category of Data Scientist for which you only need to know R or Pandas (the R-like Python package) and be well-versed in Statistics. For that job you do mostly offline analysis of various business metrics; if you start building models of users in R you run into size constraints pretty quickly, and then “productionizing” your approach (i.e., building a working system to update and implement your model) needs to be taken up by someone else, which then takes you out of the driver’s seat in the chaotic environment of most tech companies. In conclusion, Math PhDs can run the risk of becoming irrelevant in tech if they cannot build things[1].
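As a toy instance of the “offline analysis of various business metrics” mentioned in the last bullet, here is a sketch in plain Python; the product variants, the counts, and the lift metric are all invented for illustration (in practice this is where R or Pandas would come in):

```python
# Toy offline analysis: compare a made-up daily-active-users metric
# across two hypothetical product variants.
from statistics import mean

daily_active_users = {
    "control": [1040, 1012, 998, 1071, 1055],
    "variant": [1102, 1088, 1123, 1097, 1140],
}

for group, counts in daily_active_users.items():
    print(f"{group}: mean DAU = {mean(counts):.1f}")

# A naive point estimate; a real analysis would attach uncertainty to this.
lift = mean(daily_active_users["variant"]) / mean(daily_active_users["control"]) - 1
print(f"naive lift: {lift:.1%}")
```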

**How to position oneself for these types of jobs (and by “these types of jobs” I mean tech jobs as a Data Scientist or ML engineer NOT at Google Research or Microsoft Research)**:

- Have a coding project or two in Java or C++ (or Scala, or Go) that has some real-world machine learning application. Pick a dataset, solve a problem, put it on your blog.
- Have an R or Pandas data analysis project where you take some publicly available data and glean some insights. Clarity of communication and a pretty presentation are what matter here, more than being super impressive technically, although that’s also good. If you make your project in an IPython notebook and then put it on GitHub, all the better.
- Be ready to have a coding interview. Google-able resources to support you in your preparations abound. You can come up to speed 5-10 times faster than a typical CS undergrad (e.g. you won’t be struggling for weeks to grok summation notation) but this still does require *some* preparation. Ideally you’d start practicing regularly (like, 1-2 hours per day if you have minimal CS or algorithms background) 2-6 months before you’re on the job market. Look at it like a qualifying exam. Most places will let you interview in Python but it might also be a good idea to know how you’d answer questions in Java or C++. Resenting or dreading this part of the process will not help you be successful at it, and it is arguably worth investing the time to be successful. It will at the very least give you many more employment options to choose from. Also this time investment can somewhat overlap with the time you put in to the bullet points above. A final note: if you are not already comfortable with Java, I do *not* recommend the book “Cracking the Coding Interview.” Many people swear by it but I found it to be way too overwhelming as a first resource. What I did to get started was I coded the algorithms that are introduced in Kleinberg and Tardos’ *Algorithm Design* book (or at least the first 2/3 to 3/4 of it); the book is small and well-written enough to not be completely impossible to get through in the time allotted, covers most of the data structures you might be asked to implement in an interview, and actually coding the algorithms presented was very good practice for coding interviews. Also, it will help you get good at analyzing run times of algorithms, which you’ll typically be asked to do as well. Combining this (or working through a similar book) with a good website which provides practice interview questions should be adequate preparation. The only hole here is that this doesn’t really help you understand object-oriented programming at all; finding a good internet resource that helps you understand the implementation of data structures as classes is probably advisable. It’s worth investing the time in a practical introduction to object-oriented programming, but this is probably the least important of the things already mentioned (but may be very important when you actually start working, depending on how much production code you are actually expected to write).
- Employers may opt, instead of the coding interview, to give you a take-home coding project. You similarly need to prepare for these in much the same way you need to prepare for a coding interview; since these projects will typically have an expected turnaround time, you should not wait until you have received one to prepare. The first two bullet points are also very good preparation for this type of task.
- Be able to talk about what you do and what you’re interested in at various levels, to various audiences. A big thing about the transition to tech is that you possibly start communicating with people who don’t really know what a vector space is[2]. Be ready to have those conversations. Honestly ask yourself if you’re OK with having those conversations.
- Research the industry and figure out what real-world problems you’d be interested in solving. Have a targeted job search that reflects that you already have an understanding of what’s out there and a path for your own career advancement in mind. Don’t see going to Industry as failing out of the academic career path; sometimes math grad students are viewed as seeing tech jobs as a sort of fallback that they’re entitled to if the academic path doesn’t work out. Try to pursue this chapter with the enthusiasm that you brought to your academic endeavors, and envision a path for yourself in this new context.
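As one concrete instance of what “implementation of data structures as classes” might mean, here is a binary min-heap in Python, with the run times you’d be asked to state in an interview (my own example, not one drawn from any particular book):

```python
# A classic interview-practice exercise: a data structure implemented as a
# class, annotated with the run times you'd be asked to justify.
class MinHeap:
    def __init__(self):
        self._a = []

    def push(self, x):            # O(log n): append, then sift up
        a = self._a
        a.append(x)
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] > a[i]:
            a[(i - 1) // 2], a[i] = a[i], a[(i - 1) // 2]
            i = (i - 1) // 2

    def pop(self):                # O(log n): move last to root, sift down
        a = self._a
        a[0], a[-1] = a[-1], a[0]
        smallest = a.pop()
        i = 0
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            j = i
            if l < len(a) and a[l] < a[j]:
                j = l
            if r < len(a) and a[r] < a[j]:
                j = r
            if j == i:
                return smallest
            a[i], a[j] = a[j], a[i]
            i = j

h = MinHeap()
for x in [5, 1, 4, 2, 3]:
    h.push(x)
print([h.pop() for _ in range(5)])  # pops in sorted order: heap sort
```

Being able to write something at this level fluently, and to explain why both operations are O(log n), is roughly the bar the coding interview is testing.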

Obviously it is important to note that all opinions and advice expressed herein are just my opinion based on a rather limited array of experiences. I think the value added here comes from positioning (that is, I came from a pure math background originally), and my tendency to exhaustively and compulsively analyze and dissect any organization or situation I’m a part of.

[1] However, with regard to this bullet point, integrating the output of math PhDs into the workflow of a tech company is a known stumbling block and there’s a general consensus that excellent programming skills in addition to a rigorous mathematical background may be too much to ask of employees; there just simply aren’t enough people who have these qualifications. There’s a whole cottage industry that’s sprung up around translating the insight and modeling of data scientists into deployable code which is equal parts fascinating and wearying. These black-box analytics packages are part of a broader space of implemented solutions to abstractly defined problems called Enterprise Software where consumer companies begin to translate their problems into a form representable to a software solution and then receive the corresponding answers that are programmed into that solution. As a mathematician, you may find you prefer the idea of working for an Enterprise or “B2B” company; you deal with abstracted problems and implement solutions perhaps according to some mathematically advanced modeling. You may have the opportunity to be rigorous and exacting in the development of this software and to make critical design choices in how information is aggregated, processed, and disseminated in its lifecycle through your system. You may be more directly on the critical path of the creation of the company’s final product than you would be at a company whose primary goal is *not* to manufacture analytics solutions. On the other hand, as a Data Scientist working for a company which is a consumer of an enterprise solution, you may bear witness to a phenomenon whereby employees translate their output into a format consumable by an enterprise solution and then are correspondingly shaped by that translation and its ripple effects on the implicit characterization of their work’s meaning and value*.
This is happening everywhere all the time and can lead to the emergent phenomenon of a company which is really now just nothing more than the interaction of many dogs each being wagged by many tiny tails. Even more meta: as a certain brand of Data Scientist you may be tasked with the role of discerning a meaningful signal from the many outputs generated by various enterprise systems for tracking and aggregating data. It is actually this type of murky business problem that you are best suited to address and a good employer will seek your help in selecting metrics* which will correctly inform leadership about company progress. However, the opportunity of creating a role for a Data Scientist at this level is seldom recognized or appreciated. It is more likely your skills will be restricted to a more technical pursuit which may be ill-informed from a business standpoint in the first place; you will invest your time in generating a solution to a slightly or greatly misinterpreted and so misstated problem. But as long as you get to build ML models or deep learning systems or whatever thing it is that interests you, and get paid a lot to do it, that might not matter to you.

*a “metric” in business land is just a measurement, typically a thing that can be counted, or a percentage, often viewed over time. For example, an IT department might have enterprise software for managing its tasks that reveals the number of open requests it has at any given time, the current number of resolved requests, or the resolved requests each day or week. It’s not entirely trivial to translate these metrics into a standard by which to evaluate the IT department’s performance: the company could be growing; an acquisition of another company may have led to a flurry of IT requests to get new employees on-boarded quickly while maintaining a desktop work environment similar to the one they had before; or something like that. What metric is flexible enough to reflect employee success in the diverse situations that arise in real life? This is an example of the not-entirely-trivial question around metrics to measure success that arises at all levels of a company all the time.
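To make the IT example concrete, here is a sketch of how such a per-week metric might be computed (the dates are invented):

```python
# Toy illustration of the footnote's IT example: turning a log of resolved
# requests into a per-week count.
from collections import Counter
from datetime import date

resolved = [date(2016, 3, d) for d in (1, 2, 2, 7, 8, 8, 8, 15)]

per_week = Counter(d.isocalendar()[1] for d in resolved)  # ISO week number
for week, count in sorted(per_week.items()):
    print(f"week {week}: {count} requests resolved")
# The hard part isn't computing the number; it's deciding whether the number
# measures anything about the IT department's performance.
```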

[2] Up until now, you have had the privilege of working for mathematicians all the way down (mostly). At the very least, your direct supervisor has been a mathematician whose mathematical insight and intuition you (hopefully) greatly respect; you have probably grown accustomed to working for someone who can not only understand but also analyze and evaluate your solution to a problem (indeed, in some cases they may have had to actively avoid thinking about your problem to avoid solving it before you). One of the biggest transitions when you move into Industry is that you will possibly move into a position where you are the local expert. This doesn’t just mean that you’re the expert on C*-algebras but someone else is the expert on etale cohomology (btw I have more or less forgotten what either of those things are; does this fact make you feel sad or empty? Your reaction is a good thing to note.); you will possibly be working for someone and with other people who never developed a real understanding of mathematical logic, who basically see you as some kind of computer-math sorcerer whose work they can’t begin to understand (and I’m talking here not about C*-algebras but about stochastic gradient descent or matrix multiplication or just matrices). The extent to which this is true obviously varies, but it’s a good idea to know ahead of time how much having mathematical colleagues is important to you, and also how much having a mathematically literate boss is important to you. Both are totally possible, but are by no means guaranteed. You might find that this is critical to your happiness on the job, so when comparing offers, don’t overlook it. The less your immediate supervisor can directly see the merits of your work, the more time you’ll have to spend explaining it and, more importantly, *selling* it. 
However, less ability on your supervisor’s part to oversee your work can also mean more creative freedom; when presented with a problem you may have total flexibility in determining the correct method of solution; if no one else in the room can understand what you’re doing you have a lot of control. But you also have a great deal more responsibility to consistently demonstrate your work’s value. There are tradeoffs, and different personalities thrive in each situation.

]]>

Me: I think you could be a really good candidate; you’re funny, and you get along with almost everybody.

AB: And I have great hair!

She gets it.

]]>

This gives you a fibration X -> U where the fiber over a point u in U is L_u – (L_u intersect C). Since L_u isn’t tangent to C, this fiber is a line with n distinct points removed. So the fibration gives you an (outer) action of pi_1(U) on the fundamental group of the fiber preserving the puncture classes; in other words, we have a homomorphism

pi_1(U) -> B_n,

where B_n is the n-strand braid group.

When you restrict to a line L* in U (i.e. a pencil of lines through a point in the original P^2) you get a map from a free group to B_n; this is the *braid monodromy* of the curve C, as defined by Moishezon. But somehow it feels more canonical to consider the whole representation of pi_1(U). Here’s one place I see it: Proposition 2.4 of this survey by Libgober shows that if C is a rational nodal curve, then pi_1(U) maps *isomorphically* to B_n. (OK, C isn’t smooth, so I’d have to be slightly more careful about what I mean by U.)
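In symbols, here is my paraphrase of the setup (notation mine):

```latex
% F_u = L_u \setminus (L_u \cap C) is the fiber: a line minus n points.
% The fibration F_u -> X -> U gives a monodromy representation
\[
\rho \colon \pi_1(U) \longrightarrow B_n ,
\]
% and Moishezon's braid monodromy is its restriction to a pencil,
% i.e. the composite along a line L^* in U:
\[
\pi_1(L^*) \longrightarrow \pi_1(U) \stackrel{\rho}{\longrightarrow} B_n ,
\]
% with source a free group.
```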

]]>

and the dad in Twisted Sister’s “We’re Not Gonna Take It” video

and the Master, the Big Bad of Buffy the Vampire Slayer season 1.

That’s a hell of a career! Plus: he lived in suburban Milwaukee until three years ago! And he used to go out with Glenn Close and Carrie Fisher! OK. Now I’ve heard of Mark Metcalf and so have you.

]]>

**Ingredients:**

2-3 lb boneless chicken breasts

5 apples, cubed

some scallions

some vegetable oil, whatever kind, doesn’t matter, I used olive

1 tbsp ground coriander

1 tbsp ground cumin

1/2 tsp turmeric

1 tsp salt

however much minced garlic you’re into

1/2-1 tsp garam masala

some crushed tomatoes but you could use actual tomatoes if it weren’t the middle of winter

**Recipe:**

Get oil hot. Throw apples and scallions in. Stir and cook 5 mins until apples soft. Clear off some pan space and put coriander, cumin, turmeric, salt in the oil, let it cook 30 sec – 1 min, then throw in all the chicken, which by the way you cut into chunks, saute it all up until it’s cooked through. Put the minced garlic in and let that cook for a minute. Then put in however much tomato you need to combine with everything else in the pan and make a sauce. (Probably less than you think, you don’t want soup.) Turn heat down to warm and mix in garam masala. You could just eat it like this or you could have been making some kind of starch in parallel. I made quinoa. CJ liked this, AB did not.

I took the spice proportions from a Madhur Jaffrey recipe but this is in no way meant as actual Indian food, obviously. I guess I was just thinking about how when I was a kid you would totally get a “curry chicken salad” which was shredded chicken with curry powder, mayonnaise, and chunked up apple, and I sort of wanted a hot mayonnaiseless version of that. Also, when I was in grad school learning to cook from Usenet with David Carlton, we used to make a salad with broiled chicken and curry mayonnaise and grapes. I think it was this. Does that sound right, David? Yes, that recipe calls for 2 cups of mayonnaise. It was a different time. I feel like we would make this and then put it on top of like 2 pounds of rotini and have food for days.

]]>

Write m_d for the number of squarefree monomials in x_1, …, x_n of degree at most d; that is,

m_d = C(n,0) + C(n,1) + … + C(n,d),

where C(n,i) denotes the binomial coefficient.

**Claim:** Let P be a polynomial of degree d in F_2[x_1, …, x_n] such that P(0) = 1. Write S for the set of nonzero vectors x such that P(x) = 1. Let A be a subset of F_2^n such that no two elements of A have difference lying in S. Then |A| < 2m_{d/2}.

**Proof:** Write M for the A x A matrix whose (a,b) entry is P(a-b). By the Croot-Lev-Pach lemma, this matrix has rank at most 2m_{d/2}. By hypothesis on A, M is the identity matrix, so its rank is |A|.

*Remark:* I could have said “sum” instead of “difference” since we’re in F_2 but for larger finite fields you really want difference.
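The rank statement is easy to sanity-check numerically at small size. The sketch below checks it for one hand-picked degree-2 polynomial on F_2^4 (the choice of P is mine, purely for illustration, and this checks only the rank bound, not the whole Claim):

```python
# With n = 4 and P = 1 + x_1 x_2 + x_3 (degree d = 2, P(0) = 1), the
# 2^n x 2^n matrix M[a][b] = P(a - b) should have F_2-rank at most
# 2 * m_{d/2} = 2 * m_1 = 2 * (1 + n) = 10.
n = 4

def P(x):                       # x is an n-bit integer; bit i encodes x_{i+1}
    x1, x2, x3 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (1 + x1 * x2 + x3) % 2

# Store each row of M as a bitmask; over F_2, a - b is just XOR.
rows = []
for a in range(2 ** n):
    row = 0
    for b in range(2 ** n):
        if P(a ^ b):
            row |= 1 << b
    rows.append(row)

def gf2_rank(rows):
    """Rank over F_2, by Gaussian elimination on bitmask rows."""
    rank = 0
    rows = list(rows)
    while rows:
        r = rows.pop()
        if r == 0:
            continue
        rank += 1
        pivot = r & -r          # lowest set bit of the pivot row
        rows = [s ^ r if s & pivot else s for s in rows]
    return rank

assert all((rows[a] >> a) & 1 for a in range(2 ** n))  # 1s on the diagonal
print("rank over F_2:", gf2_rank(rows), "<= 10")
```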

The most standard context in which you look for large subsets of F_2^n with restricted difference sets is that of *error correcting codes*, where you ask that no two distinct elements of A have difference with Hamming weight (that is, number of 1 entries) at most k.

It would be cool if the Croot-Lev-Pach lemma gave great new bounds on error-correcting codes, but I don’t think it’s to be. You would need to find a polynomial P which vanishes on all nonzero vectors of weight larger than k, but which doesn’t vanish at 0. Moreover, you already know that the balls of size k/2 around the points of A are disjoint, which gives you the “volume bound”

|A| < 2^n / m_{k/2}.

I think that’ll be hard to beat.

If you just take a random polynomial P, the support of P will take up about half of F_2^n; so it’s not very surprising that a set whose difference misses that support has to be small!

Here’s something fun you can do, though. Let s_i be the i-th symmetric function on x_1, …, x_n. Then

s_i(x) = C(wt(x), i) mod 2,

where wt(x) denotes Hamming weight. Recall also that the binomial coefficient

C(k, 2^a)

is odd precisely when the a’th binary digit of k is 1.

Thus,

(1 + s_1)(1 + s_2)(1 + s_4) … (1 + s_{2^{b-1}})

is a polynomial of degree 2^b – 1 which vanishes on x unless the last b digits of wt(x) are 0; that is, it vanishes unless wt(x) is a multiple of 2^b. Thus we get:

**Fact:** Let A be a subset of F_2^n such that the difference of two nonzero elements in A never has weight a multiple of 2^b. Then

|A| ≤ 2m_{2^{b-1} – 1}.

Note that this is pretty close to sharp! Because if we take A to be the set of vectors of weight at most 2^{b-1} – 1, then A clearly has the desired property, and already that’s half as big as the upper bound above. (What’s more, you can throw in all the vectors of weight 2^{b-1} whose first coordinate is 1; no two of these sum to something of weight 2^b. The Erdős–Ko–Rado theorem says you can do no better with those weight 2^{b-1} vectors.)
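The product polynomial above can be checked by machine for small n and b, using the fact that on a 0/1 vector s_i(x) = C(wt(x), i) mod 2 (the code is my sanity check, not a proof):

```python
# Check, for small n and b, that Q(x) = (1 + s_1)(1 + s_2)...(1 + s_{2^{b-1}})
# is 1 mod 2 exactly when wt(x) is a multiple of 2^b.  On a 0/1 vector,
# s_i(x) = C(wt(x), i), so Q(x) depends only on the weight of x.
from itertools import combinations
from math import comb

n, b = 10, 2

def s(i, x):  # i-th elementary symmetric function of a 0/1 tuple, mod 2
    return sum(all(x[j] for j in c) for c in combinations(range(len(x)), i)) % 2

for w in range(n + 1):
    x = (1,) * w + (0,) * (n - w)       # one vector of each weight suffices
    assert s(2, x) == comb(w, 2) % 2    # s_i(x) = C(wt(x), i) mod 2
    Q = 1
    for a in range(b):
        Q = Q * (1 + s(2 ** a, x)) % 2
    assert Q == (1 if w % 2 ** b == 0 else 0)
print("checked all weights for n =", n, "and b =", b)
```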

Is there an easier way to prove this?

When b=1, this just says that a set with no differences of even Hamming weight has size at most 2; that’s clear, because two vectors whose Hamming weights have the same parity differ by a vector of even weight. Even for b=2 this isn’t totally obvious to me. The result says that a subset of F_2^n with no differences of weight divisible by 4 has size at most 2+2n. On the other hand, you can get 2n by taking 0, all weight-1 vectors, and all weight-2 vectors with first coordinate 1. So what’s the real answer: is it 2n, or 2+2n, or something in between?
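For what it’s worth, that b=2 construction is easy to verify by machine; note that as listed it has 1 + n + (n-1) = 2n elements:

```python
# Direct check of the b = 2 construction: in F_2^n take 0, all weight-1
# vectors, and all weight-2 vectors whose first coordinate is 1, and verify
# that no two elements differ by a vector of weight divisible by 4.
from itertools import combinations

n = 8
wt = lambda x: bin(x).count("1")    # vectors encoded as n-bit integers

A = [0]
A += [1 << i for i in range(n)]                 # the weight-1 vectors
A += [1 | (1 << i) for i in range(1, n)]        # weight 2, first coordinate 1

assert len(A) == 2 * n
assert all(wt(x ^ y) % 4 != 0 for x, y in combinations(A, 2))
print("construction of size", len(A), "works for n =", n)
```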

Write H(n,k) for the size of the largest subset of F_2^n having no two vectors differing by a vector of Hamming weight exactly k. Then if 2^b is the largest power of 2 less than n, we have shown above that

H(n, 2^b) ≤ 2m_{2^{b-1} – 1}.

On the other hand, if k is odd, then H(n,k) = 2^{n-1}; we can just take A to be the set of all even-weight vectors! So perhaps H(n,k) actually depends on k in some modestly interesting 2-adic way.

The sharpness argument above can be used to show that H(4m,2m) is at least

I was talking to Nigel Boston about this — he did some computations which make it look like H(4m,2m) is *exactly* equal to (*) for m=1,2,3. Could that be true for general m?

(You could also ask about sets with no difference of weight a *multiple* of k; not sure which is the more interesting question…)

**Update**: Gil Kalai points out to me that much of this is very close to and indeed in some parts a special case of the Frankl-Wilson theorem… I will investigate further and report back!

]]>