Category Archives: friends

Mathematicians becoming data scientists: Should you? How to?

I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter.  I asked her:  so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry?  How would you know whether you might find that kind of work enjoyable?  And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share.  Take it away, Sarah!




People I saw

Another post for my own records, just to keep track of all the old friends and new acquaintances I was happy to see while traveling for How Not To Be Wrong.  Ordered roughly chronologically and from memory:

Paula and Jay Gitles, Aleeza Strubel, Daniel Biss, Stephen Burt, Jessie Bennett, Jay Pottharst, Bob and Donna Friedman, Vineeta Vijayaraghavan, Larry Hardesty, Moon Duchin, Mira Bernstein, Jerry and Cynthia and Rachel Frenkil, Audrey and Scott Zunick, Joe Schlam, Dick Gross, Noam Elkies, Ben and Elishe Wittes, Eric Walstein, Larry Washington, Manil Suri, Ivars Peterson, Tina Hsu, David Plotz, Josh Levin, Amy Eisner, Deane Yang, Michelle Shih, Warren Bass, Meredith Broussard, Jon Hanke, Tom Scocca, Cathy O’Neil, John Swansburg, Mike Pesca, Kardyhm Kelly, Charlie Jane Anders, Mimi Lipson, Annalee Newitz, Ken Katz, Jill Himmelfarb, The Invisible Cities, Patrick LaVictoire, Akshay Venkatesh, Ravi Vakil, Gary Antonick, David Carlton, Liesl Bross, Miranda Bross, Mark Lucianovic, Tom Church, Yuran Lu, Daniel Kane, Leslie Rappoport, Douglas Wolk, Derek Garton, Matt Haughey, Josh Millard, Brian LaMacchia, Lionel Levine, Ana Crossman (and her mom), Heather Evans (and her mom), Bianca Viray.


It was very social!  And sorry to the people I’ve inevitably skipped.


Ought there be a school just for math kids?

Proof School is a proposed San Francisco middle/high school (grades 7-12) that plans to do three hours of higher math a day.



It seems certain this will be a great school, given that people like Ravi Vakil, Mira Bernstein and Richard Rusczyk are involved.

But I can’t help but be slightly put off by the presentation.  “We get math kids” is used as a kind of unifying slogan — in fact, it’s even trademarked!  (I hope my quoting it here does not require some form of license.)

I think it’s bad for us to carve out “math kid” as a kind of kid, separate from all others.  I think there ought to be an amazing school like the one Ravi and friends are building, but I don’t think it ought to be “just for math kids.”




How much is the stacks project graph like a random graph?

Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project.  Summary:  you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependencies.  So this graph (hopefully!) doesn’t have any directed cycles.  (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle!  I wish LaTeX could do that.)

Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v.  And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.

Why is this so?  Does it suggest some underlying structure?  I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.

Because I am trying to get strong at Sage I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows:  start with N vertices, numbered 1 through N, and a function f on natural numbers k that decays with k, and then connect each vertex n to vertex n-k (if there is such a vertex) with probability f(k).  The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the Stacks Project is not a strictly linear thing like a book).
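For anyone who wants to play along, here is a minimal sketch of the model just described, in plain Python rather than Sage. It is not the code I actually ran; the function names, the fixed seed, and the convention that G_v includes v itself are all assumptions made for illustration, and the geometric decay passed in at the bottom is just one possible choice of f.

```python
import random

def random_dag(n, f, rng):
    """Vertices 1..n; vertex v points to v-k with probability f(k).

    Every edge goes from a larger label to a smaller one, so there
    can be no directed cycles.
    """
    out = {v: [] for v in range(1, n + 1)}
    for v in range(1, n + 1):
        for k in range(1, v):          # only k with v-k >= 1
            if rng.random() < f(k):
                out[v].append(v - k)
    return out

def descendant_counts(out, v):
    """Return (#vertices, #edges) of G_v, the subgraph reachable from v.

    Assumption: v itself counts as a vertex of G_v.
    """
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in out[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    # Every out-edge of a reachable vertex lands on a reachable vertex,
    # so the edges of G_v are exactly the out-edges of the vertices seen.
    n_edges = sum(len(out[u]) for u in seen)
    return len(seen), n_edges

rng = random.Random(0)                 # fixed seed, purely for repeatability
out = random_dag(1000, lambda k: (2 / 3) ** k, rng)
points = [descendant_counts(out, v) for v in out]   # one (vertices, edges) pair per v
```

Scatter-plotting `points` gives the analogue of Cathy's picture for this model; swapping in a different decaying `f` changes how many long-range edges there are.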

Here’s how Cathy’s plot looks for a graph generated by N= 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 as suggested in Cathy’s post.


Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.

Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:


My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!

Update:  Although you know what must be happening here?  It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected to all but c of the vertices below it for a suitable constant c.

I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are not measuring the same graph again and again.

It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay.  Under those circumstances, you’d have substantially less overlap between the descendant graphs; do you still get the approximately linear relationship between edges and vertices?


Interview with DeMarco and Wilkinson

Nice joint interview with Laura DeMarco and Amie Wilkinson at Scientific American.

I didn’t know this about Amie:

 I went to college, and I was feeling very insecure about my abilities in mathematics, and I hadn’t gotten a lot of encouragement, and I wasn’t really sure this was what I wanted to do, so I didn’t apply to grad school. I came back home to Chicago, and I got a job as an actuary. I enjoyed my work, but I started to feel like there was a hole in my existence. There was something missing. I realized that suddenly my universe had become finite. Anything I had to learn for this job, I could learn eventually. I could easily see the limits of this job, and I realized that with math there were so many things I could imagine that I would never know. That’s why I wanted to go back and do math.

This was basically me, too.  After college I got into the fiction writing program at Johns Hopkins, which made me think maybe I could really make it as a writer, and I deferred grad school and moved to Baltimore and wrote fiction all day every day for a year, and while I valued that experience a lot, there was not a single day of it where I didn’t kind of wish I were doing math.  Having had that experience — not just suspecting but knowing how annoying it is not to be doing math — took the edge off the pain of the painful parts of grad school.

In which I have a quarter-million friends of friends on Facebook

One of the privacy options Facebook allows is “restrict to friends of friends.”  I was discussing with Tom Scocca the question of how many people this actually amounts to.  FB doesn’t seem to offer an easy way to get a definitive accounting, so I decided to use the new Facebook Graph Search to make a quick and dirty estimate.  If you ask it to show you all the friends of your friends, it just tells you that there are more than 1000, but doesn’t supply an exact number.  If you want a count, you have to ask it something more specific, like “How many friends of my friends are named Constance?”

In my case, the answer is 25.

So what does that mean?  Well, according to the amazing NameVoyager, between 100 and 300 babies per million are named Constance, at least in the birthdate range that contains most of Facebook’s user base and, I expect, most of my friends-of-friends (hereafter, FoFs) as well.  So under the assumption that my FoFs are as likely as the average American to be named Constance, there should be between 85,000 and 250,000 FoFs.

That assumption is massively unlikely, of course; name choices have strong correlations with geography, ethnicity, and socioeconomic thingamabobs.  But you can just do this redundantly to get a sense of what’s going on.  59 of my FoFs are named Marianne, a name whose frequency ranges from 150-300 parts per million; that suggests a FoF range of about 200-400K.

I did this for a few names (50 Geralds, 18 Charitys (Charities??)) and the overlaps of the ranges seemed to hump at around 250,000, so that’s my vague estimate for the number.
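The arithmetic behind those ranges can be sketched as follows. The function name is made up for illustration, and only the two names whose frequencies are quoted above are included; the exact quotients come out a little different from the rounded figures in the text.

```python
def fof_range(count, ppm_low, ppm_high):
    """Estimate the number of FoFs from one name.

    count     -- FoFs with that name, as reported by Graph Search
    ppm_low,
    ppm_high  -- assumed frequency range of the name, per million people

    A rarer name (ppm_low) implies more FoFs, so it gives the upper end.
    """
    low = count * 1_000_000 / ppm_high
    high = count * 1_000_000 / ppm_low
    return low, high

constance = fof_range(25, 100, 300)   # roughly 83,000 to 250,000
marianne = fof_range(59, 150, 300)    # roughly 197,000 to 393,000

# Where the two ranges overlap: the largest lower end to the smallest upper end.
overlap = (max(constance[0], marianne[0]), min(constance[1], marianne[1]))
```

With just these two names the overlap runs from a bit under 200,000 up to 250,000, which is consistent with the vague estimate above.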

But then I remembered that there was actually a paper about this on the arXiv, “The Anatomy of the Facebook Graph,” by Ugander, Karrer, Backstrom, and Marlow, which studies exactly this question.  They found something which is, to me, rather surprising: that the number of FoFs grows approximately linearly in the number of friends.  The appropriate coefficients have surely changed since 2011, but they get a good fit with

#FoF = 355(#friends) – 15057.

For me, with 680 friends, that’s 226,343.  Good fit!

This 2012 study from Pew (on which Marlow is also an author) studies a sample in which the respondents had a mean of 245 Facebook friends, and finds that the mean number of FoFs was 156,569.  Interestingly, the linear model from the earlier paper gives only 72,000, though to my eye it looks like 245 is well within the range where the fit to the line is very good.
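To make the two comparisons concrete, here is the linear fit quoted above applied to both friend counts. The function name is hypothetical; the coefficients are the ones given in the formula.

```python
def fof_linear(friends):
    """Linear fit quoted above: #FoF = 355 * (#friends) - 15057."""
    return 355 * friends - 15057

mine = fof_linear(680)       # 226343, close to the name-based estimate of ~250,000
pew_model = fof_linear(245)  # 71918, i.e. about 72,000

pew_observed = 156569        # mean FoF count reported in the Pew sample
ratio = pew_observed / pew_model   # a bit more than 2
```

So the Pew mean is a bit more than twice what the line predicts at 245 friends, which is the discrepancy noted above.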

The math question this suggests:  in the various random-graph models that people like to use to study social networks, what is the mean size of the 2-neighborhood of x (i.e. the number of FoFs) conditional on x having degree k?  Is it ever linear in k, or approximately linear over some large range of k?
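As a first data point, here is a sketch for the simplest model of all, the Erdős–Rényi graph G(n,p), which is certainly not a realistic social network (no clustering, no heavy-tailed degrees). I am assuming "FoF" means a vertex at distance exactly 2 from x. Conditional on x having k friends, each of the other n-1-k vertices is adjacent to at least one of those friends with probability 1-(1-p)^k, independently of the edges at x, so the expected count is (n-1-k)(1-(1-p)^k). The function names, parameters, and seed below are chosen only for illustration.

```python
import random

def expected_fof(n, p, k):
    """Expected number of vertices at distance exactly 2 from x in G(n,p),
    conditional on x having degree k."""
    return (n - 1 - k) * (1 - (1 - p) ** k)

def simulated_fof(n, p, k, trials, rng):
    """Monte Carlo check: sample the edges from x's k friends to each
    of the n-1-k remaining vertices and count those that get reached."""
    others = n - 1 - k
    total = 0
    for _ in range(trials):
        reached = 0
        for _ in range(others):
            # This vertex is a FoF if at least one of the k friends links to it.
            if any(rng.random() < p for _ in range(k)):
                reached += 1
        total += reached
    return total / trials

rng = random.Random(0)                        # fixed seed for repeatability
exact = expected_fof(500, 0.01, 5)            # about 24.2
approx = simulated_fof(500, 0.01, 5, 200, rng)
```

When kp is small, 1-(1-p)^k is close to kp, so in this model the conditional mean is roughly (n-1)pk: approximately linear in k for small k, then flattening as k grows and the friends' neighborhoods start to cover everything. Whether anything like that survives in models with clustering is exactly the open question.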


Dan Sharfstein wins Guggenheim

Congratulations to Dan Sharfstein, who is one of this year’s Guggenheim Fellows!  I have written before about my admiration for Dan’s book The Invisible Line, and this seems a good occasion to say again — if you’re at all interested in the long, complicated history of race in America, buy the book and read it.  His new book will be about Oliver Otis Howard and the Freedmen’s Bureau.  This is the kind of project that requires long, deep research and painstaking thought.  I don’t know if we can Kickstarter things like this, and I’m glad we have the Guggenheim Foundation to help make them possible.



Distrusters of experts all around

The Wall Street Journal op-ed page is always good for a full-throated demand that we distrust the experts:

The general public is not privy to the IPCC debate. But I have been speaking to somebody who understands the issues: Nic Lewis. A semiretired successful financier from Bath, England, with a strong mathematics and physics background, Mr. Lewis has made significant contributions to the subject of climate change.

…Will the lead authors of the relevant chapter of the forthcoming IPCC scientific report acknowledge that the best observational evidence no longer supports the IPCC’s existing 2°-4.5°C “likely” range for climate sensitivity? Unfortunately, this seems unlikely—given the organization’s record of replacing evidence-based policy-making with policy-based evidence-making, as well as the reluctance of academic scientists to accept that what they have been maintaining for many years is wrong.

Domain knowledge, phooey — this dude is successful!

“Distrust the experts,” as a principle, does as much harm as good.  A better principle would be “Distrust people who are bad and trust people who are not bad.”  Of course, it can be hard to tell the difference — but that distinction is one we have to make anyway, in all kinds of contexts, so why not this one?



In defense of Nate Silver and experts

Cathy goes off on Nate Silver today, calling naive his account of well-meaning people saying false things because they’ve made math mistakes.  In Cathy’s view, people say false things because they’re not well-meaning and are trying to screw you — or, sometimes, because they’re well-meaning but their incentives are pointed at something other than accuracy.  Read the whole thing, it’s more complicated than this paraphrase suggests.

Cathy, a fan of and participant in mass movements, takes special exception to Silver saying:

This is neither the time nor the place for mass movements — this is the time for expert opinion. Once the experts (and I’m not one of them) have reached some kind of a consensus about what the best course of action is (and they haven’t yet), then figure out who is impeding that action for political or other disingenuous reasons and tackle them — do whatever you can to remove them from the playing field. But we’re not at that stage yet.

Cathy’s take:

…I have less faith in the experts than Nate Silver: I don’t want to trust the very people who got us into this mess, while benefitting from it, to also be in charge of cleaning it up. And, being part of the Occupy movement, I obviously think that this is the time for mass movements.

From my experience working first in finance at the hedge fund D.E. Shaw during the credit crisis and afterwards at the risk firm Riskmetrics, and my subsequent experience working in the internet advertising space (a wild west of unregulated personal information warehousing and sales) my conclusion is simple: Distrust the experts.

I think Cathy’s distrust is warranted, but I think Silver shares it.  The central concern of his chapter on weather prediction is the vast difference in accuracy between federal hurricane forecasters, whose only job is to get the hurricane track right, and TV meteorologists, whose very different incentive structure leads them to get the weather wrong on purpose.  He’s just as hard on political pundits and their terrible, terrible predictions, which are designed to be interesting, not correct.

Cathy wishes Silver would put more weight on this stuff, and she may be right, but it’s not fair to paint him as a naif who doesn’t know there’s more to life than math.  (For my full take on Silver’s book, see my review in the Globe.)

As for experts:  I think in many or even most cases deferring to people with extensive domain knowledge is a pretty good default.  Maybe this comes from seeing so many preprints by mathematicians, physicists, and economists flushed with confidence that they can do biology, sociology, and literary study (!) better than the biologists, sociologists, or scholars of literature.  Domain knowledge matters.  Marilyn vos Savant’s opinion about Wiles’s proof of Fermat doesn’t matter.

But what do you do with cases like finance, where the only people with deep domain knowledge are the ones whose incentive structure is socially suboptimal?  (Cathy would use saltier language here.)  I guess you have to count on mavericks like Cathy, who’ve developed the domain knowledge by working in the financial industry, but who are now separated from the incentives that bind the insiders.

But why do I trust what Cathy says about finance?

Because she’s an expert.

Is Cathy OK with this?


Startup culture, VC culture, and Mazurblogging

Those of us outside Silicon Valley tend to think of it as a single entity — but venture capitalists and developers are not the same people and don’t have the same goals.  I learned about this from David Carlton’s blog post.  Cathy O’Neil reposted it this morning.  It’s kind of cool that the three of us, who started grad school together and worked with Barry Mazur, are all actively blogging!  We just need to get Matt Emerton in on it and then we’ll have the complete set.  Maybe we could even launch a new blogging platform and call it mazr.  You want startup culture, I’ll give you startup culture!

