Mathematicians becoming data scientists: Should you? How to?

I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter.  I asked her:  so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry?  How would you know whether you might find that kind of work enjoyable?  And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share.  Take it away, Sarah!

 

 

As I went back and edited my initial bullet points to Jordan, I realized I had a good deal to say on this topic. My bullet points started overflowing. So, the TL;DR (which I acknowledge is still pretty long) is in the bullet points. However, if you’re looking for a more Foster-Wallace-esque adventure, and also perhaps the most unique insights I have to offer on these questions, the footnotes are where it’s at.

 

Some bullet points to determine if you (a Math PhD student) would like this kind of work:

  • Do you like modeling complex problems mathematically? Do you enjoy taking a real-world problem and determining its salient features in the context of real-world constraints?
  • Do you like communicating with diverse audiences, many of whom are not mathematical experts? Are you good at translating mathematical insights into non- or less mathematical language?
  • Can you walk away from a problem when the solution is “good enough”? Are you able to switch between tasks or problems with relative ease? Are you OK with simple solutions to problems that could have more complicated solutions, but only with rapidly diminishing returns?
  • Are you OK with abandoning the work of your dissertation and potentially not publishing in academic journals anymore? (Unless you are at Google or Microsoft Research, the likelihood is that you will not have this opportunity. You may, however, be able to publish in Industry journals or the Industry tracks of interdisciplinary academic conferences.) Are you OK with becoming more of a practitioner than a theorist?
  • Do you like programming in an object-oriented, statically typed language, and are you at least decent at it? This one really depends on the company and the role; at Twitter, to have any impact at all you had to build things yourself, and that meant writing code in Scala; if you wanted to do anything REALLY interesting you had to be REALLY good at Scala, which is a serious investment. At other companies things can be different, but IMO the really interesting work requires knowing how to program at a level where you could pass as a Software Engineer. However, there is a whole category of Data Scientist for which you only need to know R or Pandas (the R-like Python package) and be well-versed in Statistics. For that job you do mostly offline analysis of various business metrics; if you start building models of users in R you run into size constraints pretty quickly, and “productionizing” your approach (i.e., building a working system to update and implement your model) then has to be taken up by someone else, which takes you out of the driver’s seat in the chaotic environment of most tech companies. In conclusion, Math PhDs can run the risk of becoming irrelevant in tech if they cannot build things[1].

How to position oneself for these types of jobs (and by “these types of jobs” I mean tech jobs as a Data Scientist or ML engineer NOT at Google Research or Microsoft Research):

  • Have a coding project or two in Java or C++ (or Scala, or Go) that has some real-world machine learning application. Pick a dataset, solve a problem, put it on your blog.
  • Have an R or Pandas data analysis project where you take some publicly available data and glean some insights. Clarity of communication and a pretty presentation are what matter here, more than being super impressive technically, although that’s also good. If you make your project in an IPython notebook and then put it on GitHub, all the better.
  • Be ready to have a coding interview. Google-able resources to support you in your preparations abound. You can come up to speed 5-10 times faster than a typical CS undergrad (e.g. you won’t be struggling for weeks to grok summation notation) but this still does require some time. Ideally you’d start practicing regularly (like, 1-2 hours per day if you have minimal CS or algorithms background) 2-6 months before you’re on the job market. Look at it like a qualifying exam. Most places will let you interview in Python but it might also be a good idea to know how you’d answer questions in Java or C++. Resenting or dreading this part of the process will not help you be successful at it, and it is arguably worth investing the time to be successful. It will at the very least give you many more employment options to choose from. Also, this time investment can somewhat overlap with the time you put into the bullet points above. A final note: if you are not already comfortable with Java, I do not recommend the book “Cracking the Coding Interview.” Many people swear by it but I found it to be way too overwhelming as a first resource. What I did to get started was code the algorithms that are introduced in Kleinberg and Tardos’ Algorithm Design book (or at least the first 2/3 to 3/4 of it); the book is small and well-written enough to not be completely impossible to get through in the time allotted, covers most of the data structures you might be asked to implement in an interview, and actually coding the algorithms presented was very good practice for coding interviews. Also, it will help you get good at analyzing run times of algorithms, which you’ll typically be asked to do as well. Combining this (or working through a similar book) with a good website which provides practice interview questions should be adequate preparation.
    The only hole here is that this doesn’t really help you understand object-oriented programming at all; finding a good internet resource that helps you understand the implementation of data structures as classes is probably advisable. It’s worth investing the time in a practical introduction to object-oriented programming, although this is probably the least important of the things already mentioned (it may become very important once you actually start working, depending on how much production code you are expected to write).
  • Employers may opt, instead of the coding interview, to give you a take-home coding project. You need to prepare for these in much the same way you prepare for a coding interview; since these projects will typically have an expected turnaround time, you should not wait until you have received one to prepare. The first two bullet points are also very good preparation for this type of task.
  • Be able to talk about what you do and what you’re interested in at various levels, to various audiences. A big thing about the transition to tech is that you possibly start communicating with people who don’t really know what a vector space is[2]. Be ready to have those conversations. Honestly ask yourself if you’re OK with having those conversations.
  • Research the industry and figure out what real-world problems you’d be interested in solving. Have a targeted job search that reflects that you already have an understanding of what’s out there and a path for your own career advancement in mind. Don’t see going to Industry as failing out of the academic career path; sometimes math grad students are viewed as seeing tech jobs as a sort of fallback that they’re entitled to if the academic path doesn’t work out. Try to pursue this chapter with the enthusiasm that you brought to your academic endeavors, and envision a path for yourself in this new context.
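
To make the coding-interview bullet above a bit more concrete, here is a minimal sketch of the kind of exercise it describes: a data structure implemented as a class, with the run time of each operation noted. The choice of a binary min-heap, and the names `MinHeap` and `heap_sort`, are illustrative assumptions rather than anything the text above prescribes; Python is used because most places will let you interview in it.

```python
class MinHeap:
    """A binary min-heap stored in a list; the smallest item is always at index 0."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def push(self, item):
        """Add an item in O(log n): append it, then sift it up toward the root."""
        self._items.append(item)
        i = len(self._items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self._items[parent] <= self._items[i]:
                break
            self._items[parent], self._items[i] = self._items[i], self._items[parent]
            i = parent

    def pop(self):
        """Remove and return the smallest item in O(log n)."""
        if not self._items:
            raise IndexError("pop from an empty heap")
        smallest = self._items[0]
        last = self._items.pop()
        if self._items:
            # Move the last item to the root, then sift it down.
            self._items[0] = last
            i, n = 0, len(self._items)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                child = i
                if left < n and self._items[left] < self._items[child]:
                    child = left
                if right < n and self._items[right] < self._items[child]:
                    child = right
                if child == i:
                    break
                self._items[i], self._items[child] = self._items[child], self._items[i]
                i = child
        return smallest


def heap_sort(values):
    """Sort by pushing everything onto a heap and popping it back off: O(n log n)."""
    heap = MinHeap()
    for v in values:
        heap.push(v)
    return [heap.pop() for _ in range(len(heap))]
```

Being able to write something like this from scratch, and to explain why `push` and `pop` are logarithmic while `heap_sort` is O(n log n), covers both the "implement a data structure" and the "analyze the run time" parts of a typical interview question.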

Obviously it is important to note that everything expressed here is just my opinion, based on a rather limited array of experiences. I think the value added here comes from positioning (that is, I came from a pure math background originally), and my tendency to exhaustively and compulsively analyze and dissect any organization or situation I’m a part of.

[1] However, with regard to this bullet point, integrating the output of math PhDs into the workflow of a tech company is a known stumbling block, and there’s a general consensus that excellent programming skills in addition to a rigorous mathematical background may be too much to ask of employees; there simply aren’t enough people who have these qualifications. There’s a whole cottage industry that’s sprung up around translating the insight and modeling of data scientists into deployable code, which is equal parts fascinating and wearying. These black-box analytics packages are part of a broader space of implemented solutions to abstractly defined problems called Enterprise Software, where consumer companies begin to translate their problems into a form representable to a software solution and then receive the corresponding answers that are programmed into that solution. As a mathematician, you may find you prefer the idea of working for an Enterprise or “B2B” company; you deal with abstracted problems and implement solutions perhaps according to some mathematically advanced modeling. You may have the opportunity to be rigorous and exacting in the development of this software and to make critical design choices in how information is aggregated, processed, and disseminated in its lifecycle through your system. You may be more directly on the critical path of the creation of the company’s final product than you would be at a company whose primary goal is not to manufacture analytics solutions. On the other hand, as a Data Scientist working for a company which is a consumer of an enterprise solution, you may bear witness to a phenomenon whereby employees translate their output into a format consumable by an enterprise solution and then are correspondingly shaped by that translation and its ripple effects on the implicit characterization of their work’s meaning and value*.
This is happening everywhere all the time and can lead to the emergent phenomenon of a company which is really now just nothing more than the interaction of many dogs each being wagged by many tiny tails. Even more meta: as a certain brand of Data Scientist you may be tasked with the role of discerning a meaningful signal from the many outputs generated by various enterprise systems for tracking and aggregating data. It is actually this type of murky business problem that you are best suited to address and a good employer will seek your help in selecting metrics* which will correctly inform leadership about company progress. However, the opportunity of creating a role for a Data Scientist at this level is seldom recognized or appreciated. It is more likely your skills will be restricted to a more technical pursuit which may be ill-informed from a business standpoint in the first place; you will invest your time in generating a solution to a slightly or greatly misinterpreted and so misstated problem. But as long as you get to build ML models or deep learning systems or whatever thing it is that interests you, and get paid a lot to do it, that might not matter to you.

*A “metric” in business land is just a measurement, typically a thing that can be counted, or a percentage, often viewed over time. For example, an IT department might have enterprise software for managing its tasks that reveals the number of open requests it has at any given time, the current number of resolved requests, or the requests resolved each day or week. It’s not entirely trivial to translate these metrics into a standard by which to evaluate the IT department’s performance: the company could be growing, or an acquisition of another company may have led to a flurry of IT requests to get new employees on-boarded quickly while maintaining a desktop work environment similar to the one they had before, or something like that. What metric is flexible enough to reflect employee success in the diverse situations that arise in real life? This is an example of the not-entirely-trivial question around metrics to measure success that arises at all levels of a company all the time.
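
As a small illustration of the point about IT metrics, the sketch below computes the raw counts mentioned above (open requests on a given day, requests resolved on a given day) from a ticket log, and then one possible adjustment for company growth. The ticket data, the headcounts, and the function names are all hypothetical, and dividing by headcount is only one illustrative normalization, not a recommended standard.

```python
from datetime import date

# Hypothetical ticket log: (date opened, date resolved or None if still open).
tickets = [
    (date(2016, 3, 1), date(2016, 3, 3)),
    (date(2016, 3, 1), None),
    (date(2016, 3, 7), None),
    (date(2016, 3, 7), date(2016, 3, 8)),
    (date(2016, 3, 8), None),
    (date(2016, 3, 8), None),
]


def open_requests(tickets, day):
    """Count tickets opened on or before `day` and still unresolved at the end of `day`."""
    return sum(
        1
        for opened, resolved in tickets
        if opened <= day and (resolved is None or resolved > day)
    )


def resolved_on(tickets, day):
    """Count tickets resolved on exactly `day`."""
    return sum(1 for _, resolved in tickets if resolved == day)


def open_per_100_employees(tickets, day, headcount):
    """Open requests scaled by company size (headcount is supplied by the caller)."""
    return 100 * open_requests(tickets, day) / headcount


# The raw count doubles between the two dates, but if the (made-up) headcount
# also doubled after an acquisition, the scaled figure does not move.
before = open_per_100_employees(tickets, date(2016, 3, 1), headcount=50)
after = open_per_100_employees(tickets, date(2016, 3, 8), headcount=100)
```

With these made-up numbers the backlog goes from 2 to 4 open requests while the per-employee figure stays flat, which is exactly the kind of ambiguity that makes choosing the metric a judgment call.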

 

[2] Up until now, you have had the privilege of working for mathematicians all the way down (mostly). At the very least, your direct supervisor has been a mathematician whose mathematical insight and intuition you (hopefully) greatly respect; you have probably grown accustomed to working for someone who can not only understand but also analyze and evaluate your solution to a problem (indeed, in some cases they may have had to actively avoid thinking about your problem to avoid solving it before you). One of the biggest transitions when you move into Industry is that you will possibly move into a position where you are the local expert. This doesn’t just mean that you’re the expert on C*-algebras but someone else is the expert on étale cohomology (btw I have more or less forgotten what either of those things is; does this fact make you feel sad or empty? Your reaction is a good thing to note.); you will possibly be working for someone and with other people who never developed a real understanding of mathematical logic, who basically see you as some kind of computer-math sorcerer whose work they can’t begin to understand (and I’m talking here not about C*-algebras but about stochastic gradient descent or matrix multiplication or just matrices). The extent to which this is true obviously varies, but it’s a good idea to know ahead of time how much having mathematical colleagues is important to you, and also how much having a mathematically literate boss is important to you. Both are totally possible, but are by no means guaranteed. You might find that this is critical to your happiness on the job, so when comparing offers, don’t overlook it. The less your immediate supervisor can directly see the merits of your work, the more time you’ll have to spend explaining it and, more importantly, selling it.
However, less ability on your supervisor’s part to oversee your work can also mean more creative freedom; when presented with a problem you may have total flexibility in determining the correct method of solution; if no one else in the room can understand what you’re doing you have a lot of control. But you also have a great deal more responsibility to consistently demonstrate your work’s value. There are tradeoffs, and different personalities thrive in each situation.

 

 

 

 


10 thoughts on “Mathematicians becoming data scientists: Should you? How to?”

  1. Tim Hopper says:

    I wrote a post about my own journey from studying math as an undergrad and math/operations research in grad school to being a data scientist. People may find it helpful as they try to figure out their own career paths.

    http://tdhopper.com/blog/2015/May/11/how-i-became-a-data-scientist/

  2. Jason says:

    This is really well-written and makes lots of good points. But I have to say, it doesn’t make what I would consider the main point. (I have a PhD in mathematics and worked in tech for a while, including at Google Research.)

    The main question, at least to a pure math person like me, is: Will you feel content working numerically with mathematical objects, instead of proving things about them? This is a critical question about what one’s feeling of math is. You don’t have to be a snob to feel that mathematics, at its essence, means doing mathematical reasoning, which means doing mathematical arguments, otherwise known as proofs. (Although it’s also perfectly fine to feel differently.)

    To take a specific example: In my group at Google the language of vector spaces was everywhere (we were using vector word embeddings to do various natural language analyses). Everyone was very smart, and people were applying interesting algorithms and statistical analyses to the vector-oriented results they got. But, for example, there was probably no one there who could prove that every vector space has a basis. Or that two bases have the same number of elements. Or even the need for such a theorem in defining dimension.

    That doesn’t mean I was smarter than them. (I wasn’t!) Just that even basic proof-oriented mathematics from the beginning of linear algebra is irrelevant in doing cutting-edge tech work on embeddings in vector spaces. It’s not just that they weren’t technical people who didn’t know what a vector space was. It’s that that kind of abstraction, for the most part, just wasn’t relevant to their work. I think this should be considered when thinking about entering technology.

    [I also posted this on a reddit thread about this post.]

  3. JSE says:

    To Jason: from my own experience as a pure mathematician

    https://quomodocumque.wordpress.com/2016/01/15/messing-around-with-word2vec/

    I would say that working with word embeddings in vector spaces definitely feels like math — and you definitely have to know what a vector space IS in some deep sense, a deep sense which I believe someone could have without being able to reproduce a proof that a vector space has a basis.

    But I also think that Sarah was distinguishing Google and Microsoft Research from everyplace else; I think at most places, you wouldn’t be doing something that feels as much like math as word2vec feels.

  4. Junstin says:

    I think the list of companies where people publish regularly in top conferences/journals is much longer today than just Google Research or Microsoft Research. Off the top of my head, research arms of the following companies publish regularly: Facebook AI Research, Nvidia, IBM, Adobe, OpenAI, United Technologies Research, Mitsubishi Electric Research, Baidu.

  5. Kevin says:

    I would also add that nothing prevents a pure math PhD from becoming a full-on Software Engineer. You just need to bring the attitude that the work is going to be a huge departure from what you did in grad school, but know that a lot of the same general skills apply.

    If I had to give one tip for practicing interviewing, it’s to try to get to the point where you can occasionally write full, syntactically-correct, accepted solutions to problems on this site in one try: https://leetcode.com/problemset/algorithms/. Also, read up on System Design; having a PhD means a lot of places won’t want to hire you at the lowest entry-level position, so they expect a bit of knowledge of distributed systems (for a start, maybe read about the CAP theorem on Wikipedia).

  6. Rachel Levy says:

    I’m wondering if any of you would be up for reposting or writing something new for the BIG Math Network: bigmathnetwork.wordpress.com ?

  7. […] Mathematicians becoming data scientists: Should you? How to? […]

  8. psoares says:

    “””
    Have a coding project or two in java or C++ (or Scala, or Go) that has some real-world machine learning application. Pick a dataset, solve a problem, put it on your blog.
    “””

    I assume this advice is related to the action of “productionizing” your code. As someone interested in moving to industry after a physics PhD who has never worked in a tech company, I don’t really know what “productionizing” the code means. Can you provide an example of that? Say I wrote a recommender system in python, what would it mean to “productionize” it?

  9. jdbatson says:

    productionizing your code means making it ready to be used in a product.

    let’s say you wrote a recommender system in python, trained to recommend books to people based on a list of books they’ve enjoyed in the past. at the end you have some kind of predict function that can take in a list of titles and produce some ranked predictions.

    for this to be used at eg goodreads, you’ll need to

    1. update regularly. this will probably be some job running daily on a machine that takes in the new training data (including books published since the first time you made this) and creates weights for a new model.
    2. serve results in real time. the weights from the model need to be accessible in a process on a server that can respond (ultimately) to http requests.
    3. write tests to make sure it’s doing its job (both tests of the model itself and tests of the service serving it)
    4. add logging so when the thing breaks you can figure out why
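
The four steps above can be sketched in miniature as follows. This is only an illustration under stated assumptions: the co-occurrence "model", the function names (`train`, `predict`, `handle_request`, `test_predict`), and the book lists are all made up for the example, and a real service would sit behind an actual web framework and a scheduled job rather than plain function calls.

```python
import json
import logging
from collections import Counter, defaultdict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("recommender")


def train(reading_lists):
    """Step 1: rebuild the model from fresh data (what a daily job would call).

    The toy model counts how often two titles appear on the same reading list.
    """
    weights = defaultdict(Counter)
    for titles in reading_lists:
        unique = set(titles)
        for a in unique:
            for b in unique:
                if a != b:
                    weights[a][b] += 1
    log.info("trained on %d lists, %d titles", len(reading_lists), len(weights))
    return weights


def predict(weights, liked, k=3):
    """Rank titles by how often they co-occur with titles the user liked."""
    scores = Counter()
    for title in liked:
        scores.update(weights.get(title, {}))
    for title in liked:
        scores.pop(title, None)  # never recommend something already liked
    # Sort by score, breaking ties alphabetically so results are reproducible.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:k]]


def handle_request(weights, body):
    """Step 2: the function an HTTP endpoint would call; JSON text in, JSON text out."""
    try:
        liked = json.loads(body)["liked"]
        return json.dumps({"recommendations": predict(weights, liked)})
    except (ValueError, KeyError, TypeError):
        # Step 4: log enough to debug the failure, then return an error payload.
        log.exception("bad request: %r", body)
        return json.dumps({"error": 'expected a JSON object with a "liked" list'})


def test_predict():
    """Step 3: a test of the model itself, on a tiny hand-checkable dataset."""
    weights = train([
        ["Dune", "Neuromancer"],
        ["Dune", "Neuromancer", "Emma"],
        ["Emma", "Persuasion"],
    ])
    assert predict(weights, ["Dune"]) == ["Neuromancer", "Emma"]
```

The split matters more than the model: `train` can be rerun on a schedule without touching the serving code, `handle_request` can be tested without starting a server, and the log lines are what you read when it breaks.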

  10. […] Mathematicians becoming data scientists: Should you? How to? Jordan Ellenberg, Quomodocumque blog […]
