Tag Archives: artificial intelligence

GPT-3 me, by the book

Not gonna lie, it has always been a dream of mine, a marker of being a Real Writer, to do a New York Times Book Review “By the Book” interview, and this summer I did it. Really fun — I got to praise a lot of writers I truly admire (including Janet Malcolm, who I sort of hoped would read it, and who died the day before the interview came out), and hopefully helped generate some publicity for Shape.

I also recently got beta access to OpenAI’s GPT-3 text-generation engine. So I thought it would be fun to give it part of my interview and see what happened if I asked it to continue.

Here are some results. What’s bold is text I supplied to GPT-3 (along with the whole text of the interview preceding that line). This uses the davinci engine, with temperature set to 0.7.

What kind of reader were you as a child? Which childhood books and authors stick with you most?

I read “The Little House on the Prairie” series as it came out, and I think I really liked them. I reread “The Hobbit” and “The Lord of the Rings” many times. I loved “The Hobbit” so much I would reread the whole thing and just skip the long chapters. The first adult book I read was “The Heart Is a Lonely Hunter” by Carson McCullers, when I was 10 or 11, because I loved the title so much. It was a little hard to get into because it was so slow, but I was so happy to be reading a book that was all about people being sad, because I was so sad!

What’s the best book you’ve ever received as a gift?

“The Go-Between,” by L. P. Hartley. I know it’s a cliché, but it’s so perfect.

If you could require the president to read one book, what would it be?

Note that GPT-3 has seen enough text to recognize this as a “By the Book” interview, because the two questions it generates are indeed on the standard list of questions that the NYTBR asks authors. “Temperature” is a measure of how constrained GPT-3 is to pick the word it deems most likely at each stage. One cool thing about the interface is that it gives you the probability distribution it drew from for each word. The word “Heart” above, for instance, is one of the lowest-probability words in the above response; GPT-3 rated it the 11th most likely next word, with a probability of 0.8%. The top three were “Cat” (4.0%), “Aut” (1.7%), and “Fountain” (1.7%). Wait, “Aut”? Yeah — GPT-3 uses “tokens,” not words, and sometimes these are parts of words, or even just initial letters. I imagine “Aut” could have been the beginning of “Autobiography” or “Author.” Presumably running the algorithm at a lower temperature would have made it much more likely I’d have read something starting with “The Cat.” Having settled on “Heart,” the following word, “is,” came in with an 87% probability, far above the next most likely candidate, “of.”
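If you want to see concretely what temperature does to a next-token distribution, here is a minimal sketch of the standard recipe (this is not OpenAI’s actual code; the probabilities are the ones quoted above, with everything else lumped into an “other” bucket):

```python
import numpy as np

# Next-token probabilities quoted above (illustrative; "other" lumps the rest of the mass).
tokens = ["Cat", "Aut", "Fountain", "Heart", "other"]
probs = np.array([0.040, 0.017, 0.017, 0.008, 0.918])

def apply_temperature(p, temp):
    """Rescale a probability distribution by a sampling temperature.

    Temperature divides the log-probabilities before the softmax, so temp=1
    leaves the distribution alone, temp<1 sharpens it toward the likeliest
    token, and temp -> 0 approaches always picking the argmax.
    """
    logits = np.log(p) / temp
    logits -= logits.max()          # for numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum()

for temp in (1.0, 0.7, 0.2):
    rescaled = apply_temperature(probs, temp)
    print(f"temperature {temp}:")
    for tok, q in zip(tokens, rescaled):
        print(f"  {tok:10s} {q:.3f}")
```

In this recipe, temperature 1 leaves the distribution untouched, lower temperatures pile the probability mass onto the likeliest tokens, and temperature 0 is the argmax limit.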

In fact, it’s kind of cool how the OpenAI interface displays these probabilities, so I’m gonna screenshot and include:

Red means unlikely, green means likely — so you can see very clearly the red “Heart” and the green “Is.” Note that the standard questions from “By the Book” are almost all green once you get past the first word; you’re on a track the algorithm’s very confident about. “The Go-Between,” by L. P. Hartley, is a real book, by the way, though not one I’d ever heard of, and not one I think my earlier responses could fairly be said to make you think I’d like. The answer itself, though, while it’s not my style, is very much “By the Book” style.

Here’s what you get at temperature 0:

What kind of reader were you as a child? Which childhood books and authors stick with you most?

I was a voracious reader, but I don’t remember any particular books. I remember the feeling of being in a room with a book. I remember the feeling of being in a room with a book and a window. I remember the feeling of being in a room with a book and a window and a tree outside the window. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room and a goldfish in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room and a goldfish in the room and a bird in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in

This is pretty typical. If you pick the most likely “next word” every time, you get stuck repeating yourself, getting more and more certain that staying in the loop is the right thing to do because after all it’s what you’ve been doing! The only even moderately low-probability words here occur when a new animal is introduced to the room.
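You can watch the loop happen in miniature with a toy next-word model; the probabilities below are made up, and this is just an illustration of greedy versus sampled decoding, nothing to do with GPT-3’s actual internals:

```python
import random

# A made-up next-word model: each word maps to candidate next words and probabilities.
model = {
    "room":   [("book", 0.6), ("window", 0.3), ("tree", 0.1)],
    "book":   [("room", 0.7), ("window", 0.2), ("friend", 0.1)],
    "window": [("room", 0.5), ("tree", 0.4), ("dog", 0.1)],
    "tree":   [("room", 0.6), ("book", 0.4)],
    "friend": [("room", 0.9), ("dog", 0.1)],
    "dog":    [("room", 0.8), ("cat", 0.2)],
    "cat":    [("room", 1.0)],
}

def decode(start, steps, greedy=True, seed=0):
    random.seed(seed)
    word, out = start, [start]
    for _ in range(steps):
        candidates = model[word]
        if greedy:  # temperature 0: always take the single most likely next word
            word = max(candidates, key=lambda wp: wp[1])[0]
        else:       # temperature 1: sample in proportion to the probabilities
            words, probs = zip(*candidates)
            word = random.choices(words, weights=probs)[0]
        out.append(word)
    return " ".join(out)

print("greedy: ", decode("room", 12, greedy=True))   # cycles: room book room book ...
print("sampled:", decode("room", 12, greedy=False))  # sampling usually breaks out of the loop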

What about temperature 1, giving GPT-3 maximal license to draw less-likely next tokens? (Maybe temp 1 means literally draw from the imputed distribution? I’m not sure.)

What kind of reader were you as a child? Which childhood books and authors stick with you most?

Neil Postman recommended “Wind in the Willows,” which my mother read to me and I read at my mother’s knee, many many times, more times than I can count. I come back to it every year or so, and it’s still a great book. I also was nuts about “Blondie”; the single joke in the joke book never loses its appeal for me. I loved “Black Beauty” and several other Victorian kids’ classics with similar emotional power; I haven’t read them again yet because I never have any time. I can’t forget how much I loved it the first time I read it.

Anything you’d recommend to someone new to your books?

“Lovecraft Country,” without a doubt. Also my essay in the Yale series, “Mao II and Then Some.” I also think my Song of Ice and Fire “house style” recaps (here, here, here and piecemeal here); they’re the most comfortable thing I’ve written. Even my husband likes them, and he doesn’t care about the show or the books. And the recent article on swaddling.

Edited to make explicit that in-kind donations by the author also serve as p.r. I should have included them, which I wish I had done.

Well, this is the most fun at any rate. I love the idea that Neil Postman was recommending books to childhood me, and to my mom, I guess. This one actually feels to me the most like my own prose style, even as the things it says don’t quite make sense? Maybe there’s a tradeoff between style and sense. At any rate, there is no actual essay called “Mao II and Then Some,” but that’s 100% a title I would use, and DeLillo really is a writer I admire and didn’t mention in the piece. “Anything you’d recommend to someone new to your books?” is not one of NYTBR’s standard questions for this feature, but doesn’t it sound like one? Oh, here’s the color chart so you can see how much redder and more daring this output is.

Impressive work — and let me emphasize that I just did those three runs and that’s what I showed you, no cherry-picking of the best output. Not that it makes me feel easily impersonated, of course. But then, I didn’t give it that much of my writing…!


Netflix Prize photo finish!

Two hours short of 30 days ago, the team BellKor’s Pragmatic Chaos submitted the first entry to the Netflix Prize to exhibit a 10% improvement in performance over Netflix’s own movie-recommendation algorithm. That started the final clock for the competition — whoever’s ahead at 2:42 Eastern time today wins the $1 million prize.

One of the really interesting lessons of the competition is that blends of many algorithms seem to work better than any single algorithm, even when there’s no principled reason to do the blend. It’s sort of a “wisdom of crowds of computer programs” effect. As you can imagine, once BellKor’s Pragmatic Chaos (itself a blend of algorithms from teams that had been leading through most of the competition) crossed the 10% threshold, just about everybody else realized their best — probably only — chance was to work together.
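At its simplest, a blend is just a weighted combination of the individual models’ predicted ratings, with the weights fit on held-out data; here is a minimal sketch with made-up numbers, not any team’s actual method:

```python
import numpy as np

# Made-up predicted ratings for the same user/movie pairs from three different models,
# plus the true ratings on a held-out probe set.
preds = np.array([
    [3.8, 4.1, 3.5],
    [2.9, 2.5, 3.1],
    [4.6, 4.9, 4.4],
    [1.8, 2.2, 2.0],
    [3.2, 3.0, 3.6],
])                                   # shape (n_ratings, n_models)
truth = np.array([4.0, 2.5, 5.0, 2.0, 3.0])

# Fit blend weights by least squares on the probe set: minimize ||preds @ w - truth||.
weights, *_ = np.linalg.lstsq(preds, truth, rcond=None)
blend = preds @ weights

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

for i in range(preds.shape[1]):
    print(f"model {i}: RMSE {rmse(preds[:, i], truth):.3f}")
# On the probe set itself the least-squares blend is at least as good as any single model;
# the real question is whether that carries over to unseen data.
print(f"blend  : RMSE {rmse(blend, truth):.3f}  weights {np.round(weights, 2)}")
```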

As of yesterday afternoon, a team called The Ensemble,  made up of — well, I can’t really tell how many previously separate competitors, but a lot — has achieved a 10.09% improvement.  BellKor’s Pragmatic Chaos is at 10.08%.  A hundredth of a percentage point might determine who gets the million.   Wow.

Is BPC sitting on a slightly better algorithm they’re planning to submit at the buzzer for the win?  Check the leaderboard later this afternoon to find out.

Update: Double wow.  No announcement yet, but it looks like BPC and The Ensemble both made submissions in the last hour of the contest; BPC made it to 10.09%, but The Ensemble, four minutes before closing time, bumped up to 10.10%.

Re-update: I didn’t read the rules carefully enough.  It looks like there’s another dataset (the “test dataset”), distinct from the one generating the public scores (the “quiz dataset”), and the final winner will be the program that does best on the test dataset.  So the shifts in the lead, exciting as they are, aren’t necessarily relevant to the contest; we’ve got two algorithms which are essentially identically good, and it ought to be a coin flip which one does better in the final judgment.


MALBEC: Jerry Zhu, Michael Coen, how to say snake in gibbon

Jerry Zhu will give the  last MALBEC seminar of the year tomorrow (Wednesday) afternoon, at 4pm, in Van Vleck B102:

Jerry Zhu (UW, computer sciences)

HAMLET (Human, Animal, and Machine Learning: Experiment and Theory)

Machine learning studies the principles governing all learning systems. Human beings and animals are learning systems too, and can be explored using the same mathematical tools.  This approach has been fruitful in the last few decades with standard tools such as reinforcement learning, artificial neural networks, and non-parametric Bayesian statistics.  We bring the approach one step further with some latest tools in machine learning, and uncover new quantitative findings.  In this talk, I will present three examples: (1) Human semi-supervised learning. Consider a child learning animal names.  Dad occasionally points to an animal and says “Dog!” (labeled data). But mostly the child observes the world by herself without explicit feedback (unlabeled data).  We show that humans learn from both labeled and unlabeled data, and that a simple Gaussian Mixture Model trained using the EM algorithm provides a nice fit to human behaviors.  (2) Human active learning.  The child may ask “What’s that?”, i.e. actively selecting items to query the target labels.  We show that humans are able to perform good active learning, achieving fast exponential error convergence as predicted by machine learning theory.  In contrast, when passively given i.i.d. training data humans learn much slower (polynomial convergence), also predicted by learning theory.  (3) Monkey online learning.  Rhesus monkeys can learn a “target concept”, in the form of a certain shape or color.  What if the target concept keeps changing?  Adversarial online learning model provides a polynomial mistake bound.  Although monkeys perform worse than theory, anecdotal evidence suggests that they follow the concepts better than some graduate students. Finally, I will speculate on a few lessons learned in order to create better machine learning algorithms.
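Example (1) in the abstract, a few labeled points plus many unlabeled ones fit with a Gaussian mixture via EM, is easy to sketch on synthetic data; the code below is a generic illustration of semi-supervised EM, not Zhu’s actual experiment or code:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic 1-D data: a few labeled points ("Dad says dog!") and many unlabeled ones.
labeled_x = np.array([1.0, 1.3, 4.8, 5.2])
labeled_y = np.array([0, 0, 1, 1])                       # 0 = one animal, 1 = another
unlabeled_x = np.concatenate([rng.normal(1, 0.6, 100), rng.normal(5, 0.6, 100)])

x = np.concatenate([labeled_x, unlabeled_x])
mu = np.array([labeled_x[labeled_y == k].mean() for k in (0, 1)])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: labeled points keep their labels; unlabeled points get soft assignments.
    dens = np.stack([pi[k] * norm.pdf(unlabeled_x, mu[k], sigma[k]) for k in (0, 1)])
    resp_unlab = dens / dens.sum(axis=0)
    resp_lab = np.stack([(labeled_y == k).astype(float) for k in (0, 1)])
    resp = np.concatenate([resp_lab, resp_unlab], axis=1)   # shape (2, n_points)

    # M-step: re-estimate mixture weights, means, and spreads from the responsibilities.
    nk = resp.sum(axis=1)
    pi = nk / nk.sum()
    mu = (resp @ x) / nk
    sigma = np.sqrt((resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)

print("estimated means:", np.round(mu, 2))   # should land near 1 and 5
```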

In the third MALBEC lecture, Michael Coen talked about his work on clustering; he asked me afterwards whether I thought the talk was “mathy enough” for the audience, which was funny, because I thought it was 100% math from start to finish!  Here’s a cartoon of the main idea.  When presented with a giant set of data points, one of the first things you might want to do is cluster it:  that is, partition the points into some disjoint collection of subsets, each one of which consists of points which resemble their clustermates more than they do the points in the other clusters.  You might, for instance, want to identify clusters among U.S. legislators, or images, or gene expression patterns. As is so often the case, Cosma Shalizi supplies a good, succinct introduction to the topic from a statistician’s perspective.

How do you know when your clustering algorithm is good?  Sometimes there’s a natural way to evaluate; if your algorithm for clustering legislators reliably separates Democrats from Republicans, you’re probably doing something right.  But with other data, you might not have any pre-existing classification that helps you gauge the reasonableness of your clustering.  Let’s say, for instance, you have lots of short recordings of a gibbon; maybe you think that rather than being scattered structurelessly around the space of 1-second sound clips, they fall into a small finite set of clusters, which you would naturally be tempted to call phonemes. You can run a clustering algorithm on the clips, and you’ll get an answer.  But is it meaningful?  It’s hard to tell without a population of clips which are classified in advance.  Unfortunately, there’s no corpus of written gibbon texts which you can ask gibbons to read aloud.  So you have to do something else.

The trick, as Coen observes, is to replace the difficult and not-very-well-defined question “Is clustering X good?” with the much more tractable question “Are clusterings X and Y similar?”  Coen presented a really nice, principled way of answering this latter question, which allows him to do something like the following:  given your set of audio clips, apply your clustering algorithm separately  to two random samples of 50% of the data points.  These two samples will overlap in around 25% of the data.  Now you can use Coen’s measure to compare the two clusterings induced on this 25% subsample.  If you do this a lot, and you always get two clusterings which are almost exactly the same in Coen’s sense, that’s a good sign that your clustering algorithm is actually capturing real features of the data.
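Coen’s similarity measure itself isn’t spelled out here, so the sketch below uses the adjusted Rand index as a stand-in, with k-means on synthetic blobs standing in for whatever clustering algorithm and featurized sound clips you actually have; the point is the procedure of clustering two overlapping random halves and checking agreement on the shared points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Stand-in data: pretend each row is a featurized 1-second sound clip.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.0, random_state=0)
rng = np.random.default_rng(0)

scores = []
for trial in range(20):
    # Two random 50% samples; they overlap in roughly a quarter of the data.
    a = rng.random(len(X)) < 0.5
    b = rng.random(len(X)) < 0.5
    overlap = a & b

    labels_a = KMeans(n_clusters=4, n_init=10, random_state=trial).fit_predict(X[a])
    labels_b = KMeans(n_clusters=4, n_init=10, random_state=100 + trial).fit_predict(X[b])

    # Compare the two clusterings restricted to the shared points.
    # (Adjusted Rand index is a stand-in for Coen's similarity measure.)
    idx = np.flatnonzero(overlap)
    in_a = np.flatnonzero(a)
    in_b = np.flatnonzero(b)
    score = adjusted_rand_score(
        labels_a[np.searchsorted(in_a, idx)],
        labels_b[np.searchsorted(in_b, idx)],
    )
    scores.append(score)

# Values near 1 across many trials suggest the clustering reflects real structure.
print("mean agreement on overlaps:", round(float(np.mean(scores)), 3))
```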

So it turns out that gibbon utterances really do seem to be organized into phonemes.  (A cursory google search suggests that this contradicts conventional wisdom about primate vocalization — can any primatologists weigh in?)  Once you have this finding, and the ability to classify the sound clips, you can do some remarkable things:  for instance, you can look at what combinations of phonemes gibbons emit when a snake comes by.  It turns out that the vocalization elicited by a snake isn’t a consistent combination of phonemes, as it would be in a human language.  Rather, you can write down a finite state automaton, any one of whose outputs seems to be a legitimate gibbon word for “snake”!

Coen had a picture of the automaton on a slide, which is truly cool, but which he is keeping to himself until the paper’s published.  I promise to tell you exactly how to say “snake” in gibbon in a later post.
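Until then, here is a completely invented automaton, purely to illustrate what it means for every output of a finite state machine to count as a legitimate call for “snake”; the phoneme labels and transitions below are made up, not Coen’s:

```python
import random

# A made-up finite state automaton over invented "phoneme" labels.
# In this toy, every string it can emit counts as a valid call for "snake".
START, END = "S", "END"
transitions = {
    "S": [("hoo", "A"), ("hoo", "B")],
    "A": [("wa", "A"), ("wa", "C")],
    "B": [("krr", "C")],
    "C": [("eek", END), ("wa", END)],
}

def emit(seed=None):
    """Walk the automaton from START to END, collecting emitted phonemes."""
    rng = random.Random(seed)
    state, phonemes = START, []
    while state != END:
        phoneme, state = rng.choice(transitions[state])
        phonemes.append(phoneme)
    return " ".join(phonemes)

for seed in range(5):
    print(emit(seed))   # several different strings, all "meaning" snake in this toy
```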


More MALBEC: Niyogi on geometry of data, Coen on abstract nonsense

Tuesday, April 21 — tomorrow! — brings the third lecture in the MALBEC series:  Michael Coen, of computer sciences and biostat, talks on “Toward Formalizing ‘Abstract Nonsense’” in Computer Sciences 1221 at 4pm.  Here’s the abstract:

The idea of a category — a set of objects sharing common properties — is a fundamental concept in many fields, including mathematics, artificial intelligence, and cognitive and neuroscience. Numerous frameworks, for example, in machine learning and linguistics, rest upon the simple presumption that categories are well-defined. This is slightly worrisome, as the many attempts formalizing categories have met with equally many attempts shooting them down.

Instead of approaching this issue head on, I derive a robust theory of “similarity,” from a biologically-inspired approach to perception in animals. The very idea of creating categories assumes some implicit notion of similarity, but it is rarely examined in isolation. However, doing so is worthwhile, and I demonstrate the theory’s applicability to a variety of natural and artificial learning problems. Even when faced with Watanabe’s “Ugly Duckling” theorem or Wolpert’s stingy cafeteria (serving the famous “No Free Lunch” theorems), one can make significant progress toward formalizing a theory of categories by examining their often unstated properties.

I demonstrate practical applications of this work in several domains, including unsupervised machine learning, ensemble clustering, image segmentation, human acquisition of language, and cognitive neuroscience.

(Joint work with M.H.Ansari)

Delicious food will follow the talk, as if this delicious abstract isn’t enough!

On Friday, Partha Niyogi gave a beautiful talk on “Geometry, Perception, and Learning.”  His work fits into a really exciting movement in data analysis that one might call “use the submanifold.”  Namely:  data is often given to you as a set of points in some massively high-dimensional space.  For instance, a set of images from a digital camera can be thought of as a sequence of points in R^N, where N is the number of pixels in your camera, a number in the millions, and the value of each coordinate is the brightness of the corresponding pixel.  A guy like Niyogi might want to train a computer to distinguish between pictures of horses and pictures of fish.  Now one way to do this is to try to draw a hyperplane across R^N with all the horses on one side and all the fish on the other.  But the dimension of the space is so high that this is essentially impossible to do well.

But there’s another way — you can take the view that the N-dimensionality of the space of all images is an illusion, and that the images you might be interested in — for instance, some class of images including all horses and all fish — might lie on some submanifold of drastically smaller dimension.

If you believe that manifold is linear, you’re in business:   statisticians have tons of tools, essentially souped-up versions of linear regression, for fitting a linear subspace to a bunch of data.  But linearity is probably too much to ask for.  If you superimpose a picture of a trout on a picture of a walleye, you don’t get a picture of a fish; which is to say, the space of fish isn’t linear.
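In the linear case the workhorse is principal component analysis, which fits the best low-dimensional subspace through the data; here is a minimal sketch on synthetic points lying near a 2-dimensional plane inside R^100 (an illustration of the idea, not anything from Niyogi’s talk):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "images": points that really live near a 2-dimensional plane inside R^100.
n_points, ambient_dim, true_dim = 500, 100, 2
basis = rng.normal(size=(true_dim, ambient_dim))          # spans the hidden plane
coords = rng.normal(size=(n_points, true_dim))            # low-dimensional coordinates
X = coords @ basis + 0.05 * rng.normal(size=(n_points, ambient_dim))  # plus a little noise

pca = PCA(n_components=10).fit(X)
# Nearly all the variance sits in the first two components; the rest is noise,
# which is the signature of data lying near a 2-dimensional linear subspace.
print(np.round(pca.explained_variance_ratio_, 3))
```

The nonlinear analogues (Isomap, locally linear embedding, Laplacian eigenmaps, the last due to Belkin and Niyogi) swap the plane for a curved submanifold.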

So it becomes crucial to figure out things about the mystery “fish manifold” from which all pictures of fish are sampled; what are its connected components, or more generally its homology?  What can we say about its curvature?  How well can we interpolate on it to generate novel fish-pictures from the ones in the input?  The work of Carlsson, Diaconis, Ghrist, etc. that I mentioned here is part of the same project.

And in some sense the work of Candes, Tao, and a million others on compressed sensing (well-explained on Terry’s blog) has a similar flavor.  For Niyogi, you have a bunch of given points in R^N and a mystery manifold which is supposed to contain, or at least be close to, those points.  In compressed sensing, the manifold is known — it’s just a union of low-dimensional linear subspaces parametrizing vectors which are sparse in a suitable basis — but the points are not!
