## The median integer sequence

Meaningless question of the day:

What do you think the asymptotics of a_n are, where a_n is the median of the nth terms of all integer sequences in the OEIS?

Google interviewers, feel free to use this.

Update:  Henry Cohn comes through in the comments!  The median sequence starts 1, 3, 7, 13….

## 6 thoughts on “The median integer sequence”

1. Allen Knutson says:

I think it’s fair to say that the sequences with any negative entries are neglible. Then, there exist sequences like Busy Beaver. So if you’d said “mean” then the non-answer would have been “faster than any computable function”. But median? I bet that most of the sequences on there are indeed computable, so I’m back at square one.

2. Dan Schmidt says:

What a great question. Intuitively, I feel like the Fibonacci sequence is probably right around 50th percentile among useful integer sequences in this respect, so I’m going to guess that it’s somewhere between O(n) and O(n^2).

3. Terence Tao says:

Random sampling should locate the median with pretty high accuracy (though I’m too lazy to go ahead and do this right now…)

4. Henry Cohn says:

I used the data available from http://oeis.org/stripped.gz to compute the median sequence. Specifically, the i-th term of this sequence is the median of the i-th terms of all the sequences for which at least i terms are listed in the file:

1, 3, 7, 13, 26, 42, 67, 96, 129, 148, 193, 226, 273, 264, 284, 283, 290, 272, 253, 222, 199, 191, 197, 203, 192, 176, 156, 132, 125, 132, 130, 131, 126, 110, 96, 82, 82, 89, 97, 92, 89, 82, 74.5, 64, 58, 55, 57, 57, 55, 50, …

This does not behave at all the way I expected. Presumably it’s biased against rapidly growing sequences, since they won’t have many terms listed. The first four terms are unbiased (few enough sequences end early that they couldn’t possibly change the answer), but after that it’s unclear. Here are the best upper bounds for the true median sequence that can be deduced from the available data:

1, 3, 7, 13, 27, 47, 82, 127, 210, 280, 462, 644, 998, 1209, 1719, 2543, 3947, 5089, 7044, 8623, 11476, 17052, 27607, 40231, 54953, 72037, 96097, 111918, 217223, 333793, 707296, 1328231, 3538919, 16276632, 763418870, …

5. Sam Alexander says:

I’m willing to bet the sequence would be computable. Allen Knutson is quite right that if we were talking about the *mean*, it would be completely swamped by Busy Beaver numbers and grow too fast to be computable. But since it’s the median… Of course, almost all sequences are non-computable (in a countability sense), but of the non-computable sequences which would actually appear in the OEIS, most would fall into three categories: super-fast growers, super-slow growers, or sequences from {0,1}. None of these should significantly influence the median.

6. JSE says:

AWESOME!