Good answers to the last question! I think I perhaps put my thumb on the scale too much by naming a variable p.
Let me try another version in the form of a dialogue.
ME: Hey in that other room somebody flipped a fair coin. What would you say is the probability that it fell heads?
YOU: I would say it is 1/2.
ME: Now I’m going to give you some more information about the coin. A confederate of mine made a prediction about whether the coin would fall heads or tails and he was correct. Now what would you say is the probability that it fell heads?
YOU: Now I have no idea, because I have no information about the propensity of your confederate to predict heads.
(Update: What if what you knew about the coin in advance was that it fell heads 99.99% of the time? Would you still be at ease saying you end up with no knowledge at all about the probability that the coin fell heads?) This is in fact what Joyce thinks you should say. White disagrees. But I think they both agree that it feels weird to say this, whether or not it’s correct.
Why would it not feel weird? I think Qiaochu’s comment in the previous thread gives a clue. He writes:
Re: the update, no, I don’t think that’s strange. You gave me some weird information and I conditioned on it. Conditioning on things changes my subjective probabilities, and conditioning on weird things changes my subjective probabilities in weird ways.
In other words, he takes it for granted that what you are supposed to do is condition on new information. Which is obviously what you should do in any context where you’re dealing with mathematical probability satisfying the usual axioms. Are we in such a context here? I certainly don’t mean “you have no information about Coin 2” to mean “Coin 2 falls heads with probability p where p is drawn from the uniform distribution (or Jeffreys, or any other specified distribution, thanks Ben W.) on [0,1]” — if I meant that, there could be no controversy!
I think as mathematicians we are very used to thinking that probability as we know it is what we mean when we talk about uncertainty. Or, to the extent we think we’re talking about something other than probability, we are wrong to think so. Lots of philosophers take this view. I’m not sure it’s wrong. But I’m also not sure it’s right. And whether it’s wrong or right, I think it’s kind of weird.
(Of course, if we were talking about mathematical probability, the answer to the question can’t be “My subjective probability that Coin 1 landed heads is p,” because you don’t know what p is! The answer is “the expected value of p under the distribution on p that I insist you have mentally specified.”)
Of course you don’t mean a uniform distribution. You mean a Jeffreys distribution.
Point taken, but I don’t mean that either! (Or at least: I don’t intend to mean that and part of what’s under discussion is whether it’s possible not to mean that, or rationally permissible not to mean that, or whatevs.)
Maybe this is only confusing to Bayesians? Going back to the original version, from a frequentist point of view, it makes perfect sense and is intuitive: you’re throwing away the trials where the coins don’t match, so of course you no longer ascribe a probability of one half to the fair coin. It’s hard for me to see how this is any more than a simple exercise in conditional probability.
OK.
So was it similarly weird for people to have talked about the branching ratio of the Higgs (the relative probability for the Higgs to decay into various different final states) as a (somewhat complicated) function of the Higgs mass, BEFORE they knew what the Higgs mass was?
How is the unknown parameter (m_Higgs) which entered into THAT probability distribution different from the unknown parameter (p) which enters into THIS one?
Does NOT giving the parameter a name change how weird (or not weird) it is to talk about various probabilities being functions of that parameter?
Or is the “weird” part just that you are trying to interpret this in Bayesian terms?
As Aaron says, there’s a frequentist version of the problem (repeat the game many times, keeping only those coin-tosses where the two coins agree), which seems to make perfect sense. And the frequentist probability that, in those trials, coin1 comes up heads is surely p.
Wouldn’t it be “weirder” if the Bayesian version of the problem came up with a DIFFERENT answer?
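The frequentist version described above (repeat the game many times, keep only the tosses where the two coins agree) can be sketched as a small simulation. This is only an illustration: the function name, the trial count, and the seed are made up for the example, and the two flips are assumed to be independent.

```python
import random


def heads_rate_when_coins_agree(p, trials=200_000, seed=0):
    """Estimate how often the fair coin shows heads among the trials
    where it agrees with a second coin that falls heads with chance p.

    Assumption (not from the post): the two flips are independent.
    """
    rng = random.Random(seed)
    kept = 0   # trials where the two coins agree
    heads = 0  # kept trials where the fair coin is heads
    for _ in range(trials):
        coin1 = rng.random() < 0.5  # the fair coin
        coin2 = rng.random() < p    # the coin with bias p
        if coin1 == coin2:
            kept += 1
            heads += coin1
    return heads / kept


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, round(heads_rate_when_coins_agree(p), 3))
```

Under these assumptions the estimated frequency lands close to p for each value tried, which is the frequentist answer stated in the comment.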
Answer is still 1/2, I’d say. Your confederate may be biased towards heads or tails, but with no reason to suspect one over the other, every possibility that’s in some way biased towards heads will be counterbalanced by the reverse possibility which is equally biased in the other direction. So while it’s not entirely clear how to perform updates on your confederate’s guesses — you do need some prior — under any prior I’d consider reasonable (i.e., that satisfies the reversal property above) your overall subjective probability of your confederate stating heads will still be 1/2, which is all that’s relevant to this particular problem.
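The argument in this comment can be checked with a short calculation. It is a sketch under two labeled assumptions: the confederate says heads with some unknown chance r, independently of the flip, and your uncertainty about r is a discrete prior. The names `heads_given_correct` and `prior_r` are hypothetical, chosen only for the example.

```python
from fractions import Fraction


def heads_given_correct(q, prior_r):
    """P(coin fell heads | the confederate's prediction was correct).

    q        -- chance that the coin falls heads
    prior_r  -- dict mapping r (chance the confederate says heads)
                to its prior weight
    Assumption: the prediction is independent of the flip.
    """
    total = sum(prior_r.values())
    mean_r = sum(r * w for r, w in prior_r.items()) / total
    correct_and_heads = q * mean_r              # coin heads, said heads
    correct_and_tails = (1 - q) * (1 - mean_r)  # coin tails, said tails
    return correct_and_heads / (correct_and_heads + correct_and_tails)


# A prior with the reversal property: each bias is matched by its mirror.
symmetric = {Fraction(1, 10): 1, Fraction(9, 10): 1}
# A prior that leans towards "says heads".
lopsided = {Fraction(9, 10): 1}

if __name__ == "__main__":
    print(heads_given_correct(Fraction(1, 2), symmetric))
    print(heads_given_correct(Fraction(1, 2), lopsided))
    print(heads_given_correct(Fraction(9999, 10000), symmetric))
```

With a symmetric prior the average r is 1/2, so the result equals q whatever q is: 1/2 for the fair coin, and 99.99% for the coin in the post's update. Only a lopsided prior moves the answer.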
It seems to me that the problem is really just some fancy window dressing on: “Someone flips a coin with unknown probability of heads. What is the probability of heads?”
Regarding the update in the blogpost: you may even stipulate that either the coin has two sides that represent heads (bias 1 for heads) or it has two sides that represent tails (bias 0 for heads) and both options are initially taken to be equiprobable.
– On one reading, the probability for heads is either 1 or 0 (so definitely different from 1/2) (generating probability or objective bias);
– on another reading it is 1/2 (subjective probability).
The answer 1/2 can be obtained by collapsing the ‘higher-order probabilities’ to an overall subjective probability. The subjective probability(bias-probability is 1)=1/2=subjective probability(bias-probability is 0), so we obtain 1/2*1+1/2*0 = 1/2. (And likewise for any symmetrical distribution of probabilities over possible bias values.)
In the example, after the update on the new information, the new subjective probability is still 1/2. If someone finds this weird, I would clarify the different kinds of probability at play here (as above).
It may be helpful to consult Laplace on this (Essay, chapter 7)
He also explains that more interesting things may happen when you ask, e.g., about the probability of two consecutive heads (see bottom of p. 34 – top of p. 35).
Why do we assume that we always mean exactly the same thing (whatever that is) when we say ‘probability’? Is it because identical twins are harder to tell apart than other siblings?
As a non-mathematician (I did make it through the calculus needed for undergraduate engineering), I am wondering if someone could explain the question better. It has been bothering me for days (and I have tried solving it using the Google).
One coin is stated to be a fair coin, which by definition means it has a 1/2 chance of landing heads. In any formulation of the question, what changes that? And in either way of stating the question, what information do I have with which to draw any conclusion about either the second coin (except that its heads probability is something other than 1/2) or the coin toss prediction?
Please let me know what, if anything, is wrong with this analysis:
Let q = Pr(coin1 = H) = 0.5, and let p be the unknown Pr(coin 2 = H). If the coin flips are independent (which is maybe where this all goes off the rails), then the probabilities of the four possible outcomes are:
Pr(HH) = qp
Pr(HT) = q(1-p)
Pr(TH) = (1-q)p
Pr(TT) = (1-q)(1-p)
So, the probability that the two coins are the same is Pr(HH) + Pr(TT) = qp + (1-q)(1-p). Because q = 0.5, this is just 0.5p + 0.5(1-p) = 0.5 = q. All of which is to say that it looks like the fairness of coin 1 makes the value of p irrelevant to the probability in question.
If q isn’t equal to 1-q, things don’t cancel so nicely, and we end up with the unknown quantity p sticking around. For example, if q = 0.3, we get 0.3p + 0.7(1-p) = 0.7 – 0.4p = ?.
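One possible reading of how this analysis relates to the frequentist version mentioned earlier in the thread: Pr(HH) + Pr(TT) is the chance that the coins match, while the frequentist version asks about coin 1 being heads among the matching trials, which is Pr(HH) divided by that sum. The sketch below computes both from the four outcome probabilities listed above, keeping the independence assumption; the function name is made up for the example.

```python
from fractions import Fraction


def match_and_conditional(q, p):
    """Return (Pr(coins match), Pr(coin 1 = H | coins match)).

    q -- Pr(coin 1 = H), p -- Pr(coin 2 = H); flips assumed independent.
    """
    hh = q * p              # Pr(HH)
    tt = (1 - q) * (1 - p)  # Pr(TT)
    p_same = hh + tt
    return p_same, hh / p_same


if __name__ == "__main__":
    # Fair coin 1: the match probability is 1/2 for every p,
    # but the conditional probability of heads tracks p.
    print(match_and_conditional(Fraction(1, 2), Fraction(3, 10)))
    # Unfair coin 1 (q = 0.3): neither quantity is free of p.
    print(match_and_conditional(Fraction(3, 10), Fraction(3, 10)))
```

With q = 1/2 the first number is always 1/2, as the comment found, and the second is p itself.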
You also need to know the propensity of the predictor to be correct in his forecasts. The flips might not be independent of the predictions, if Predictor knows something about the coin flipping mechanism, or can influence it, or has access to some information about its results.