A few months ago I posted a puzzle about aggregating probability estimates from different sources, and in particular how to aggregate opinions about the independence of two events.

I think I now understand the story slightly better. I am essentially going to agree with what Terry T. said in the comments to the first post (**this is my surprised face**) but at the same time try to dissolve my initial resistance to talking about second-order probabilities (statements of the form “the probability that the probability is p is q….”)

To save you a click, the question amounts to: if half of your advisors tell you that X and Y are independent coins with probability .9 of landing heads, and the other half of your advisors agree the coins are independent but say that the probability of heads is .1 for each, what should your degree of belief in X, Y, and X&Y be? And should you believe that X and Y are independent events, a fact about which your advisors are unanimous?

The answer depends, at least in part, on what you mean by “probability” and “independence.”

On one account, probability is a number between 0 and 1 that represents your degree of belief in a hypothesis, and independence of X and Y means that Pr(X&Y) = Pr(X)Pr(Y). Both are assertions about your mental state. So there’s no reason that the unanimity of your advisors about the independence of X and Y should make *you* believe that X and Y are independent; why should this aspect of their mental state automatically be taken to be a guide to yours? Relevant comparison: what if each advisor said “I am *really sure* my belief about the coin is correct.” Since all your advisors agree that the nature of the coin is very strongly certain, should you agree about that too? No — given that half your advisors think the coin is very likely to fall heads and half that it is very likely to fall tails, you are reasonably pretty *unsure* about the nature of the coin. Moreover, if X falls heads, you should rationally increase your degree of belief that Y will fall heads too, because X falling heads is evidence that the 0.9 gang is correct in their beliefs. So (for you, even if not for your advisors) the two events are *not* independent.
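To make the arithmetic concrete, here is a minimal sketch of the mixture-of-advisors calculation (the variable names are mine, not from the post):

```python
# Mixture of the two advisors' views: with probability 1/2 the coins are
# i.i.d. with heads-probability 0.9, and with probability 1/2 it's 0.1.
priors = [(0.5, 0.9), (0.5, 0.1)]  # (weight, heads-probability)

p_X = sum(w * p for w, p in priors)       # your credence that X falls heads
p_XY = sum(w * p * p for w, p in priors)  # your credence that both fall heads
p_Y_given_X = p_XY / p_X                  # your credence in Y after seeing X heads

print(round(p_X, 2))          # 0.5
print(round(p_XY, 2))         # 0.41, not 0.5 * 0.5 = 0.25
print(round(p_Y_given_X, 2))  # 0.82: X heads raises your credence in Y heads
```

Since Pr(X&Y) = 0.41 ≠ 0.25 = Pr(X)Pr(Y), the events are dependent from your point of view, even though each advisor individually regards them as independent.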

There is another account, in which the probability is an intrinsic property of the coin. On this account, it makes sense to talk about second-order probabilities: to say, for instance, that the probability that “the probability that the coin falls heads is .9” is 1/2. On this account, we can talk (as Terry does) about *conditional* independence; we say that there is an unknown parameter p which measures the propensity of the coin to fall heads, and that the condition Pr(X&Y) = Pr(X)Pr(Y) for independence only makes sense conditionally on p — that is, once the value of p is known.

In fact, I’ve come to favor the second view, at least as regards coins. Because here’s the thing. Let’s say I start with the first view. I have in mind a degree of belief that the first coin will fall heads, and I call this P(X). Given the evidence I have, probably P(X) should be 0.5. But once I’m forming degrees of belief, I must also have a degree of belief that a sequence of k tosses of the coin will all fall heads. And this should be the average of (0.9)^k and (0.1)^k, not (0.5)^k!
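The divergence between the two computations grows quickly with k; a quick sketch of the comparison:

```python
# Degree of belief that k tosses all fall heads: the correct mixture average
# of (0.9)^k and (0.1)^k, versus the naive (0.5)^k one would get by first
# collapsing the uncertainty about the coin to a single number.
def all_heads_mixture(k):
    return 0.5 * 0.9**k + 0.5 * 0.1**k

for k in [1, 2, 5, 10]:
    print(k, round(all_heads_mixture(k), 4), round(0.5**k, 4))
```

Already at k = 2 the mixture gives 0.41 versus 0.25, and at k = 10 it gives about 0.17 versus about 0.001: ten heads in a row is strong evidence for the 0.9 hypothesis, so it is far less surprising than the collapsed view would suggest.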

Having in mind the probability distributions on “number of heads in k tosses” for all k is, by De Finetti’s theorem, more or less the same as having in mind a probability distribution on the propensity of the coin to fall heads. That is, if a binary event is one we can imagine repeating, then our subjective degrees of belief about the event *automatically* have the structure of a second-order probability distribution on (Bernoulli) probability distributions. In fact, I think this was why De Finetti proved De Finetti’s theorem. In this context, independence is an intrinsic fact about the coins, not about our knowledge, and we should agree with our advisors that the coins are independent.
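The De Finetti picture can be seen directly in a simulation: draw the latent propensity p first, then toss conditionally i.i.d. coins. The resulting sequence is exchangeable but not unconditionally independent. (This is an illustrative sketch, not anything from the post itself.)

```python
import random

random.seed(0)

def sample_pair():
    # Second-order distribution: p is 0.9 or 0.1 with equal probability.
    p = random.choice([0.9, 0.1])
    # Given p, the two tosses are genuinely independent.
    return (random.random() < p, random.random() < p)

n = 100_000
pairs = [sample_pair() for _ in range(n)]
freq_first = sum(x for x, _ in pairs) / n
freq_both = sum(x and y for x, y in pairs) / n

print(round(freq_first, 2))  # close to 0.5
print(round(freq_both, 2))   # close to 0.41, not 0.25: exchangeable, not independent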

I’m less sure this story applies to uncertain events which are, by their nature, unrepeatable. What do we mean when we talk about the probability that Ankylosaurus had feathers? Is it meaningful in this context to say “I think there’s a 50% chance that there’s a 90% chance Ankylosaurus had feathers, and a 50% chance that there’s only a 10% chance” or is this exactly the same as saying you think there’s a 50% chance?

You could say “we don’t know whether any of the members of order Ornithischia were feathered, but if some were, then probably most were. Therefore there are two main possibilities, and if the first one holds, then Ankylosaurus probably had feathers; but if the second one holds, then it probably didn’t.”

Instead of many coins, the meta-uncertainty here concerns the conditional probabilities that, say, Stegosaurus is feathered given that Ankylosaurus is.

I think it is important to carefully distinguish between actual physical reality and the mathematical model that we use to approximate that reality (this is what philosophers call the map-territory relation). A mathematical model contains both internal components (e.g. first-order probabilities, second-order probabilities, etc.) and external components (predictions about what one should observe in the real world). There are then at least three conflicting but desirable features of a mathematical model. The first is external accuracy: that its predictions match up closely with what actually happens in the real world. The second is internal accuracy: that the internal components of the model correspond closely with the underlying laws of nature that govern reality. The third is ease of computation: that the model is simple and tractable enough that one can actually do the maths and make meaningful and reliable predictions.

While all three of these features are desirable, it is a basic fact of mathematical modeling that one usually cannot have all three of them at once, and so something has to give. When one is using models for practical purposes (as opposed to theoretical or philosophical ones), one usually shoots for external accuracy and ease of computation, at the expense of internal accuracy (the “shut up and compute” paradigm). With this mindset, it becomes perfectly acceptable to introduce some mathematically convenient constructs into the model (such as first or second-order probabilities) in order to create one which models one’s current state of knowledge about reality to the best of one’s ability, even if such constructs have no actual physical meaning. In particular, questions such as “what does this probability really _mean_?” become somewhat beside the point.

As for your Ankylosaurus example, if the only purpose of one’s model was to obtain a prediction as to whether Ankylosaurus had feathers or not, then the two models you suggest are completely equivalent from an external perspective, and almost completely equivalent from an ease-of-computation perspective, even if they differ from an internal perspective. But if this model was part of a much larger model concerning which species had which characteristics, and how the species are all related to each other, then the two larger models that correspond to the respective Ankylosaurus models could be quite different from all three perspectives.

I agree that “What does this probability really mean?” becomes somewhat beside the point if you are asking a practical question — but, as you say, if you care about the philosophy, that question is not beside the point at all! Or even if you don’t care about philosophy (I do, though!) you might care about psychology — what is your brain talking about when it talks about probability?

Re Ankylosaurus — I’m not so much interested in practical methods (since I don’t care whether Ankylosaurus had feathers or not) as I am in understanding the apparent fact, unclear to me before, that my intuition uses the word “probability” differently in different cases. So I’m trying to understand what kind of uncertainties invoke what kinds of probability. Both you and orthonormal are pointing to the fact that, while “Did Ankylosaurus have feathers” doesn’t refer to a repeatable event, it refers to a kind of intermediate case where there are lots of _similar_ questions about other kinds of dinosaurs and their texture. And you might have a second-order story where you attach probabilities to various theories about dinosaur texture, and then the probability of a feathered Ankylosaurus is best understood as conditional on the theory.

So “coin flip” is clearly a repeatable case; “does the deity of the Old Testament exist” is (I think) clearly a non-repeatable case; and “did Ankylosaurus have feathers” is somewhere in between.

[…] Quomodocumque reflects on probabilities of probabilities. […]

The standard (in philosophy) “Dutch book” approach of using “At what odds would you be willing to place a bet?” to define probability handles one-off probabilities ok, but it would be nice to see a treatment more along the lines of Lewis’s counterfactuals, where the probability of a one-off statement would turn into a standard probability question over the space of the plurality of worlds. (I personally think the plurality of worlds exists only in the world-modeling capability of the mind that can imagine these other worlds, but the treatment would surely make sense either way.)

Your previous post on Elga’s “subjective probabilities should be sharp” paper led me to wonder whether it is ever advantageous to represent something as a distribution over distributions over distributions, or perhaps an even longer chain… and reassuringly, it sometimes is. (I even recall writing a rant about Elga’s paper, but I can’t find it now….)