The classical definition of probability
Let’s take a look at the formula the EPL Theorem gives us for computing probabilities:

Pr(𝐴 | 𝑋) = (number of truth assignments satisfying 𝐴 ∧ 𝑋) / (number of truth assignments satisfying 𝑋).
This bears a striking resemblance to the classical definition of probability, as stated by Laplace in 1820:
The probability of an event is the ratio of the number of cases favorable to it, to the number of possible cases, when there is nothing to make us believe that one case should occur rather than any other, so that these cases are, for us, equally possible.
The difference is that this identity is now a theorem, not a definition.
The “possible cases” are just the truth assignments satisfying 𝑋, and the “favorable cases” are the truth assignments satisfying both 𝐴 and 𝑋. The arguably circular caveat “equally possible” is unnecessary and may be dropped. The phrase “there is nothing to make us believe that one case should occur rather than any other” means that we possess no additional information that, if conjoined with 𝑋, would expand different satisfying truth assignments into different numbers of finer-grained cases.
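This counting of cases can be made mechanical. Here is a minimal Python sketch that computes the ratio by enumerating truth assignments (the particular premise and query at the end are my own illustration, not formulas from the text):

```python
from itertools import product

def probability(query, premise, symbols):
    """Pr(query | premise): the fraction of the truth assignments
    satisfying the premise that also satisfy the query."""
    assignments = [dict(zip(symbols, values))
                   for values in product([False, True], repeat=len(symbols))]
    possible = [a for a in assignments if premise(a)]    # cases satisfying X
    favorable = [a for a in possible if query(a)]        # ... and also A
    return len(favorable) / len(possible)

# Illustration (my own choice of formulas): premise X = p ∨ q, query A = p.
# X has 3 satisfying assignments and A ∧ X has 2, so Pr(A | X) = 2/3.
pr = probability(lambda t: t["p"], lambda t: t["p"] or t["q"], ["p", "q"])
print(pr)  # 0.6666666666666666
```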
A subtle example of relevant information
Bertrand’s “Box Paradox” illustrates this last point. There are three boxes in a row, identical in appearance, each with two drawers. One of the boxes has gold coins in both drawers (GG); one has silver coins in both drawers (SS); and the remaining box has a gold coin in one drawer and a silver coin in the other (GS). You open the first drawer of the second box and see that it holds a gold coin; what is the probability that the other drawer also holds a gold coin?
A naïve analysis, using only information about the second box, gives a probability of 1/2, since the second box must be either the GG or GS box. But this ignores the information we have about the first and third boxes. Here’s a table with all the possibilities, writing each box as first drawer / second drawer, and keeping only the cases consistent with finding a gold coin in the first drawer of the second box:

Box 1   Box 2   Box 3
G/S     G/G     S/S
S/G     G/G     S/S
S/S     G/G     G/S
S/S     G/G     S/G
G/G     G/S     S/S
S/S     G/S     G/G
When we include this extra information, the case “second box is GS” gets expanded into two cases, while the case “second box is GG” gets expanded into four cases, yielding a probability of 4/6 = 2/3 that the second drawer contains a gold coin.
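The count is easy to verify by brute-force enumeration. A Python sketch, representing each box as a pair (first drawer, second drawer):

```python
from itertools import permutations

# The three boxes, each as (first drawer, second drawer).
# The GS box can sit either way round, so we list both orientations.
boxes = {"GG": [("G", "G")], "SS": [("S", "S")], "GS": [("G", "S"), ("S", "G")]}

cases = []
for order in permutations(["GG", "SS", "GS"]):  # which box is in which position
    for first in boxes[order[0]]:
        for second in boxes[order[1]]:
            for third in boxes[order[2]]:
                cases.append((first, second, third))

# Condition on the observation: the first drawer of the second box holds gold.
possible = [c for c in cases if c[1][0] == "G"]
# Favorable: the second drawer of the second box also holds gold.
favorable = [c for c in possible if c[1][1] == "G"]

print(len(favorable), len(possible))          # 4 6
print(len(favorable) / len(possible))         # 0.6666666666666666
```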
A simple derivation of the ratio formula
If you already accept the idea that epistemic probabilities are the right way to reason about degrees of plausibility, then there’s a simple way to derive the formula for Pr(𝐴 | 𝑋) given by the EPL Theorem.
First, let’s keep R1, the requirement that we can replace a propositional formula by any equivalent formula.
Let’s also assume invariance under consistent negation of the sense of a proposition. That is, if 𝑠 is a propositional symbol, and 𝐴ʹ and 𝑋ʹ are obtained from 𝐴 and 𝑋 by replacing all occurrences of 𝑠 with ¬𝑠, then Pr(𝐴ʹ | 𝑋ʹ) = Pr(𝐴 | 𝑋).
The motivation for this is that it is an arbitrary choice whether to give a name to a proposition or to its logical negation. For example, if we have query 𝐴 and premise 𝑋 that make use of a propositional symbol 𝚙 whose intended meaning is that some number 𝑛 is even, we can instead choose to use 𝚙 to mean that 𝑛 is odd and replace all occurrences of 𝚙 in both 𝐴 and 𝑋 with ¬𝚙.
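Under the counting interpretation, this invariance holds because consistently negating a symbol merely permutes the truth assignments, leaving every count unchanged. A quick numerical check (the particular 𝐴 and 𝑋 are my own illustration):

```python
from itertools import product

def probability(query, premise, symbols):
    """Pr(query | premise) as a ratio of counts of truth assignments."""
    assignments = [dict(zip(symbols, values))
                   for values in product([False, True], repeat=len(symbols))]
    possible = [a for a in assignments if premise(a)]
    favorable = [a for a in possible if query(a)]
    return len(favorable) / len(possible)

# A and X using the symbol p directly...
a  = lambda t: t["p"] and t["q"]
x  = lambda t: t["p"] or t["q"]
# ...and A', X' obtained by replacing every occurrence of p with (not p).
a2 = lambda t: (not t["p"]) and t["q"]
x2 = lambda t: (not t["p"]) or t["q"]

print(probability(a, x, ["p", "q"]) == probability(a2, x2, ["p", "q"]))  # True
```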
These assumptions imply that, for example,

Pr(𝑠₁ ∧ 𝑠₂) = Pr(¬𝑠₁ ∧ 𝑠₂) = Pr(𝑠₁ ∧ ¬𝑠₂) = Pr(¬𝑠₁ ∧ ¬𝑠₂)

(recall that the unconditional probability Pr(𝐴) just means Pr(𝐴 | 𝑇) for any tautology 𝑇): negating 𝑠₁ consistently swaps the first probability with the second and the third with the fourth, and negating 𝑠₂ swaps the first with the third. Since the four cases are mutually exclusive and exhaustive, each of these probabilities is then equal to 1/4.
More generally, for any finite set of distinct propositional symbols 𝑆 = { 𝑠₁, …, 𝑠ₙ } and minterms 𝑒 and 𝑒ʹ on 𝑆 we have Pr(𝑒) = Pr(𝑒ʹ) = 1/2ⁿ. (A minterm is a conjunction 𝑙₁ ∧ ⋯ ∧ 𝑙ₙ where each 𝑙ᵢ is either 𝑠ᵢ or ¬𝑠ᵢ; each minterm is satisfied by exactly one truth assignment.) Any propositional formula 𝐴 on 𝑆 may be expanded as the disjunction (OR) of minterms, and so

Pr(𝐴) = (number of truth assignments satisfying 𝐴) / 2ⁿ.
The identity

Pr(𝐴 | 𝑋) = (number of truth assignments satisfying 𝐴 ∧ 𝑋) / (number of truth assignments satisfying 𝑋)

then follows for any 𝐴 and 𝑋 constructed from the symbols in 𝑆, using the law for conditionals,

Pr(𝐴 ∧ 𝑋) = Pr(𝐴 | 𝑋) · Pr(𝑋).
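Spelling out this last step, with 𝑛 symbols in 𝑆 and writing #𝜑 for the number of truth assignments satisfying a formula 𝜑:

```latex
\Pr(A \wedge X) \;=\; \frac{\#(A \wedge X)}{2^{n}},
\qquad
\Pr(X) \;=\; \frac{\#X}{2^{n}},
\qquad\text{hence}\qquad
\Pr(A \mid X)
\;=\; \frac{\Pr(A \wedge X)}{\Pr(X)}
\;=\; \frac{\#(A \wedge X)}{\#X}.
```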
Latent propositional symbols
One concern about the EPL Theorem may be that it mandates a uniform distribution on the space of truth assignments satisfying the premise. But what other reasonable option is there? The premise contains all the information we are to use in determining the probabilities, and it gives us no information by which we could favor one satisfying truth assignment over another.
Yet non-uniform distributions are common in applications of probability theory, and one may ask where they come from. The “Box Paradox” illustrates one answer: via marginalization. A uniform distribution at the finest level of granularity can correspond to a non-uniform distribution at coarser levels obtained by considering only a subset of the propositional symbols, regarding the remainder as latent.
Let’s consider a simple example. Let 𝚡 be a binary outcome of interest, and let 𝚜𝟷, …, 𝚜𝟻 each indicate one of five possible latent (unobservable) states. Let 𝑋 be the propositional formula
where ⟨𝚜𝟷, …, 𝚜𝟻⟩ is a propositional formula stating that exactly one of 𝚜𝟷, …, 𝚜𝟻 is true. Then the EPL Theorem tells us that
yielding a non-uniform Bernoulli distribution for 𝚡.
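For concreteness, here is a Python sketch of this kind of computation with one hypothetical choice of premise: 𝚡 true exactly when the latent state is 𝚜𝟷 or 𝚜𝟸. This particular link between 𝚡 and the states is my own illustrative choice, not necessarily the one in the original example:

```python
from itertools import product

def probability(query, premise, symbols):
    """Pr(query | premise) as a ratio of counts of truth assignments."""
    assignments = [dict(zip(symbols, values))
                   for values in product([False, True], repeat=len(symbols))]
    possible = [a for a in assignments if premise(a)]
    return sum(query(a) for a in possible) / len(possible)

states = ["s1", "s2", "s3", "s4", "s5"]

def exactly_one(t):
    """The formula ⟨s1, …, s5⟩: exactly one latent state holds."""
    return sum(t[s] for s in states) == 1

# Hypothetical premise X: exactly one state holds, and x is true
# exactly when that state is s1 or s2.
def premise(t):
    return exactly_one(t) and (t["x"] == (t["s1"] or t["s2"]))

pr = probability(lambda t: t["x"], premise, ["x"] + states)
print(pr)  # 0.4 -> Pr(x | X) = 2/5, a non-uniform Bernoulli distribution
```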
Infinite domains
Because a premise 𝑋 contains only a finite number 𝑛 of propositional symbols, it can distinguish at most 2ⁿ distinct situations. But what if we want to deal with infinite domains with an infinite number of distinctions possible? Jaynes proposed a finite-sets policy:¹
It is very important to note that our consistency theorems have been established only for probabilities assigned on finite sets of propositions. In principle, every problem must start with such finite-set probabilities; extension to infinite sets is permitted only when this is the result of a well-defined and well-behaved limiting process from a finite set.
In the same vein he also wrote:²
In probability theory, it appears that the only safe procedure known at present is to derive our results first by strict application of the rules of probability theory on finite sets of propositions; then, after the finite-set result is before us, observe how it behaves as the number of propositions increase indefinitely.
I am currently working on formalizing this process, and have some results that I have not yet written up and published. I define a metric space of premises, with a distance function having the property that if 𝑋(1), 𝑋(2), … is a Cauchy sequence of premises, then the associated sequence of probabilities Pr(𝐴 | 𝑋(1)), Pr(𝐴 | 𝑋(2)), … is also a Cauchy sequence of real numbers, for any query 𝐴. We can then complete the metric space and define

Pr(𝐴 | 𝑋) = limᵢ→∞ Pr(𝐴 | 𝑋(𝑖))

whenever 𝑋 is the limit of the Cauchy sequence 𝑋(1), 𝑋(2), ….
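As a toy illustration of the finite-sets policy (my own example, not one from Jaynes): take the premise 𝑋(𝑛) to say that exactly one of 𝑛 latent states holds, and take the query to be true in the even-indexed states. The finite-set probabilities are ⌊𝑛/2⌋/𝑛, which converge to 1/2 as 𝑛 grows:

```python
from fractions import Fraction
from itertools import product

def probability(query, premise, symbols):
    """Pr(query | premise) as an exact ratio of counts of truth assignments."""
    assignments = [dict(zip(symbols, values))
                   for values in product([False, True], repeat=len(symbols))]
    possible = [a for a in assignments if premise(a)]
    return Fraction(sum(query(a) for a in possible), len(possible))

def pr_even(n):
    """Pr(an even-indexed state holds | exactly one of s1..sn holds)."""
    symbols = [f"s{i}" for i in range(1, n + 1)]
    exactly_one = lambda t: sum(t[s] for s in symbols) == 1
    even = lambda t: any(t[s] for s in symbols[1::2])  # s2, s4, ...
    return probability(even, exactly_one, symbols)

for n in [2, 5, 10, 15]:
    print(n, pr_even(n))  # the ratio floor(n/2)/n, approaching 1/2
```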
Coming up next…
In the next post we’ll outline the proof for the EPL Theorem.
E. T. Jaynes, 2003, Probability Theory: The Logic of Science, Cambridge University Press, p. 43.
Ibid., p. 663.