Intro
The EPL Theorem is the main result of my 2017 paper, From propositional logic to plausible reasoning: A uniqueness theorem. (The HTML version has some typos, so I recommend downloading the PDF or getting the preprint on arXiv.) “EPL” stands for “Extending Propositional Logic,” or maybe “Epistemic Probability from Logic.” (I’m not wedded to this name, so feel free to suggest a better one.)
Roughly speaking, the EPL Theorem states that the laws of probability are, up to isomorphism, the uniquely determined extension of classical propositional logic (CPL) to handle degrees of plausibility / credibility. It is similar in its aims to Cox's Theorem, but assumes only that we must have consistency with CPL and retain certain of CPL’s properties. It doesn’t assume that degrees of plausibility are real numbers between 0 and 1, nor even that they are totally ordered.
And, unlike Cox’s Theorem, it prescribes specific numeric values for Pr(A | X).
The plausibility function
In classical propositional logic we have the entailment relation 𝛤 ⊨ 𝐴, or “Γ entails A,” which means that propositional formula A is a logical consequence of the set of propositional formulas Γ, without relying on any particular interpretation of the propositional symbols used in Γ or A. As discussed earlier, we can restrict our attention to finite sets Γ and AND their elements together into a single formula, so we treat the entailment relation as a binary relation on propositional formulas. 𝑋 ⊨ 𝐴 then means that any truth assignment satisfying X also satisfies A.
Viewed another way, we can think of the entailment relation as defining a binary operation ( · | · ) on pairs of propositional formulas:
The restriction that X be satisfiable (not a contradiction) in the second clause above ensures that the definition is unambiguous, as X then cannot also entail A. Note that if 𝑋 is unsatisfiable then (A | X) = 𝖳; this choice is somewhat arbitrary but mathematically convenient.
We call X the premise, and we call A the query in the expression (A | X).
The EPL Theorem is about extending the definition of this binary operation, replacing “undefined” with specific plausibility values intermediate between 𝖳 (certainly true) and 𝖥 (certainly false). These plausibility values have at least a partial ordering, with greater values indicating a higher degree of plausibility than lesser values. We call this extended operation the plausibility function.
It is important to note that there is only one entailment relation, used for all problem domains. This relation is determined solely by the syntactic content of the premise and consequence; it makes no use of any intended semantics of the propositional symbols. We do not create new variants of the entailment relation for every problem domain.
Likewise, there is only one plausibility function, used for all problem domains. This function is determined solely by the syntactic content of the premise and query, making no use of any intended semantics of the propositional symbols. We do not create new variants of the plausibility function for every problem domain.
Requirements
Let’s look at what properties the plausibility function should have. Every one of these requirements is of the form, “the plausibility function should retain some property that the entailment relation already has.”
Invariance from logical equivalence
The most basic requirement one could place on the plausibility function is that it respect logical equivalence, just as the entailment relation does:
R1. If 𝑋 and 𝑌 are logically equivalent, then (A | X) = (A | Y); and if 𝐴 and 𝐵 are logically equivalent assuming 𝑋 (i.e. 𝑋 ⊨ 𝐴 ↔︎ 𝐵), then (𝐴 | 𝑋) = (𝐵 | 𝑋).
Preservation of existing distinctions in degree of plausibility
It turns out that CPL already comes with a built-in partial plausibility ordering of propositional formulas:
If 𝑋 ⊨ 𝐴 ↔︎ 𝐵 then 𝐴 and 𝐵 are equally plausible (assuming 𝑋), as they are equivalent propositional formulas.
If 𝑋 ⊨ 𝐴 → 𝐵 then 𝐵 is at least as plausible as 𝐴 (assuming 𝑋), since 𝐵 is true for any possible world (truth assignment satisfying 𝑋) for which 𝐴 is true.
If 𝑋 ⊨ 𝐴 → 𝐵, but not 𝑋 ⊨ 𝐵 → 𝐴, then 𝐵 is strictly more plausible than 𝐴, since there are possible worlds for which 𝐵 is true and 𝐴 is not, but not vice versa.
We call this the implication ordering. This motivates the next requirement (using the numbering used in the paper):
R4. The implication ordering is preserved: if 𝑋 ⊨ 𝐴 → 𝐵 but not 𝑋 ⊨ 𝐵 → 𝐴 then (𝐴 | 𝑋) is some plausibility value that is strictly less than (𝐵 | 𝑋).
Note that R1 already ensures that (𝐴 | 𝑋) = (𝐵 | 𝑋) when 𝑋 ⊨ 𝐴 ↔︎ 𝐵.
Invariance under definition of new symbols
It is common in mathematical proofs to define new symbols as abbreviations for complex expressions or formulas. The same may be done in propositional logic: introduce a new propositional symbol 𝑠 for some complex propositional formula 𝐸, by adding the definition 𝑠 ↔︎ 𝐸 to the premise. This does not affect the entailment relation as long as neither the premise nor conclusion mention 𝑠.
Specifically, the entailment relation has the property that
whenever 𝑠 appears in neither 𝑋 nor 𝐴. We require that the plausibility function have the corresponding property:
R2. Let 𝑠 be a propositional symbol not occurring in 𝑋, 𝐴, or 𝐸. Then
Invariance under addition of irrelevant information
Suppose that 𝑆 contains all the propositional symbols used to talk about a particular problem domain. A propositional formula 𝑌 containing no symbol from 𝑆 tells us nothing about this domain; it is irrelevant information. Adding 𝑌 to the premise does not affect the entailment relation, as long as both premise and conclusion use only propositional symbols from 𝑆:
whenever 𝑌 is both satisfiable and contains no propositional symbol used in 𝐴 or 𝑋. We require that the plausibility function have the corresponding property:
R3. Let 𝑌 be a satisfiable formula containing no propositional symbol used in either 𝑋 nor 𝐴. Then
The theorem
We can now state the theorem:
EPL Theorem. If R1–R4 hold, then:
There is an order-preserving isomorphism 𝑃 between ℙ (the set of all possible plausibility values) and ℚ ⋂ [0, 1] (the set of rational numbers between 0 and 1 inclusive).
𝑃(𝐴 | 𝑋), the plausibility (𝐴 | 𝑋) mapped via 𝑃 to the unit interval, is given by
\(\begin{eqnarray*} P(A \mid X) &=& \frac{\#_S(A \wedge X)}{\#_S(X)} \\[9pt] \#_S(Y) &\triangleq& \;\begin{array}{l} \mbox{number of truth assignments on }S\\ \mbox{that satisfy }Y \end{array} \end{eqnarray*}\)where 𝑆 is the set of all propositional symbols used in 𝐴 or 𝑋.
Hence the usual laws of probability hold:
0 ≤ 𝑃(𝐴 | 𝑋) ≤ 1.
𝑃(𝐴 | 𝑋) = 1 if 𝑋 ⊨ 𝐴.
𝑃(𝐴 | 𝑋) = 0 if 𝑋 ⊨ ¬𝐴 and 𝑋 is satisfiable.
𝑃(¬𝐴 | 𝑋) = 1 - 𝑃(𝐴 | 𝑋) if 𝑋 is satisfiable.
𝑃(𝐴 ∧ 𝐵 | 𝑋) = 𝑃(𝐵 | 𝑋) · 𝑃(𝐴 | 𝐵 ∧ 𝑋).
The order-preserving isomorphism 𝑃 maps plausibilities to probabilities. Pr(𝐴 | 𝑋), the probability of 𝐴 given 𝑋, is then just an alternative notation for 𝑃(𝐴 | 𝑋).
Here are some examples:
Pr(𝚊 ∨ 𝚋 | 𝚊 ↔︎ 𝚋) = 1/2. There are two ways (𝚊 ↔︎ 𝚋) can be true: { 𝚊 ↦ 𝖥, 𝚋 ↦ 𝖥 } and { 𝚊 ↦ 𝖳, 𝚋 ↦ 𝖳 }. The formula (𝚊 ∨ 𝚋) is true for only one of these: { 𝚊 ↦ 𝖳, 𝚋 ↦ 𝖳 }.
Define ⟨𝑠₁, …, 𝑠ₙ⟩ to be the following proposition formula meaning “exactly one of 𝑠₁, …, 𝑠ₙ is true”:
\(\left(\bigwedge_{i=1}^{n-1}\bigwedge_{j=i+1}^n\neg(s_i \wedge s_j)\right)\;\wedge\;\left(\bigvee_{i=1}^n s_i\right)\)Then Pr(𝚊𝟸 | ⟨ 𝚊𝟷, …, 𝚊𝟼 ⟩) = 1/6. There are six ways that ⟨ 𝚊𝟷, …, 𝚊𝟼 ⟩ can be true, and 𝚊𝟸 is true for one of them.
Great work, Kevin, on breaking down such a complex concept! The way you structured the foundational ideas of the EPL Theorem really clarifies the interplay between evidence and belief systems.
Would love to see a discussion on the more practical side of this, specifically in ML.