Turning Concrete Facts into a Probability Distribution
Deriving a Non-Informative Distribution Over the Unit Interval
Problem statement
We have a real value π₯ that is known to lie in the unit interval [0 , 1] :
If this is all we know about π₯, what probability distribution should we assign it? This question is relevant to creating what is often called a non-informative prior distribution for a proportion, or for the parameter π of a Bernoulli or binomial distribution.
Iβll address this question in several steps:
First, create a finite premise πβ that expresses all the logical relationships between atomic propositions of the form (π‘ β€ π) or (π β€ π‘) for rational numbers π having absolute value at most π and denominator at most π.
Work out Pr(π β€ π‘ β€ π | πβ) for any rational π and π satisfying the above constraints, according to the EPL Theorem.
Take the limit as π β β .
We will find that we end up with a uniform distribution over the interval [0 , 1] .
Axiomatizing βat mostβ
Once again Iβll reiterate: (classical) propositional logic is a formal logic system, in which formulas such as (π‘ β€ 1/3) or (2/9 β€ π‘) are just unstructured propositional symbols without any built-in semantics. We have to include an axiomatization of the intended semantics in our premise. That axiomatization should should suffice to deduce facts such as
and
All of the atomic formulas we use will be of the form (π‘ β€ π) or (π β€ π‘) for some rational number π, using some canonical textual representation for rational numbers. In addition,
we use (π < π‘) as an abbreviation for Β¬ (π‘ β€ π) ;
we use (π‘ < π) as an abbreviation for Β¬ (π β€ π‘) ;
we use (π‘ = π) as an abbreviation for (π β€ π‘) β§ (π‘ β€ π) ;
we use (π β€ π‘ β€ π) as an abbreviation for (π β€ π‘) β§ (π‘ β€ π) , and similarly for variants using ` < β.
Then the following constitutes a complete axiomatization:1
where
That is,
π₯ is either at most π or at least π;
(π₯ β€ π) and (π < π) implies (π₯ < π) ; and
(π < π) and (π β€ π₯) implies (π < π₯).
A finite approximation
Unfortunately, π€ is an infinite set of propositional axioms; therefore we cannot AND them together to create a finite propositional formula. We will instead define a series πβ whose elements AND together successively larger subsets of π€, then take the limit as π β β.
For any integer π > 0, define πβ to be the set of rational numbers π such that
-π β€ π β€ π, and
π can be expressed as the ratio of integers π/π for 1 β€ π β€ π.
Note that πβ β πβ for all π β€ π, and every rational number belongs to some πβ.
Writing πβ for πβ β© [0 , 1], the elements of πβ lying between 0 and 1 inclusive, here are some examples:
πβ = { 0 , 1 } ,
πβ = { 0 , 1/2, 1 } ,
πβ = { 0 , 1/3, 1/2, 2/3, 1 } ,
πβ = { 0 , 1/4, 1/3, 1/2, 2/3, 3/4, 1 } ,
and so on.
Now we define a finite subset of the full axiomatization:
πβ is a finite approximation to our full set of facts about logical relations between different atomic propositions of the form (π β€ π‘) or (π‘ β€ π), limited to rational numbers π and π with a denominator of at most π (expressed in lowest terms) and an absolute value of at most π. πβ adds the constraint that π₯ lies in the unit interval.
Reductions for computing probabilities
It is straightforward to show that the following entailments hold for any π, π, π β πβ. (Recall that β¨π΄β, β¦ , π΄ββ© denotes a propositional formula meaning that exactly one of the π΄α΅’ is true.)
These are all to be expected; if πβ did not have these logical consequences, it would mean that it did not adequately axiomatize ββ€β on πβ. Using the above, and standard rules of probability theory, we can derive the following for π, π β πβ:
All of these are again as one would expect if we have properly axiomatized the meaning of ββ€β on πβ.
With these properties in hand, the probability of any interval with endpoints in πβ reduces to evaluating one of the following for some π β πβ:
Pr( π‘ β€ π | πβ ) or
Pr( π‘ = π | πβ ).
Computing upper-bound and point probabilities
πβ, the set of rational numbers between 0 and 1 inclusive with denominator at most π, is known as the Farey sequence of order π. (Yes, it is actually a set, but if you list its elements in ascending order it becomes a sequence.) Its properties were investigated in 1838 by the British Geologist John Farey, Jr. Two quantities associated with the Farey sequence are
π(π) : the number of elements in the Farey sequence πβ;
π(π, π) : the number of π β πβ for which π β€ π.
Note that π(π, π) = 0 for π < 0 and π(π, π) = π(π) for π β₯ 1.
The premise πβ has 2π(π) - 1 satisfying truth assignments: π(π) truth assignments each corresponding to a case where
for some π β πβ, and π(π) - 1 truth assignments each corresponding to a case where
for π and π consecutive elements of πβ.
Assume π β πβ. If π β₯ 0 , the formula (π‘ β€ π) β§ πβ likewise has 2π(π, π) - 1 satisfying truth assignments. Applying the EPL Theorem, this yields
which simplifies in one special case to
For negative π there are no satisfying truth assignments, and the EPL Theorem yields
The formula (π‘ = π) β§ πβ has one (1) satisfying truth assignment if 0 β€ π β€ 1 , zero (0) otherwise, and applying the EPL Theorem yields
where [π] is Knuth and Grahamβs notation for converting a Boolean value π to 0 or 1.
Take it to the limit
Let π* β πβ, πβ, β¦ be the infinite sequence of propositional formulas πβ , π β₯ 1. We may consider π* to represent our full state of information about π₯ and the properties of ββ€β. For any propositional formula π΄ we will define Pr( π΄ | π* ) to be the limiting probability as π β β , if the limit exists:
Now to compute some limits.
First of all, π(π) is asymptotically equal to 3πΒ²/πΒ², hence
and so
Next, taking the limits for the reductions previously presented has the effect of replacing πβ with π* and removing the constraint that π, π, π β πβ, since we can always find π sufficiently large that π, π, π β πβ for any π β₯ π. This yields the following: for any π, π, π β β,
These are all familiar properties of a continuous probability distribution over the real line, but derived from our ground axiomatization of ββ€β.
We can simplify the above further using Pr( π‘ = π | π* ) = 0 :
Finally, letβs evaluate the probability of (π‘ β€ π) in the limit as π β β.
For π β β, π < 0 or π β₯ 1, we have a trivial limit:
When 0 β€ π β€ 1, F. Dress proved2 that
Then
yielding
This is exactly the cumulative distribution function for the uniform distribution function on [0 , 1] ; thus, in the limit, we obtain a uniform distribution for π₯.
β is the set of rational numbers.
F. Dress (1999). βDiscrΓ©pance des suites de Farey.β J. ThΓ©or. Nombres Bordeaux 11, pp. 345β367.