Turning Concrete Facts into a Probability Distribution

Deriving a Non-Informative Distribution Over the Unit Interval

Feb 15, 2025

Problem statement

We have a real value 𝑥 that is known to lie in the unit interval [0 , 1] :

\((0\leq \mathtt{x})\land (\mathtt{x}\leq 1).\)

If this is all we know about 𝑥, what probability distribution should we assign it? This question is relevant to creating what is often called a non-informative prior distribution for a proportion, or for the parameter 𝜃 of a Bernoulli or binomial distribution.

I’ll address this question in several steps:

First, create a finite premise 𝑉ₙ that expresses all the logical relationships between atomic propositions of the form (𝚡 ≤ 𝑎) or (𝑎 ≤ 𝚡) for rational numbers 𝑎 having absolute value at most 𝑛 and denominator at most 𝑛.
Work out Pr(𝑎 ≤ 𝚡 ≤ 𝑏 | 𝑉ₙ) for any rational 𝑎 and 𝑏 satisfying the above constraints, according to the EPL Theorem.
Take the limit as 𝑛 → ∞ .

We will find that we end up with a uniform distribution over the interval [0 , 1] .

Axiomatizing “at most”

To reiterate: (classical) propositional logic is a formal system in which formulas such as (𝚡 ≤ 1/3) or (2/9 ≤ 𝚡) are just unstructured propositional symbols without any built-in semantics. We have to include an axiomatization of the intended semantics in our premise. That axiomatization should suffice to deduce facts such as

\(\left(2/3\leq \mathtt{x}\right)\rightarrow\left(1/2\leq \mathtt{x}\right)\)

and

\(\left(\mathtt{x}\leq1/4\right)\vee\left(1/4\leq \mathtt{x}\right).\)

All of the atomic formulas we use will be of the form (𝚡 ≤ 𝑎) or (𝑎 ≤ 𝚡) for some rational number 𝑎, using some canonical textual representation for rational numbers. In addition,

we use (𝑎 < 𝚡) as an abbreviation for ¬ (𝚡 ≤ 𝑎) ;
we use (𝚡 < 𝑏) as an abbreviation for ¬ (𝑏 ≤ 𝚡) ;
we use (𝚡 = 𝑎) as an abbreviation for (𝑎 ≤ 𝚡) ∧ (𝚡 ≤ 𝑎) ;
we use (𝑎 ≤ 𝚡 ≤ 𝑏) as an abbreviation for (𝑎 ≤ 𝚡) ∧ (𝚡 ≤ 𝑏), and similarly for variants using ‘<’.

Then the following constitutes a complete axiomatization:[1]

\(\begin{align*} \Gamma & \triangleq\Gamma_{X}\cup\Gamma_{Y}\cup\Gamma_{Z}\\ \Gamma_{X} & \triangleq\left\{ X(a)\colon a\in\mathbb{Q}\right\} \\ \Gamma_{Y} & \triangleq\left\{ Y(a,b)\colon a,b\in\mathbb{Q},a<b\right\} \\ \Gamma_{Z} & \triangleq\left\{ Z(a,b)\colon a,b\in\mathbb{Q},a<b\right\} \end{align*}\)

where

\(\begin{align*} X(a) & \triangleq\left(\mathtt{x}\leq a\right)\vee\left(a\leq \mathtt{x}\right)\\ Y(a,b) & \triangleq\left(\mathtt{x}\leq a\right)\rightarrow\left(\mathtt{x}<b\right)\\ Z\left(a,b\right) & \triangleq\left(b\leq \mathtt{x}\right)\rightarrow\left(a<\mathtt{x}\right). \end{align*}\)

That is,

𝑥 is either at most 𝑎 or at least 𝑎;
(𝑥 ≤ 𝑎) and (𝑎 < 𝑏) together imply (𝑥 < 𝑏); and
(𝑎 < 𝑏) and (𝑏 ≤ 𝑥) together imply (𝑎 < 𝑥).
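As a small sanity check, the two example facts mentioned earlier can be verified by brute force over truth assignments. The sketch below restricts the axioms to the three rationals 1/4, 1/2 and 2/3; that restriction is an assumption made only to keep the enumeration tiny. The names `models`, `le` and `ge` are hypothetical helpers, not part of the original development.

```python
from fractions import Fraction
from itertools import product

def models(points):
    """Yield truth assignments satisfying X, Y, Z restricted to `points`.

    le[a] is the truth value of the atom (x <= a);
    ge[a] is the truth value of the atom (a <= x).
    """
    pts = sorted(points)
    for bits in product((False, True), repeat=2 * len(pts)):
        le = dict(zip(pts, bits[:len(pts)]))
        ge = dict(zip(pts, bits[len(pts):]))
        x_ok = all(le[a] or ge[a] for a in pts)              # X(a)
        yz_ok = all(
            (not le[a] or not ge[b])                         # Y(a,b): (x<=a) -> not (b<=x)
            and (not ge[b] or not le[a])                     # Z(a,b): (b<=x) -> not (x<=a)
            for a in pts for b in pts if a < b
        )
        if x_ok and yz_ok:
            yield le, ge

pts = [Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]

# (2/3 <= x) -> (1/2 <= x) holds in every model of the restricted axioms
entails_1 = all((not ge[Fraction(2, 3)]) or ge[Fraction(1, 2)]
                for le, ge in models(pts))
# (x <= 1/4) or (1/4 <= x) holds in every model
entails_2 = all(le[Fraction(1, 4)] or ge[Fraction(1, 4)]
                for le, ge in models(pts))
print(entails_1, entails_2)
```

An entailment holds exactly when the conclusion is true in every satisfying assignment of the premise, which is what the two `all(...)` expressions test.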

A finite approximation

Unfortunately, 𝛤 is an infinite set of propositional axioms; therefore we cannot AND them together to create a finite propositional formula. We will instead define a sequence of formulas 𝑊ₙ, each the conjunction of a successively larger finite subset of 𝛤, then take the limit as 𝑛 → ∞.

For any integer 𝑛 > 0, define 𝑄ₙ to be the set of rational numbers 𝑎 such that

-𝑛 ≤ 𝑎 ≤ 𝑛, and
𝑎 can be expressed as the ratio of integers 𝑖/𝑗 for 1 ≤ 𝑗 ≤ 𝑛.

Note that 𝑄ₘ ⊆ 𝑄ₙ for all 𝑚 ≤ 𝑛, and every rational number belongs to some 𝑄ₙ.

Writing 𝓕ₙ for 𝑄ₙ ∩ [0 , 1], the elements of 𝑄ₙ lying between 0 and 1 inclusive, here are some examples:

𝓕₁ = { 0 , 1 } ,
𝓕₂ = { 0 , 1/2, 1 } ,
𝓕₃ = { 0 , 1/3, 1/2, 2/3, 1 } ,
𝓕₄ = { 0 , 1/4, 1/3, 1/2, 2/3, 3/4, 1 } ,

and so on.
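The sets above are easy to generate mechanically. The following is a minimal sketch using exact rational arithmetic; the function names `q_set` and `farey` are illustrative choices, not taken from the original.

```python
from fractions import Fraction

def q_set(n):
    """Q_n: rationals i/j with 1 <= j <= n and -n <= i/j <= n."""
    return {Fraction(i, j)
            for j in range(1, n + 1)
            for i in range(-n * j, n * j + 1)}

def farey(n):
    """F_n = Q_n intersected with [0, 1], in ascending order."""
    return sorted(a for a in q_set(n) if 0 <= a <= 1)

for n in range(1, 5):
    print(n, [str(a) for a in farey(n)])
```

Because `Fraction` reduces to lowest terms automatically, building a set removes duplicates such as 2/4 and 1/2 without extra work.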

Now we define a finite subset of the full axiomatization:

\(\begin{align*} W_{n} & \triangleq\bigwedge_{a\in Q_{n}}\left(X(a)\wedge\bigwedge_{\begin{subarray}{c} b\in Q_{n}\\ a<b \end{subarray}}\left(Y(a,b)\wedge Z(a,b)\right)\right)\\ V_{n} & \triangleq\left(0\leq \mathtt{x}\right)\wedge\left(\mathtt{x}\leq1\right)\wedge W_{n}. \end{align*}\)

𝑊ₙ is a finite approximation to our full set of facts about logical relations between different atomic propositions of the form (𝑎 ≤ 𝚡) or (𝚡 ≤ 𝑏), limited to rational numbers 𝑎 and 𝑏 with a denominator of at most 𝑛 (expressed in lowest terms) and an absolute value of at most 𝑛. 𝑉ₙ adds the constraint that 𝑥 lies in the unit interval.

Reductions for computing probabilities

It is straightforward to show that the following entailments hold for any 𝑎, 𝑏, 𝑐 ∈ 𝑄ₙ. (Recall that ⟨𝐴₁, … , 𝐴ₖ⟩ denotes a propositional formula meaning that exactly one of the 𝐴ᵢ is true.)

\(\begin{align*} V_{n} & \models\left\langle \left(\mathtt{x}<c\right),\,\left(\mathtt{x}=c\right),\,\left(c<\mathtt{x}\right)\right\rangle \\ V_{n} & \models\left(\mathtt{x}\leq c\right)\leftrightarrow\left(\left(\mathtt{x}<c\right)\lor\left(\mathtt{x}=c\right)\right)\\ V_{n} & \models\left(c\leq \mathtt{x}\right)\leftrightarrow\left(\left(c<\mathtt{x}\right)\lor\left(\mathtt{x}=c\right)\right)\\ V_{n} & \models\neg\left(a\leq \mathtt{x}\leq b\right)\quad\mbox{if }a>b\\ V_{n} & \models\neg\left(a<\mathtt{x}\leq b\right)\quad\mbox{if }a\geq b\\ V_{n} & \models\neg\left(a\leq \mathtt{x}<b\right)\quad\mbox{if }a\geq b\\ V_{n} & \models\neg\left(a<\mathtt{x}<b\right)\quad\mbox{if }a\geq b\\ V_{n} & \models\left(\left(\mathtt{x}<a\right)\lor\left(a\leq \mathtt{x}\leq b\right)\right)\leftrightarrow\left(\mathtt{x}\leq b\right)\quad\mbox{if }a\leq b. \end{align*}\)

These are all to be expected; if 𝑉ₙ did not have these logical consequences, it would mean that it did not adequately axiomatize ‘≤’ on 𝑄ₙ. Using the above, and standard rules of probability theory, we can derive the following for 𝑎, 𝑏 ∈ 𝑄ₙ:

\(\begin{align*} \Pr\left(\mathtt{x}<b\mid V_{n}\right) & =\Pr\left(\mathtt{x}\leq b\mid V_{n}\right)-\Pr\left(\mathtt{x}=b\mid V_{n}\right)\\ \Pr\left(a<\mathtt{x}\mid V_{n}\right) & =1-\Pr\left(\mathtt{x}\leq a\mid V_{n}\right)\\ \Pr\left(a\leq \mathtt{x}\mid V_{n}\right) & =\Pr\left(a<\mathtt{x}\mid V_{n}\right)+\Pr\left(\mathtt{x}=a\mid V_{n}\right)\\ \Pr\left(a\leq \mathtt{x}\leq b\mid V_{n}\right) & =\begin{cases} 0 & \mbox{if }a>b\\ \Pr\left(\mathtt{x}\leq b\mid V_{n}\right)-\Pr\left(\mathtt{x}<a\mid V_{n}\right) & \mbox{if }a\leq b \end{cases}\\ \Pr\left(a<\mathtt{x}\leq b\mid V_{n}\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}\leq b\mid V_{n}\right)-\Pr\left(\mathtt{x}\leq a\mid V_{n}\right) & \mbox{if }a<b \end{cases}\\ \Pr\left(a\leq \mathtt{x}<b\mid V_{n}\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}<b\mid V_{n}\right)-\Pr\left(\mathtt{x}<a\mid V_{n}\right) & \mbox{if }a<b \end{cases}\\ \Pr\left(a<\mathtt{x}<b\mid V_{n}\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}<b\mid V_{n}\right)-\Pr\left(\mathtt{x}\leq a\mid V_{n}\right) & \mbox{if }a<b \end{cases} \end{align*}\)

All of these are again as one would expect if we have properly axiomatized the meaning of ‘≤’ on 𝑄ₙ.
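To illustrate how one of these reductions follows (a sketch of the reasoning, which the text leaves implicit), take 𝑎 ≤ 𝑏. The last entailment says that, under 𝑉ₙ, (𝚡 ≤ 𝑏) is equivalent to a disjunction. Its two disjuncts cannot both hold, because (𝚡 < 𝑎) abbreviates ¬(𝑎 ≤ 𝚡). Logically equivalent formulas have equal probability, and the sum rule applies to mutually exclusive disjuncts, so

\(\begin{align*} \Pr\left(\mathtt{x}\leq b\mid V_{n}\right) & =\Pr\left(\left(\mathtt{x}<a\right)\lor\left(a\leq \mathtt{x}\leq b\right)\mid V_{n}\right)\\ & =\Pr\left(\mathtt{x}<a\mid V_{n}\right)+\Pr\left(a\leq \mathtt{x}\leq b\mid V_{n}\right). \end{align*}\)

Rearranging gives the expression for Pr( 𝑎 ≤ 𝚡 ≤ 𝑏 | 𝑉ₙ ) in the case 𝑎 ≤ 𝑏. The other reductions follow by the same pattern.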

With these properties in hand, the probability of any interval with endpoints in 𝑄ₙ reduces to evaluating one of the following for some 𝑏 ∈ 𝑄ₙ:

Pr( 𝚡 ≤ 𝑏 | 𝑉ₙ ) or
Pr( 𝚡 = 𝑏 | 𝑉ₙ ).

Computing upper-bound and point probabilities

𝓕ₙ, the set of rational numbers between 0 and 1 inclusive with denominator at most 𝑛, is known as the Farey sequence of order 𝑛. (Yes, it is actually a set, but if you list its elements in ascending order it becomes a sequence.) Its properties were described in 1816 by the British geologist John Farey, Sr. Two quantities associated with the Farey sequence are

𝑁(𝑛) : the number of elements in the Farey sequence 𝓕ₙ;
𝑁(𝑛, 𝑏) : the number of 𝑐 ∈ 𝓕ₙ for which 𝑐 ≤ 𝑏.

Note that 𝑁(𝑛, 𝑏) = 0 for 𝑏 < 0 and 𝑁(𝑛, 𝑏) = 𝑁(𝑛) for 𝑏 ≥ 1.

The premise 𝑉ₙ has 2𝑁(𝑛) - 1 satisfying truth assignments: 𝑁(𝑛) truth assignments each corresponding to a case where

\(\mathtt{x}=c\)

for some 𝑐 ∈ 𝓕ₙ, and 𝑁(𝑛) - 1 truth assignments each corresponding to a case where

\(a<\mathtt{x}<b\)

for 𝑎 and 𝑏 consecutive elements of 𝓕ₙ.
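For very small 𝑛 this count can be confirmed by exhaustive enumeration. The sketch below treats each atom (𝚡 ≤ 𝑎) and (𝑎 ≤ 𝚡), 𝑎 ∈ 𝑄ₙ, as an independent Boolean variable and counts the assignments that satisfy 𝑉ₙ. The function names are hypothetical, and the approach is only feasible for 𝑛 ≤ 2 because the number of assignments grows as 2 to the power 2|𝑄ₙ|.

```python
from fractions import Fraction
from itertools import product

def q_n(n):
    """Sorted elements of Q_n: rationals i/j with 1 <= j <= n and |i/j| <= n."""
    return sorted({Fraction(i, j)
                   for j in range(1, n + 1)
                   for i in range(-n * j, n * j + 1)})

def satisfies_v_n(qs, le, ge):
    """le[k] is the truth value of (x <= qs[k]); ge[k] that of (qs[k] <= x)."""
    zero, one = qs.index(Fraction(0)), qs.index(Fraction(1))
    if not (ge[zero] and le[one]):              # (0 <= x) and (x <= 1)
        return False
    for i in range(len(qs)):
        if not (le[i] or ge[i]):                # X(a)
            return False
        for j in range(i + 1, len(qs)):         # a = qs[i] < b = qs[j]
            if le[i] and ge[j]:                 # Y(a,b): (x<=a) -> not (b<=x)
                return False
            if ge[j] and le[i]:                 # Z(a,b): (b<=x) -> not (x<=a)
                return False
    return True

def count_models(n):
    """Number of truth assignments over the atoms of Q_n that satisfy V_n."""
    qs = q_n(n)
    m = len(qs)
    return sum(satisfies_v_n(qs, bits[:m], bits[m:])
               for bits in product((False, True), repeat=2 * m))

for n in (1, 2):
    farey_size = sum(1 for a in q_n(n) if 0 <= a <= 1)   # N(n)
    print(n, count_models(n), 2 * farey_size - 1)
```

For each 𝑛 the second and third printed values should agree, matching the count 2𝑁(𝑛) - 1.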

Assume 𝑏 ∈ 𝑄ₙ. If 𝑏 ≥ 0, the formula (𝚡 ≤ 𝑏) ∧ 𝑉ₙ likewise has 2𝑁(𝑛, 𝑏) - 1 satisfying truth assignments: 𝑁(𝑛, 𝑏) point cases and 𝑁(𝑛, 𝑏) - 1 interval cases. Applying the EPL Theorem, this yields

\(\Pr\left(\mathtt{x}\leq b\mid V_{n}\right)=\frac{2N\left(n,b\right)-1}{2N\left(n\right)-1}\quad\mbox{if }b\geq0\)

which simplifies in one special case to

\(\Pr\left(\mathtt{x}\leq b\mid V_{n}\right)=1\quad\mbox{if }b\geq1.\)

For negative 𝑏 there are no satisfying truth assignments, and the EPL Theorem yields

\(\Pr\left(\mathtt{x}\leq b\mid V_{n}\right)=0\quad\mbox{if }b<0.\)

The formula (𝚡 = 𝑏) ∧ 𝑉ₙ has one (1) satisfying truth assignment if 0 ≤ 𝑏 ≤ 1 , zero (0) otherwise, and applying the EPL Theorem yields

\(\Pr\left(\mathtt{x}=b\mid V_{n}\right)=\frac{\left[0\leq b\leq1\right]}{2N(n)-1}\)

where [𝜑] is the Iverson bracket, the notation popularized by Graham, Knuth, and Patashnik for converting a Boolean value 𝜑 to 0 or 1.
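The two formulas are simple enough to evaluate exactly. Here is a minimal sketch, assuming 𝑏 ∈ 𝑄ₙ as in the text; the names `N`, `pr_le`, `pr_eq` and `pr_closed` are illustrative, and `pr_closed` applies the reduction for closed intervals given earlier.

```python
from fractions import Fraction

def farey(n):
    """Farey sequence of order n as a sorted list of Fractions."""
    return sorted({Fraction(i, j) for j in range(1, n + 1) for i in range(j + 1)})

def N(n, b=None):
    """N(n) if b is None, else N(n, b): number of c in F_n with c <= b."""
    f = farey(n)
    return len(f) if b is None else sum(1 for c in f if c <= b)

def pr_le(n, b):
    """Pr(x <= b | V_n) for b in Q_n."""
    if b < 0:
        return Fraction(0)
    return Fraction(2 * N(n, b) - 1, 2 * N(n) - 1)

def pr_eq(n, b):
    """Pr(x = b | V_n) for b in Q_n; int(...) plays the role of the bracket [.]."""
    return Fraction(int(0 <= b <= 1), 2 * N(n) - 1)

def pr_closed(n, a, b):
    """Pr(a <= x <= b | V_n) = Pr(x <= b) - Pr(x < a) when a <= b, else 0."""
    if a > b:
        return Fraction(0)
    return pr_le(n, b) - (pr_le(n, a) - pr_eq(n, a))

print(pr_le(4, Fraction(1, 2)))                    # 7/13
print(pr_eq(4, Fraction(1, 2)))                    # 1/13
print(pr_closed(4, Fraction(1, 3), Fraction(1, 2)))  # 3/13
```

With 𝑛 = 4 there are 13 satisfying assignments in total, and three of them (𝚡 = 1/3, 1/3 < 𝚡 < 1/2, 𝚡 = 1/2) fall in the closed interval from 1/3 to 1/2, which agrees with the last value.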

Take it to the limit

Let 𝑉* ≜ 𝑉₁, 𝑉₂, … be the infinite sequence of propositional formulas 𝑉ₙ , 𝑛 ≥ 1. We may consider 𝑉* to represent our full state of information about 𝑥 and the properties of ‘≤’. For any propositional formula 𝐴 we will define Pr( 𝐴 | 𝑉* ) to be the limiting probability as 𝑛 → ∞ , if the limit exists:

\(\Pr\left(A\mid V^*\right)\triangleq\lim_{n\to\infty}\Pr\left(A\mid V_{n}\right).\)

Now to compute some limits.

First of all, 𝑁(𝑛) is asymptotically equal to 3𝑛²/𝜋², hence

\(\frac{1}{2N(n)-1}=O\left(n^{-2}\right)\to0\quad\mbox{as }n\to\infty\)

and so

\(\Pr\left(\mathtt{x}=c\mid V^*\right)=0\quad\mbox{for }c\in\mathbb{Q}.\)
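The asymptotic claim can be checked numerically. The sketch below relies on the standard identity that 𝑁(𝑛) equals 1 plus the sum of Euler's totient φ(𝑘) for 𝑘 from 1 to 𝑛; that identity is not stated in the text and is brought in here as an assumption. The function name `farey_length` is illustrative.

```python
import math

def farey_length(n):
    """N(n) = 1 + sum of Euler's totient phi(k) for k = 1..n (sieve-based)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # p untouched so far, hence prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p        # multiply phi[k] by (1 - 1/p)
    return 1 + sum(phi[1:])

for n in (10, 100, 1000):
    size = farey_length(n)
    ratio = size / (3 * n * n / math.pi ** 2)   # should approach 1
    point_mass = 1 / (2 * size - 1)             # Pr(x = c | V_n) for c in F_n
    print(n, size, round(ratio, 4), point_mass)
```

The ratio column should drift toward 1 while the point probability shrinks roughly like 𝑛 to the power -2.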

Next, taking the limits for the reductions previously presented has the effect of replacing 𝑉ₙ with 𝑉* and removing the constraint that 𝑎, 𝑏, 𝑐 ∈ 𝑄ₙ, since we can always find 𝑚 sufficiently large that 𝑎, 𝑏, 𝑐 ∈ 𝑄ₙ for every 𝑛 ≥ 𝑚. This yields the following: for any 𝑎, 𝑏, 𝑐 ∈ ℚ,

\(\begin{align*} \Pr\left(\mathtt{x}<b\mid V^*\right) & =\Pr\left(\mathtt{x}\leq b\mid V^*\right)-\Pr\left(\mathtt{x}=b\mid V^*\right)\\ \Pr\left(a<\mathtt{x}\mid V^*\right) & =1-\Pr\left(\mathtt{x}\leq a\mid V^*\right)\\ \Pr\left(a\leq \mathtt{x}\mid V^*\right) & =\Pr\left(a<\mathtt{x}\mid V^*\right)+\Pr\left(\mathtt{x}=a\mid V^*\right) \end{align*} \)

\(\begin{align*} \Pr\left(a\leq \mathtt{x}\leq b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }a>b\\ \Pr\left(\mathtt{x}\leq b\mid V^*\right)-\Pr\left(\mathtt{x}<a\mid V^*\right) & \mbox{if }a\leq b \end{cases}\\ \Pr\left(a<\mathtt{x}\leq b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}\leq b\mid V^*\right)-\Pr\left(\mathtt{x}\leq a\mid V^*\right) & \mbox{if }a<b \end{cases}\\ \Pr\left(a\leq \mathtt{x}<b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}<b\mid V^*\right)-\Pr\left(\mathtt{x}<a\mid V^*\right) & \mbox{if }a<b \end{cases}\\ \Pr\left(a<\mathtt{x}<b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }a\geq b\\ \Pr\left(\mathtt{x}<b\mid V^*\right)-\Pr\left(\mathtt{x}\leq a\mid V^*\right) & \mbox{if }a<b \end{cases} \end{align*}\)

These are all familiar properties of a continuous probability distribution over the real line, but derived from our ground axiomatization of ‘≤’.

We can simplify the above further using Pr( 𝚡 = 𝑐 | 𝑉* ) = 0 :

\(\begin{align*} \Pr\left(\mathtt{x}<b\mid V^*\right) & =\Pr\left(\mathtt{x}\leq b\mid V^*\right)\\ \Pr\left(a<\mathtt{x}\mid V^*\right) & =1-\Pr\left(\mathtt{x}\leq a\mid V^*\right)\\ \Pr\left(a\leq \mathtt{x}\mid V^*\right) & =\Pr\left(a<\mathtt{x}\mid V^*\right)\\ \Pr\left(a\leq \mathtt{x}\leq b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }a>b\\ \Pr\left(\mathtt{x}\leq b\mid V^*\right)-\Pr\left(\mathtt{x}\leq a\mid V^*\right) & \mbox{if }a\leq b \end{cases}\\ \Pr\left(a<\mathtt{x}\leq b\mid V^*\right) & =\Pr\left(a\leq \mathtt{x}\leq b\mid V^*\right)\\ \Pr\left(a\leq \mathtt{x}<b\mid V^*\right) & =\Pr\left(a\leq \mathtt{x}\leq b\mid V^*\right)\\ \Pr\left(a<\mathtt{x}<b\mid V^*\right) & =\Pr\left(a\leq \mathtt{x}\leq b\mid V^*\right) \end{align*}\)

Finally, let’s evaluate the probability of (𝚡 ≤ 𝑏) in the limit as 𝑛 → ∞.

For 𝑏 ∈ ℚ, 𝑏 < 0 or 𝑏 ≥ 1, we have a trivial limit:

\(\begin{align*} \Pr\left(\mathtt{x}\leq b\mid V^*\right) & =\begin{cases} 0 & \mbox{if }b<0\\ 1 & \mbox{if }b\geq1. \end{cases} \end{align*}\)

When 0 ≤ 𝑏 ≤ 1, F. Dress proved [2] that

\(\left|\frac{N\left(n,b\right)}{N(n)}-b\right|\leq\frac{1}{n}.\)

Then

\(\begin{align*} \frac{2N\left(n,b\right)-1}{2N\left(n\right)-1} & =\frac{N(n)}{N(n)-1/2}\cdot\frac{N\left(n,b\right)}{N\left(n\right)}-\frac{1}{2N(n)-1}\\ & =\left(1+O\left(n^{-2}\right)\right)\cdot\left(b+O\left(n^{-1}\right)\right)-O\left(n^{-2}\right)\\ & =b+O\left(n^{-1}\right)\\ & \to b\quad\mbox{as }n\to\infty, \end{align*}\)

yielding

\(\Pr\left(\mathtt{x}\leq b\mid V^*\right)=b\quad\mbox{if }0\leq b\leq1.\)

This is exactly the cumulative distribution function of the uniform distribution on [0, 1]; thus, in the limit, we obtain a uniform distribution for 𝑥.
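The convergence can also be observed numerically. The following sketch evaluates Pr( 𝚡 ≤ 𝑏 | 𝑉ₙ ) exactly for a few values of 𝑛 and compares it with 𝑏; the helper names and the sample values of 𝑛 and 𝑏 are arbitrary choices for illustration, with 𝑛 chosen large enough that 𝑏 ∈ 𝑄ₙ.

```python
from fractions import Fraction

def farey(n):
    """Farey sequence of order n as a sorted list of Fractions."""
    return sorted({Fraction(i, j) for j in range(1, n + 1) for i in range(j + 1)})

def pr_le(n, b):
    """Pr(x <= b | V_n) = (2 N(n,b) - 1) / (2 N(n) - 1) for 0 <= b, b in Q_n."""
    f = farey(n)
    n_b = sum(1 for c in f if c <= b)       # N(n, b)
    return Fraction(2 * n_b - 1, 2 * len(f) - 1)

for n in (5, 20, 80):
    for b in (Fraction(1, 3), Fraction(3, 4)):
        p = pr_le(n, b)
        print(n, b, float(p), float(abs(p - b)))
```

The last column is the gap between the finite-𝑛 probability and 𝑏. It is bounded by a quantity of order 1/𝑛, consistent with the 𝑂(𝑛⁻¹) term in the derivation, although the gap need not decrease monotonically.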

[1] ℚ is the set of rational numbers.

[2] F. Dress (1999). “Discrépance des suites de Farey.” J. Théor. Nombres Bordeaux 11, pp. 345–367.

Epistemic Probability
