Generalized Premises, Part 1

Formalizing Jaynes' Finite Sets Policy

Kevin S. Van Horn

May 27, 2025

An extended version of this article containing all proofs:

Generalized Premises Part 1


Introduction

Recall that the EPL Theorem tells us that

\(\Pr\left(A\mid X\right)=\frac{\#_{S}\left(A\wedge X\right)}{\#_{S}\left(X\right)}\)

where

𝐴 (the query) and 𝑋 (the premise) are propositional formulas;
𝑋 is satisfiable;
𝑆 ⊆ 𝛴 [1] is any finite set of propositional symbols that includes all those occurring in 𝐴 or 𝑋; and
#_𝑆(𝐵), for a propositional formula 𝐵, is the number of truth assignments on 𝑆 for which 𝐵 evaluates true.
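
The counting in the EPL Theorem can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: representing a formula as a Python function from a truth assignment (a dict) to a bool is an assumption made here, and the helper names `assignments`, `count`, and `pr` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def assignments(symbols):
    """Yield every truth assignment on `symbols` as a dict symbol -> bool."""
    symbols = list(symbols)
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))


def count(formula, symbols):
    """#_S(B): number of truth assignments on S for which B evaluates true."""
    return sum(1 for a in assignments(symbols) if formula(a))


def pr(query, premise, symbols):
    """Pr(A | X) as the ratio #_S(A and X) / #_S(X)."""
    denominator = count(premise, symbols)
    if denominator == 0:
        raise ValueError("the premise must be satisfiable")
    numerator = count(lambda a: query(a) and premise(a), symbols)
    return Fraction(numerator, denominator)


# Example: X = (m -> l1), A = m.
X = lambda a: (not a["m"]) or a["l1"]
A = lambda a: a["m"]
print(pr(A, X, ["m", "l1"]))           # 1/3
print(pr(A, X, ["m", "l1", "extra"]))  # 1/3
```

The second call adds an unused symbol to 𝑆 and returns the same value, which illustrates that the choice of 𝑆 does not matter as long as it covers the symbols of 𝐴 and 𝑋.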

Propositional formulas are finite expressions, so the premise 𝑋 can only mention a finite number of propositional symbols, which limits us to finite domains, or at least to a finite number of distinctions on the problem domain. In the articles Turning Concrete Facts into a Probability Distribution (TCFPD) and Latent Symbols and Converging Approximations (LSCA) we saw examples of how an infinite domain, with infinite numbers of distinctions, can be approximated arbitrarily closely by a sequence of finite premises (𝑋ᵢ). As 𝑛 gets larger, 𝑋ₙ has information about an ever-greater number of propositional symbols, and for any query 𝐴 of interest, 𝘗𝘳(𝐴 | 𝑋ₙ) converges to a limiting value.

These examples followed E. T. Jaynes’ finite sets policy:

Apply the ordinary processes of arithmetic and analysis only to expressions with a finite number of terms. Then after the calculation is done, observe how the resulting finite expressions behave as the number of terms increases indefinitely.
In laying down this rule of conduct, we are only following the policy that mathematicians from Archimedes to Gauss have considered clearly necessary for nonsense avoidance in all of mathematics. [2]

In this article we formalize the process Jaynes describes.

Queries, premises, and latent symbols

We saw in LSCA that latent symbols—propositional symbols that appear only in the premise, but never in the queries, the equivalent of latent variables in statistical modeling—can be useful in expressing our information about a domain of interest. So we begin by partitioning the countably infinite set 𝛴 of propositional symbols into two parts, each of them also countably infinite:

ℒ, the set of latent symbols that are used only for defining premises, and
ℳ, the set of manifest symbols.

Queries are constructed using only manifest symbols; premises may be constructed using both latent and manifest symbols. Since they partition 𝛴, the manifest symbols ℳ and latent symbols ℒ are of course disjoint (no overlap), and their union is 𝛴.

With this in mind, we can now give a formal definition of “premise” and “query”:

Definition. A query is a propositional formula constructed using only manifest symbols. A premise is any satisfiable propositional formula. Write 𝛷(𝑆) for the set of all propositional formulas constructed from the symbols in 𝑆 ⊆ 𝛴, and 𝛷⁺(𝑆) for the set of all satisfiable members of 𝛷(𝑆); then 𝛷(ℳ) is the set of all queries, and 𝛷⁺(𝛴) is the set of all premises.

Equivalent premises

We wish to consider two premises to be equivalent if they yield the same probabilities, i.e., if 𝘗𝘳(𝐴 | 𝑋) = 𝘗𝘳(𝐴 | 𝑌) for all queries 𝐴. Clearly, if 𝑋 ≡ 𝑌 (𝑋 and 𝑌 are logically equivalent) then 𝑋 and 𝑌 are equivalent premises, but logical equivalence is not necessary for premise equivalence. For example, if

\(\begin{align*} X & \triangleq\left(m\rightarrow l_{1}\right)\\ Y & \triangleq\left(m\rightarrow l_{2}\land l_{3}\right)\wedge\left(\neg m\rightarrow l_{2}\right) \end{align*}\)

where 𝑚 ∈ ℳ and 𝑙₁, 𝑙₂, 𝑙₃ ∈ ℒ, then 𝑋 and 𝑌 are equivalent premises:

\(\Pr\left(m\mid X\right)=1/3=\Pr\left(m\mid Y\right);\)

and since 𝑚 is the only manifest symbol appearing in 𝑋 or 𝑌, this implies that 𝘗𝘳(𝐴 | 𝑋) = 𝘗𝘳(𝐴 | 𝑌) for any query 𝐴.
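
As a check on the value 1/3, the counts can be written out directly from the EPL Theorem. The choice of 𝑆 below (just the symbols occurring in each premise) is an assumption made here for convenience; any larger finite 𝑆 gives the same ratio.

\(\begin{align*} S=\{m,l_{1}\}: & \quad\#_{S}\left(X\right)=3,\quad\#_{S}\left(m\wedge X\right)=1,\quad\Pr\left(m\mid X\right)=\frac{1}{3}\\ S=\{m,l_{2},l_{3}\}: & \quad\#_{S}\left(Y\right)=3,\quad\#_{S}\left(m\wedge Y\right)=1,\quad\Pr\left(m\mid Y\right)=\frac{1}{3} \end{align*}\)

The three assignments satisfying 𝑋 are (𝑚, 𝑙₁) = (F, F), (F, T), (T, T); the three satisfying 𝑌 are (𝑚, 𝑙₂, 𝑙₃) = (T, T, T), (F, T, F), (F, T, T). In each case exactly one of them makes 𝑚 true.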

A pseudo-metric on premises: first try

We also want a notion of similarity or (in the opposite sense) distance between premises so we can define Cauchy sequences and limits with them. The pseudo-metric we define should assign a distance of 0 between two premises in exactly those cases where we want to consider the premises to be equivalent.

You might think something like this pseudo-metric would do the job:

\(d\left(X,Y\right)=\sup_{A\in\Phi\left(\mathcal{M}\right)}\left|\Pr\left(A\mid X\right)-\Pr\left(A\mid Y\right)\right|.\)

This defines the distance between 𝑋 and 𝑌 to be the “maximum” difference in the probabilities they define. (We actually use the supremum—the least upper bound—because the set of probability differences may not have a maximum. For example, the set { 1-1/𝑛 : 𝑛 ∈ ℕ, 𝑛 > 0 } has no maximum element, but its supremum is 1.)

Unfortunately, the sequence of premises (𝑉ᵢ) approximating the uniform distribution in TCFPD is not a Cauchy sequence under this pseudo-metric. The problem is that

𝘗𝘳(𝑠 | 𝑌) = 1/2 whenever 𝑠 is a propositional symbol that does not occur in 𝑌, and
𝑉ₘ uses a larger set of manifest symbols than 𝑉ₙ when 𝑚 > 𝑛.
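
The first of these facts follows from the EPL Theorem by a short count. As a sketch, let 𝑆 contain the symbols of 𝑌 but not 𝑠, and let 𝑆′ = 𝑆 ∪ {𝑠} (the name 𝑆′ is introduced here only for this illustration). Each assignment on 𝑆 satisfying 𝑌 extends to exactly two assignments on 𝑆′, one with 𝑠 true and one with 𝑠 false, so

\(\Pr\left(s\mid Y\right)=\frac{\#_{S'}\left(s\wedge Y\right)}{\#_{S'}\left(Y\right)}=\frac{\#_{S}\left(Y\right)}{2\,\#_{S}\left(Y\right)}=\frac{1}{2}.\)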

Letting 𝑠ₘ be the propositional symbol “𝚡 < 1/𝑚”, we have

\(\Pr\left(s_{m}\mid V_{m}\right)\to0\quad\mbox{as }m\to\infty\)

and so for any given 𝑛 and 𝑚 > 𝑛,

\(\begin{align*} d\left(V_{m},V_{n}\right) & \geq\left|\Pr\left(s_{m}\mid V_{m}\right)-\Pr\left(s_{m}\mid V_{n}\right)\right|\\ & \to\frac{1}{2}\quad\mbox{as }m\to\infty. \end{align*}\)

A similar problem occurs with the approximations to the Poisson distribution in LSCA.

A pseudo-metric on premises that works

To address this problem we down-weight probability differences for queries involving manifest symbols that appear later in some ordering of the manifest symbols:

Assume some arbitrary ordering on the set of manifest symbols ℳ and define ℳₙ (𝑛 ∈ ℕ) to be the first 𝑛+1 manifest symbols under this ordering. If symbol 𝑠 is number 𝑘 in this ordering (counting from 0), then 𝑠 ∈ ℳₙ for all 𝑛 ≥ 𝑘.
𝛷(ℳₙ) is then the set of queries that use only those propositional symbols numbered 𝑛 or less.
We’ll weight probability differences for a query 𝐴 ∈ 𝛷(ℳₙ) by 𝑤(𝑛), where 𝑤 is some acceptable weighting function: a strictly decreasing function 𝑤 : [0,∞) → (0,1] (for any nonnegative real number it returns a positive value no larger than 1) such that 𝑤(𝑥) → 0 as 𝑥 → ∞. Examples of such functions include 𝑤(𝑥) = exp(-𝑥) and 𝑤(𝑥) = 1/(𝑥+1).

This then is the pseudo-metric we’ll use:

Definition. For any two premises 𝑋,𝑌 ∈ 𝛷⁺(𝛴),

\(\mathfrak{p}_{0}\left(X,Y\right)\triangleq\sup_{n\in\mathbb{N}}w(n)\max_{A\in\Phi\left(\mathcal{M}_{n}\right)}\left|\Pr\left(A\mid X\right)-\Pr\left(A\mid Y\right)\right|.\)

That is: for every 𝑛 ≥ 0, find the maximum difference in probabilities 𝛿ₙ that occurs for a query 𝐴 constructed from at most the first 𝑛+1 manifest symbols, then take the “maximum” of the weighted differences 𝑤(𝑛)𝛿ₙ over all 𝑛 ≥ 0.

The maximum over 𝛷(ℳₙ) used in the above definition exists because 𝘗𝘳(𝐴 | 𝑋) remains unchanged if we replace 𝐴 with any logically equivalent propositional formula, and the set of queries 𝛷(ℳₙ) has only a finite number of equivalence classes. (Each equivalence class corresponds to a truth table on 𝑛+1 symbols. There are 2^𝑁 distinct such truth tables, where 𝑁 = 2^{𝑛+1} is the number of rows in these truth tables.)
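
For small cases the quantities in this definition can be computed by brute force. The following Python sketch is an illustration under stated assumptions, not the author's method: formulas are represented as functions on truth-assignment dicts, the names `atom_probs`, `delta`, and `p0_truncated` are hypothetical, and the supremum is truncated at a chosen `n_max`. To find 𝛿ₙ it uses a standard shortcut: every set of truth assignments on ℳₙ corresponds to some query in 𝛷(ℳₙ), so the maximum difference is attained by the disjunction of those assignments to which 𝑋 gives more probability than 𝑌.

```python
from fractions import Fraction
from itertools import product


def atom_probs(premise, premise_symbols, prefix):
    """Probability, given the premise, of each truth assignment on `prefix`."""
    # Work on the union of the manifest prefix and the premise's own symbols.
    symbols = list(dict.fromkeys(list(prefix) + list(premise_symbols)))
    counts, total = {}, 0
    for values in product([False, True], repeat=len(symbols)):
        a = dict(zip(symbols, values))
        if premise(a):
            total += 1
            key = tuple(a[s] for s in prefix)
            counts[key] = counts.get(key, 0) + 1
    if total == 0:
        raise ValueError("the premise must be satisfiable")
    return {k: Fraction(c, total) for k, c in counts.items()}


def delta(px, py):
    """Maximum over queries A of |Pr(A | X) - Pr(A | Y)| on one prefix."""
    keys = set(px) | set(py)
    return sum(max(px.get(k, 0) - py.get(k, 0), 0) for k in keys)


def p0_truncated(X, x_symbols, Y, y_symbols, manifest_order, w, n_max):
    """max over n <= n_max of w(n) * delta_n: a lower bound on p0(X, Y)."""
    best = 0
    for n in range(n_max + 1):
        prefix = manifest_order[: n + 1]  # M_n: the first n+1 manifest symbols
        d = delta(atom_probs(X, x_symbols, prefix),
                  atom_probs(Y, y_symbols, prefix))
        best = max(best, w(n) * d)
    return best


w = lambda n: Fraction(1, n + 1)          # w(x) = 1/(x+1), kept exact
order = ["m", "m2", "m3"]                 # assumed ordering of manifest symbols

X = lambda a: (not a["m"]) or a["l1"]
Y = lambda a: ((not a["m"]) or (a["l2"] and a["l3"])) and (a["m"] or a["l2"])
print(p0_truncated(X, ["m", "l1"], Y, ["m", "l2", "l3"], order, w, 2))  # 0

not_m = lambda a: not a["m"]
anything = lambda a: True                 # a tautology as premise
print(p0_truncated(not_m, ["m"], anything, [], order, w, 2))            # 1/2
```

Because 𝛿ₙ ≤ 1 and 𝑤 is decreasing, the terms dropped by the truncation are each at most 𝑤(`n_max`+1), so the true value of 𝔭₀ lies between the returned value and the larger of that value and 𝑤(`n_max`+1).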

Remark. The function 𝔭₀ defined above is actually an entire family of functions, one for each possible combined choice of an ordering of the manifest symbols and an acceptable weighting function. This will not be a problem because none of our results will depend on which ordering or acceptable weighting function is chosen.

Theorem. 𝔭₀ is in fact a pseudo-metric.

Proof. See extended version of this article.

As desired, 𝔭₀(𝑋,𝑌) = 0 iff 𝑋 and 𝑌 are equivalent in the sense described earlier:

Theorem. 𝔭₀(𝑋,𝑌) = 0 iff 𝘗𝘳(𝐴 | 𝑋) = 𝘗𝘳(𝐴 | 𝑌) for all 𝐴 ∈ 𝛷(ℳ).

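Proof. Since 𝑤(𝑛) > 0 for every 𝑛, 𝔭₀(𝑋,𝑌) = 0 iff for all 𝑛 and all 𝐴 ∈ 𝛷(ℳₙ), 𝘗𝘳(𝐴 | 𝑋) = 𝘗𝘳(𝐴 | 𝑌). Every query mentions only finitely many manifest symbols and so belongs to 𝛷(ℳₙ) for some 𝑛; hence this is just all 𝐴 ∈ 𝛷(ℳ).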

The fundamental property of our pseudo-metric

The reason we chose the pseudo-metric 𝔭₀ is that it has the following property:

Theorem 1. A sequence of premises (𝑋ᵢ) is a Cauchy sequence for 𝔭₀ iff for all 𝐴 ∈ 𝛷(ℳ), the sequence (𝘗𝘳(𝐴 | 𝑋ᵢ)) is a Cauchy sequence (for the Euclidean distance on ℝ).

Proof. See extended version of this article.

Recalling that 𝔭₀ is not a single pseudo-metric, but an entire family of pseudo-metrics defined by the choice of ordering of symbols and acceptable weighting function, an immediate consequence of this theorem is that all of the pseudo-metrics in this family are Cauchy-equivalent (they have the same Cauchy sequences). Likewise, any pseudo-metric on 𝛷⁺(𝛴) that is Cauchy-equivalent to 𝔭₀ will do equally well for our purposes, as it will also satisfy this theorem.

The theorem has this important corollary:

Corollary. If (𝑋ᵢ) is a Cauchy sequence for 𝔭₀ then, for any query 𝐴,

\(\lim_{n\to\infty}\Pr\left(A\mid X_{n}\right)\)

exists and is unique.

Proof. By Theorem 1, the sequence (𝘗𝘳(𝐴 | 𝑋ᵢ)) is Cauchy for the Euclidean distance on ℝ, hence it converges to a limit, and since the Euclidean distance is a metric this limit is unique.

Taking the completion: generalized premises

Unfortunately, the pseudo-metric 𝔭₀ does not give us a complete pseudo-metric space of premises: some Cauchy sequences do not converge. For example, define

\(X_{n}\triangleq\neg m_{1}\wedge\cdots\wedge\neg m_{n}\)

where the 𝑚ᵢ are distinct manifest symbols. Then (𝑋ᵢ) is a Cauchy sequence that doesn’t converge: we have

\(\lim_{n\to\infty}\Pr\left(m_{j}\mid X_{n}\right)=0\quad\mbox{for all }j\)

but there is no (finite) formula 𝑋 yielding 𝘗𝘳(𝑚ⱼ | 𝑋) = 0 for all 𝑗.
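
To see why this sequence is Cauchy, here is a sketch under one simplifying assumption not stated in the text: that 𝑚ᵢ is symbol number 𝑖−1 in the chosen ordering, so that ℳₖ = {𝑚₁, …, 𝑚ₖ₊₁}. Write 𝛿ₖ for the maximum probability difference over 𝛷(ℳₖ), as in the definition of 𝔭₀. For 𝑟 > 𝑛 and 𝑘 < 𝑛, both 𝑋ᵣ and 𝑋ₙ force every symbol of ℳₖ to be false, so they assign the same probability (0 or 1) to every query in 𝛷(ℳₖ), and 𝛿ₖ = 0. For 𝑘 ≥ 𝑛 we have 𝛿ₖ ≤ 1 and 𝑤(𝑘) ≤ 𝑤(𝑛). Hence

\(\mathfrak{p}_{0}\left(X_{r},X_{n}\right)=\sup_{k\geq n}w(k)\,\delta_{k}\leq w(n)\to0\quad\mbox{as }n\to\infty.\)

The limit cannot be an ordinary premise because any formula 𝑋 mentions only finitely many symbols, and 𝘗𝘳(𝑚ⱼ | 𝑋) = 1/2 for every 𝑚ⱼ that does not occur in 𝑋.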

Just as the real numbers ℝ were defined as the completion of the rational numbers ℚ to remedy such a deficiency, we define the space of generalized premises as the completion of our pseudo-metric space of premises.

Definition. (𝒫, 𝔭) is the canonical completion of the pseudo-metric space (𝛷⁺(𝛴), 𝔭₀). That is, 𝒫 is the set of Cauchy sequences of premises (for the pseudo-metric 𝔭₀), and 𝔭 is the derived pseudo-metric on 𝒫 defined by

\(\mathfrak{p}\left(\left(X_{i}\right),\left(Y_{i}\right)\right)=\lim_{n\to\infty}\mathfrak{p}_{0}\left(X_{n},Y_{n}\right).\)

A generalized premise is any member of the set 𝒫.
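
The limit defining 𝔭 always exists. A brief sketch of the standard argument, using only the triangle inequality and symmetry of 𝔭₀: for any indices 𝑚 and 𝑛,

\(\left|\mathfrak{p}_{0}\left(X_{m},Y_{m}\right)-\mathfrak{p}_{0}\left(X_{n},Y_{n}\right)\right|\leq\mathfrak{p}_{0}\left(X_{m},X_{n}\right)+\mathfrak{p}_{0}\left(Y_{m},Y_{n}\right).\)

Since (𝑋ᵢ) and (𝑌ᵢ) are Cauchy, the right-hand side becomes arbitrarily small for large 𝑚 and 𝑛, so the real sequence (𝔭₀(𝑋ₙ,𝑌ₙ)) is Cauchy and therefore converges.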

As usual for incomplete pseudo-metric spaces and their completions, we will generally blur the distinction between a premise 𝑋 and the corresponding generalized premise 𝜄(𝑋) ≜ (𝑋, 𝑋, 𝑋, …).

We extend the probability function to operate on generalized premises as follows:

Definition. Let 𝒳 = (𝑋ᵢ) be a generalized premise. Then

\(\Pr\left(A\mid\mathcal{X}\right)\triangleq\lim_{n\to\infty}\Pr\left(A\mid X_{n}\right).\)

Note that if generalized premises 𝒳 = (𝑋ᵢ) and 𝒴 = (𝑌ᵢ) are equivalent, then 𝘗𝘳(𝐴 | 𝒳) = 𝘗𝘳(𝐴 | 𝒴) for any query 𝐴:

\(\begin{align*} \mathfrak{p}\left(\mathcal{X},\mathcal{Y}\right)=0 & \Rightarrow\lim_{n\to\infty}\mathfrak{p}_{0}\left(X_{n},Y_{n}\right)=0\\ & \Rightarrow\lim_{n\to\infty}\left|\Pr\left(A\mid X_{n}\right)-\Pr\left(A\mid Y_{n}\right)\right|=0\\ & \Rightarrow\left(\lim_{n\to\infty}\Pr\left(A\mid X_{n}\right)\right)=\left(\lim_{n\to\infty}\Pr\left(A\mid Y_{n}\right)\right). \end{align*}\)

The definition of the probability function for generalized premises is consistent with its definition for simple premises 𝑋 ∈ 𝛷⁺(𝛴): if 𝒳 = 𝜄(𝑋) = (𝑋,𝑋,𝑋,...) then for any query 𝐴,

\(\Pr\left(A\mid\mathcal{X}\right)=\lim_{n\to\infty}\Pr\left(A\mid X\right)=\Pr\left(A\mid X\right).\)
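
To make the definition concrete, the sketch below evaluates 𝘗𝘳(𝐴 | 𝑋ₙ) along the first few terms of a sequence of premises. It is an illustration only: a generalized premise is modeled here as a Python function from an index 𝑛 to a pair (formula, list of symbols), which is an assumption of this sketch, and the names `pr`, `x_seq`, `constant_seq`, and `pr_along` are hypothetical. A finite prefix of values can suggest the limit but does not by itself establish it.

```python
from fractions import Fraction
from itertools import product


def pr(query, premise, symbols):
    """Pr(A | X) by counting truth assignments on `symbols` (EPL Theorem)."""
    n_premise = n_both = 0
    for values in product([False, True], repeat=len(symbols)):
        a = dict(zip(symbols, values))
        if premise(a):
            n_premise += 1
            if query(a):
                n_both += 1
    if n_premise == 0:
        raise ValueError("the premise must be satisfiable")
    return Fraction(n_both, n_premise)


def x_seq(n):
    """Return (X_n, symbols of X_n) for X_n = not m1 and ... and not mn."""
    names = [f"m{i}" for i in range(1, n + 1)]
    return (lambda a: not any(a[s] for s in names)), names


def constant_seq(premise, symbols):
    """The constant sequence iota(X) = (X, X, X, ...)."""
    return lambda n: (premise, list(symbols))


def pr_along(query, query_symbols, seq, indices):
    """List Pr(A | X_n) for each n in `indices`."""
    out = []
    for n in indices:
        premise, premise_symbols = seq(n)
        symbols = list(dict.fromkeys(list(query_symbols) + premise_symbols))
        out.append(pr(query, premise, symbols))
    return out


m3 = lambda a: a["m3"]
values = pr_along(m3, ["m3"], x_seq, range(1, 6))
print([str(v) for v in values])  # ['1/2', '1/2', '0', '0', '0']

iota = constant_seq(lambda a: not a["m3"], ["m3"])
print([str(v) for v in pr_along(m3, ["m3"], iota, range(1, 4))])  # ['0', '0', '0']
```

For the sequence 𝑋ₙ = ¬𝑚₁ ∧ ⋯ ∧ ¬𝑚ₙ the value for the query 𝑚₃ is 1/2 until 𝑚₃ enters the premise and 0 from then on, while a constant sequence returns the same value at every index.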

In his work on probability theory E. T. Jaynes stresses that probabilities are determined by one’s “state of information,” and writes 𝘗𝘳(𝐴 | 𝑋) for the probability assigned to 𝐴 under the state of information 𝑋, but never clearly defines just what a “state of information” is. I propose the notion of a generalized premise as the formalization of Jaynes’ concept of “state of information.” It is the limit of an increasingly detailed database of facts.

Computability

There is one possible objection to this use of generalized premises that we need to address. Our approach to probability theory is based on the EPL Theorem, which generalizes classical propositional logic to handle degrees of plausibility. Logics are supposed to be computable:

Both propositional formulas and predicate-logic formulas are finite objects, and the operations to construct them are all simple, computable, textual operations.
Checking whether a text string is a valid propositional formula or predicate formula is decidable (it is computable whether the answer is yes or no) using standard parsing techniques.
The entailment relation of classical propositional logic is decidable. One (inefficient) algorithm to decide whether 𝑋 ⊧ 𝐴 is to enumerate all truth assignments on the set of propositional symbols occurring in 𝑋 or 𝐴, and check whether 𝑋 evaluates to true and 𝐴 evaluates to false for any of these; 𝑋 ⊧ 𝐴 holds exactly when none does.
The set of axioms for a theory in mathematical logic is generally decidable. (In some contexts this is weakened to a requirement that the set be recursively enumerable, meaning that there is an algorithm that will output all elements of the set one by one.)
Checking whether a proposed proof is valid is decidable whenever the set of axioms itself is decidable, and semi-decidable when that set is merely recursively enumerable. (Semi-decidable means there is an algorithm that never returns an incorrect answer, although it may fail to return at all if the answer is no.)
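
The brute-force entailment check mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming formulas are given as Python functions on truth-assignment dicts; the function name `entails` is hypothetical.

```python
from itertools import product


def entails(premise, conclusion, symbols):
    """Decide X |= A: True iff no truth assignment makes X true and A false."""
    symbols = list(symbols)
    for values in product([False, True], repeat=len(symbols)):
        a = dict(zip(symbols, values))
        if premise(a) and not conclusion(a):
            return False  # found a counterexample assignment
    return True


# X = m and (m -> l) entails l, but m alone does not.
X = lambda a: a["m"] and ((not a["m"]) or a["l"])
print(entails(X, lambda a: a["l"], ["m", "l"]))                   # True
print(entails(lambda a: a["m"], lambda a: a["l"], ["m", "l"]))    # False
```

The loop visits 2^𝑘 assignments for 𝑘 symbols, so it always terminates, which is all that decidability requires.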

As long as we were dealing only with (finite) propositional formulas, computability was obvious. Constructing premises and queries is computable (same as for classical propositional logic), and applying the EPL Theorem to compute 𝘗𝘳(𝐴 | 𝑋) just requires enumerating all truth assignments on the propositional symbols of 𝐴 or 𝑋 and evaluating both 𝑋 and 𝐴 ∧ 𝑋 for each of these.

But with generalized premises we are now dealing with infinite sequences of premises. These infinite sequences should be computable, meaning there should exist an algorithm to generate them one element at a time in order, and 𝘗𝘳(𝐴 | 𝒳) should also be computable when 𝒳 is a generalized premise, which requires computing a limit. Properly addressing these issues requires an understanding of what computability even means when dealing with uncountable domains whose members are nonetheless finitely approximable, such as ℝ or 𝒫. This is the topic of type-2 computability, a.k.a. the Type Two Theory of Effectivity, and I’ll have to write a few tutorial articles on that subject before we can properly discuss the computability issues with generalized premises. So, although I’ve brought up the issue here, I’m going to defer dealing with it until a later article.

Coming attractions

In the remaining articles in this series we’ll discuss (at least) the following issues:

Continuity. Why it’s essential, and some techniques for proving operations on generalized premises to be continuous. We’ll show that 𝘗𝘳(𝐴 | ⋅ ) is a continuous function from 𝒫 to ℝ, and that (𝐴 ∧ ⋅ ), the operation of conditioning a premise on a query 𝐴, is a continuous partial function from 𝒫 to 𝒫.
Proving that a sequence of premises is Cauchy. This is required when defining a generalized premise. We’ll prove some theorems to aid in this task and apply them to showing that the sequences of premises defined in TCFPD and LSCA are in fact Cauchy, and hence are generalized premises.

[1] Recall that 𝛴 is the countably infinite set of propositional symbols available for constructing propositional formulas.

[2] E. T. Jaynes (2003), Probability Theory: The Logic of Science, chapter 2 (p. 38).

Epistemic Probability
