An extended version of this article containing all proofs:
Introduction
Recall that the EPL Theorem tells us that
where
π΄ (the query) and π (the premise) are propositional formulas;
π is satisfiable;
π β π΄1 is any finite set of propositional symbols that includes all those occurring in π΄ or π; and
#_π(π΅), for a propositional formula π΅, is the number of truth assignments on π for which π΅ evaluates true.
Propositional formulas are finite expressions, so the premise π can only mention a finite number of propositional symbols, which limits us to finite domains, or at least to a finite number of distinctions on the problem domain. In the articles Turning Concrete Facts into a Probability Distribution (TCFPD) and Latent Symbols and Converging Approximations (LSCA) we saw examples of how an infinite domain, with infinite numbers of distinctions, can be approximated arbitrarily closely by a sequence of finite premises (πα΅’). As π gets larger, πβ has information about an ever-greater number of propositional symbols, and for any query π΄ of interest, ππ³(π΄ | πβ) converges to a limiting value.
These examples followed E. T. Jaynesβ finite sets policy:
Apply the ordinary processes of arithmetic and analysis only to expressions with a finite number of terms. Then after the calculation is done, observe how the resulting finite expressions behave as the number of terms increases indefinitely.
In laying down this rule of conduct, we are only following the policy that mathematicians from Archimedes to Gauss have considered clearly necessary for nonsense avoidance in all of mathematics.2
In this article we formalize the process Jaynes describes.
Queries, premises, and latent symbols
We saw in LSCA that latent symbolsβpropositional symbols that appear only in the premise, but never in the queries, the equivalent of latent variables in statistical modelingβcan be useful in expressing our information about a domain of interest. So we begin by partitioning the countably infinite set π΄ of propositional symbols into two parts, each of them also countably infinite:
β, the set of latent symbols that are used only for defining premises, and
β³, the set of manifest symbols.
Queries are constructed using only manifest symbols; premises may be constructed using both latent or manifest symbols. Since they partition π΄, the manifest symbols β³ and latent symbols β are of course disjoint (no overlap), and their union is π΄.
With this in mind, we can now give a formal definition of βpremiseβ and βqueryβ:
Definition. A query is a propositional formula constructed using only manifest symbols. A premise is any satisfiable propositional formula. Write π·(π) for the set of all propositional formulas constructed from the symbols in π β π΄, and π·βΊ(π) for the set of all satisfiable members of π·(π); then π·(β³) is the set of all queries, and π·βΊ(π΄) is the set of all premises.
Equivalent premises
We wish to consider two premises to be equivalent if they yield the same probabilities, i.e., if ππ³(π΄ | π) = ππ³(π΄ | π) for all queries π΄. Clearly, if π β‘ π (π and π are logically equivalent) then π and π are equivalent premises, but logical equivalence is not necessary for premise equivalence. For example, if
where π β β³ and πβ, πβ, πβ β β, then π and π are equivalent premises:
and since π is the only manifest symbol appearing in π or πthis implies that ππ³(π΄ | π) = ππ³(π΄ | π) for any query π΄.
A pseudo-metric on premises: first try
We also want a notion of similarity or (in the opposite sense) distance between premises so we can define Cauchy sequences and limits with them. The pseudo-metric we define should assign a distance of 0 between two premises in exactly those cases where we want to consider the premises to be equivalent.
You might think something like this pseudo-metric would do the job:
This defines the distance between π and π to be the βmaximumβ difference in the probabilities they define. (We actually use the supremumβthe least upper boundβbecause the set of probability differences may not have a maximum. For example, the set { 1-1/π : π β β, π > 0 } has no maximum element, but its supremum is 1.)
Unfortunately, the sequence of premises (πα΅’) approximating the uniform distribution in TCFPD is not a Cauchy sequence under this pseudo-metric. The problem is that that
ππ³(π | π) = 1/2 whenever π is a propositional symbol that does not occur in π, and
πβ uses a larger set of manifest symbols than πβ when π > π.
Letting π β be the propositional symbol βπ‘ < 1/πβ, we have
and so for any given π and π > π,
A similar problem occurs with the approximations to the Poisson distribution in LSCA.
A pseudo-metric on premises that works
To address this problem we down-weight probability differences for queries involving manifest symbols that appear later in some ordering of the manifest symbols:
Assume some arbitrary ordering on the set of manifest symbols β³ and define β³β (π β β) to be the first π+1 manifest symbols under this ordering. If symbol π is number π in this ordering (counting from 0), then π β β³β for all π β₯ π.
π·(β³β) is then the set of queries that use only those propositional symbols numbered π or less.
Weβll weight probability differences for a query π΄ β π·(β³β) by π€(π), where π€ is some acceptable weighting function: a strictly decreasing function π€ : [0,β) β (0,1] (for any nonnegative real number it returns a positive value no larger than 1) such that π€(π₯) β 0 as π₯ β β. Examples of such functions include π€(π₯) = exp(-π₯) and π€(π₯) = 1/(π₯+1).
This then is the pseudo-metric weβll use:
Definition. For any two premises π,π β π·βΊ(π΄),
That is: for every π β₯ 0, find the maximum difference in probabilities πΏβ that occurs for a query π΄ constructed from at most the first π+1 manifest symbols, then take the βmaximumβ of the weighted differences π€(π)πΏβ over all π > 0.
The maximum over π·(β³β) used in the above definition exists because ππ³(π΄ | π) remains unchanged if we replace π΄ with any logically equivalent propositional formula, and the set of queries π·(β³β) has only a finite number of equivalence classes. (Each equivalence class corresponds to a truth table on π+1 symbols. There are 2^π distinct such truth tables, where π = 2^{π+1} is the number of rows in these truth tables.)
Remark. The function πβ defined above is actually an entire family of functions, one for each possible combined choice of an ordering of the manifest symbols and an acceptable weighting function. This will not be a problem because none of our results will depend on which ordering or acceptable weighting function is chosen.
Theorem. πβ is in fact a pseudo-metric.
Proof. See extended version of this article.
As desired, πβ(π,π) = 0 iff π and π are equivalent in the sense described earlier:
Theorem. πβ(π,π) = 0 iff ππ³(π΄ | π) = ππ³(π΄ | π) for all π΄ β π·(β³).
Proof. πβ(π,π) = 0 iff for all π and all π΄ β π·(β³β), ππ³(π΄ | π) = ππ³(π΄ | π). But this is just all π΄ β π·(β³).
The fundamental property of our pseudo-metric
The reason we chose the pseudo-metric πβ is that it has the following property:
Theorem 1. A sequence of premises (πα΅’) is a Cauchy sequence for πβ iff for all π΄ β π·(β³), the sequence (ππ³(π΄ | πα΅’)) is a Cauchy sequence (for the Euclidean distance on β).
Proof. See extended version of this article.
Recalling that πβ is not a single pseudo-metric, but an entire family of pseudo-metrics defined by the choice of ordering of symbols and acceptable weighting function, an immediate consequence of this theorem is that all of the pseudo-metrics in this family are Cauchy-equivalent (they have the same Cauchy sequences). Likewise, any pseudo-metric on π·βΊ(π΄) that is Cauchy-equivalent to πβ will do equally well for our purposes, as it will also satisfy this theorem.
The theorem has this important corollary:
Corollary. If (πα΅’) is a Cauchy sequence for πβ then, for any query π΄,
exists and is unique.
Proof. By Theorem 1, the sequence (ππ³(π΄ | πα΅’)) is Cauchy for the Euclidean distance on β, hence it converges to a limit, and since the Euclidean distance is a metric this limit is unique.
Taking the completion: generalized premises
Unfortunately, the pseudo-metric πβ does not give us a complete pseudo-metric space of premises: some Cauchy sequences do not converge. For example, define
where the πα΅’ are distinct manifest symbols. Then (πα΅’) is a Cauchy sequence that doesnβt converge: we have
but there is no (finite) formula π yielding ππ³(πβ±Ό | π) = 0 for all π.
Just as the real numbers β were defined as the completion of the rational numbers β to remedy such a deficiency, we define the space of generalized premises as the completion of our pseudo-metric space of premises.
Definition. (π«, π) is the canonical completion of the pseudo-metric space space (π·βΊ(π΄), πβ). That is, π« is the set of Cauchy sequences of premises (for the pseudo-metric πβ), and π is the derived pseudo-metric on π« defined by
A generalized premise is any member of the set π«.
As usual for incomplete pseudo-metric spaces and their completions, we will generally blur the distinction between a premise π and the corresponding generalized premise π(π) β (π, π, π, β¦).
We extend the probability function to operate on generalized premises as follows:
Definition. Let π³ = (πα΅’) be a generalized premise. Then
Note that if generalized premises π³ = (πα΅’) and π΄ = (πα΅’) are equivalent, then ππ³(π΄ | π³) = ππ³(π΄ | π΄) for any query π΄:
The definition of the probability function for generalized premises is consistent with its definition for simple premises π β π·βΊ(π΄): if π³ = π(π) = (π,π,π,...) then for any query π΄,
In his work on probability theory E. T. Jaynes stresses that probabilities are determined by oneβs βstate of information,β and writes ππ³(π΄ | π) for the probability assigned to π΄ under the state of information π, but never clearly defines just what a βstate of informationβ is. I propose the notion of a generalized premise as the formalization of Jaynesβ concept of βstate of information.β It is the limit of an increasingly detailed database of facts.
Computability
There is one possible objection to this use of generalized premises that we need to address. Our approach to probability theory is based on the EPL Theorem, which generalizes classical propositional logic to handle degrees of plausibility. Logics are supposed to be computable:
Both propositional formulas and predicate-logic formulas are finite objects, and the operations to construct them are all simple, computable, textual operations.
Checking whether a text string is a valid propositional formula or predicate formula is decidable (it is computable whether the answer is yes or no) using standard parsing techniques.
The entailment relation of classical propositional logic is decidable. One (inefficient) algorithm to decide whether π β§ π΄ is to enumerate all truth tables on the set of propositional symbols occurring in π or π΄, and check whether π evaluates to true and π΄ evaluates to false for any of these.
The set of axioms for a theory in mathematical logic is generally decidable. (In some contexts this is weakened to a requirement that the set be recursively enumerable, meaning that there is an algorithm that will output all elements of the set one by one.)
Checking whether a proposed proof is valid is decidable whenever the set of axioms itself is decidable, and semi-decidable when that set is merely recursively enumerable. (Semi-decidable means there is an algorithm that never returns an incorrect answer, although it may fail to return at all if the answer is no.)
As long as we were dealing only with (finite) propositional formulas, computability was obvious. Constructing premises and queries is computable (same as for classical propositional logic), and applying the EPL Theorem to compute ππ³(π΄ | π) just requires enumerating all truth tables on the propositional symbols of π΄ or π and evaluating both π and π΄ β§ π for each of these.
But with generalized premises we are now dealing with infinite sequences of premises. These infinite sequences should be computable, meaning there should exist an algorithm to generate them one element at a time in order, and ππ³(π΄ | π) should also be computable when π is a generalized premise, which requires computing a limit. Properly addressing these issues requires an understanding of what computability even means when dealing with with uncountable domains whose members are nonetheless finitely approximable, such as β or π«. This is the topic of type-2 computability, a.k.a. the Type Two Theory of Effectivity, and Iβll have to write a few tutorial articles on that subject before we can properly discuss the computability issues with generalized premises. So, although Iβve brought up the issue here, Iβm going to defer dealing with it until a later article.
Coming attractions
In the remaining articles in this series weβll discuss (at least) the following issues:
Continuity. Why itβs essential, and some techniques for proving operations on generalized premises to be continuous. Weβll show that ππ³(π΄ | β ) is a continuous function from π« to β, and that (π΄ β§ β ), the operation of conditioning a premise on a query π΄, is a continuous partial function from π« to π«.
Proving that a sequence of premises is Cauchy. This is required when defining a generalized premise. Weβll prove some theorems to aid in this task and apply them to showing that the sequences of premises defined in TCFPD and LSCA are in fact Cauchy, and hence are generalized premises.
Recall that π΄ is the countably infinite set of propositional symbols available for constructing propositional formulas.
E. T. Jaynes (2003), Probability Theory: The Logic of Science, chapter 2 (p. 38).