Metric Spaces, Part 1

Going the Distance

Apr 18, 2025

Motivation

Recall Jaynes’ finite sets policy:

In principle, every problem must start with… finite-set probabilities; extension to infinite sets is permitted only when this is the result of a well-defined and well-behaved limiting process from a finite set.

In first-year calculus we learn about limiting processes in the realm of numbers; the real numbers themselves are obtained by starting with the finitely-representable rational numbers and filling in the gaps, so to speak, adding additional numbers such as √2 that are approximated arbitrarily closely by rational numbers but are not themselves rational. These limiting processes rely on a notion of distance between two values, with the distance between real numbers 𝑥 and 𝑦 being |𝑥-𝑦|. Our limiting processes are then sequences of values which become and remain arbitrarily close to each other, known as Cauchy sequences. (I’ll say more on this in the next article.)

(Pseudo-)metric spaces generalize these concepts to other sorts of mathematical objects for which we can define a suitable distance function. In particular, we will later define a pseudo-metric on premises that is related to how close the probabilities they define are: the distance between 𝑋 and 𝑌 will be defined in terms of the distances between 𝘗𝘳(𝐴 | 𝑋) and 𝘗𝘳(𝐴 | 𝑌) for arbitrary queries 𝐴. This will allow us to define exactly what we mean by a limiting process on premises. And, just as the real numbers are obtained by completing the rational numbers, adding in any missing limits of Cauchy sequences, we’ll obtain a space of generalized premises that take us beyond the finite but can be approximated arbitrarily closely by finite premises.

Definition of a (pseudo-)metric space

A pseudo-metric space (𝑆, 𝑑) is a set 𝑆 together with a real-valued binary function 𝑑 : 𝑆×𝑆 → ℝ, called a pseudo-metric, that behaves like a distance function; that is, for all 𝑥,𝑦,𝑧 ∈ 𝑆,

𝑑(𝑥, 𝑦) ≥ 0 (distances are nonnegative);
𝑑(𝑥, 𝑥) = 0 (every point is at zero distance from itself);
𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥) (the distance from 𝑥 to 𝑦 is the same as the distance from 𝑦 to 𝑥); and
𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) (triangle inequality: going to an intermediate point first cannot shorten the total distance).

If 𝑑 has the additional property that

𝑑(𝑥, 𝑦) = 0 only if 𝑥 = 𝑦 (any two distinct points are separated by some nonzero distance),

then we call 𝑑 a metric and we call (𝑆, 𝑑) a metric space.
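On a finite set the axioms above can be checked mechanically. The following is a minimal Python sketch that tests every axiom by brute force over a finite sample of points. The function name `classify` and its `tol` parameter are our own hypothetical choices. For an infinite 𝑆, passing on a sample is only evidence, not a proof.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def classify(points: Sequence[T], d: Callable[[T, T], float], tol: float = 1e-12) -> str:
    """Return "metric", "pseudo-metric", or "neither" for d restricted to points.

    Brute force: O(n^3) distance evaluations for n points.
    """
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                        # distances are nonnegative
            return "neither"
        if abs(d(x, y) - d(y, x)) > tol:       # symmetry
            return "neither"
    for x in points:
        if abs(d(x, x)) > tol:                 # zero distance from itself
            return "neither"
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return "neither"
    # All pseudo-metric axioms hold; a metric must also separate distinct points.
    if any(x != y and d(x, y) == 0 for x, y in product(points, repeat=2)):
        return "pseudo-metric"
    return "metric"

# Discrete metric on a few integers.
print(classify(range(-3, 4), lambda x, y: 0 if x == y else 1))                  # metric
# Difference in length: "a" and "b" are distinct but at distance 0.
print(classify(["", "a", "b", "ab", "ba"], lambda x, y: abs(len(x) - len(y))))  # pseudo-metric
# Squared difference violates the triangle inequality: 4 > 1 + 1.
print(classify([0, 1, 2], lambda x, y: (x - y) ** 2))                           # neither
```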

A metric generalizes the familiar Euclidean distance between real-valued vectors to a distance function appropriate to whatever set 𝑆 we are working with. Here are some examples:

Discrete metric on the integers: 𝑑(𝑥, 𝑦) = 1 if 𝑥 ≠ 𝑦, 0 if 𝑥 = 𝑦.
Hamming distance: 𝑆 is the set of strings of a given length 𝑛, and 𝑑(𝑥, 𝑦) is the number of locations at which 𝑥 and 𝑦 differ. E.g., 𝑑(𝚊𝚗𝚝𝚕𝚎𝚛, 𝚋𝚞𝚝𝚝𝚎𝚛) = 3 because the two strings differ at locations 1, 2, and 4.
Edit distance: 𝑆 is the set of strings of any length, and 𝑑(𝑥, 𝑦) is the number of single-character edits (insertion, deletion, or replacement) required to turn 𝑥 into 𝑦. E.g., if 𝑥 = 𝚊𝚗𝚝 and 𝑦 = 𝚌𝚊𝚗𝚍𝚕𝚎, then 𝑑(𝑥, 𝑦) = 4. (𝑥 → 𝑦: insert 𝚌 at the beginning, replace the 𝚝 with a 𝚍, and insert 𝚕 and 𝚎 at the end; 𝑦 → 𝑥: delete 𝚕 and 𝚎 from the end, replace the 𝚍 by a 𝚝, and delete 𝚌 from the beginning. No shorter sequence exists: going from three characters to six takes at least three insertions, and 𝚝 does not occur in 𝚌𝚊𝚗𝚍𝚕𝚎.)
Prefix metric on Cantor space: 𝑆 is the set of infinite bit strings, and 𝑑(𝑥, 𝑦) = 2⁻ⁿ if the first position at which 𝑥 and 𝑦 differ is position 𝑛, counting from 0 (and 𝑑(𝑥, 𝑥) = 0). E.g., if 𝑥 = 𝟶𝟷𝟷𝟶𝟶𝟷 ⋯ and 𝑦 = 𝟶𝟷𝟷𝟷𝟶𝟶 ⋯ then 𝑑(𝑥, 𝑦) = 2⁻³.
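The four example metrics can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: edit distance is computed with the standard Levenshtein dynamic program, and since a program can only hold finitely many bits, `prefix_distance` works on finite truncations of the infinite bit strings (returning 0 when the truncations agree, in which case the true distance is at most 2⁻ᵏ for truncation length 𝑘). The function names are our own.

```python
def discrete(x: int, y: int) -> int:
    """Discrete metric: 1 for distinct points, 0 otherwise."""
    return 0 if x == y else 1

def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))

def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or replacements turning x into y."""
    prev = list(range(len(y) + 1))       # distances from "" to each prefix of y
    for i, a in enumerate(x, start=1):
        curr = [i]                       # distance from x[:i] to ""
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # replace a by b (free if equal)
            ))
        prev = curr
    return prev[-1]

def prefix_distance(x: str, y: str) -> float:
    """Prefix metric on finite truncations of infinite bit strings:
    2**-n where n is the first (0-based) position at which they differ."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** -n
    return 0.0  # truncations agree; the true distance is at most 2**-len

print(hamming("antler", "butter"))          # 3
print(edit_distance("ant", "candle"))       # 4
print(prefix_distance("011001", "011100"))  # 0.125
```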

Equivalence under a pseudo-metric

There is an obvious equivalence relation associated with a pseudo-metric space (𝑆, 𝑑):

\(\left(x\sim y\right)\triangleq\left(d\left(x,y\right)=0\right).\)

(You can easily verify that this defines an equivalence relation.) Consider, for example, the edit distance ignoring case differences. Here we have 𝑑(𝚊𝚗𝚝, 𝙰𝚒𝙽𝚃) = 1 instead of 4. 𝑑 is a pseudo-metric, but not a metric, because

\(d\left(\mathtt{ant},\mathtt{ANT}\right)=0\)

even though 𝚊𝚗𝚝 ≠ 𝙰𝙽𝚃. The strings are equivalent: 𝚊𝚗𝚝 ∼ 𝙰𝙽𝚃.
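A minimal sketch of the case-insensitive edit distance, assuming it is implemented as ordinary edit distance applied to lowercased strings (our choice of implementation; the names are hypothetical). Composing a metric with a function that merges some points, here lowercasing, is a simple way to obtain a pseudo-metric.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]

def edit_distance_nocase(x: str, y: str) -> int:
    """Edit distance ignoring case: a pseudo-metric, because distinct
    strings that differ only in case are at distance 0."""
    return edit_distance(x.lower(), y.lower())

def equivalent(x: str, y: str) -> bool:
    """x ~ y iff their pseudo-distance is zero."""
    return edit_distance_nocase(x, y) == 0

print(edit_distance("ant", "AiNT"))         # 4
print(edit_distance_nocase("ant", "AiNT"))  # 1
print(equivalent("ant", "ANT"))             # True
```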

This equivalence relation gives us a standard way of creating a metric space (𝑆, 𝑑) from a pseudo-metric space (𝑆₀, 𝑑₀):

𝑆 ≜ 𝑆₀/∼₀. That is, 𝑆 is the set of equivalence classes under the equivalence relation ∼₀ defined by the pseudo-metric 𝑑₀.
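𝑑([𝑥], [𝑦]) ≜ 𝑑₀(𝑥, 𝑦). This is well-defined: if 𝑥′ ∼₀ 𝑥 and 𝑦′ ∼₀ 𝑦, then 𝑑₀(𝑥′, 𝑥) = 𝑑₀(𝑦′, 𝑦) = 0, so the triangle inequality gives 𝑑₀(𝑥′, 𝑦′) ≤ 𝑑₀(𝑥′, 𝑥) + 𝑑₀(𝑥, 𝑦) + 𝑑₀(𝑦, 𝑦′) = 𝑑₀(𝑥, 𝑦), and by the same argument 𝑑₀(𝑥, 𝑦) ≤ 𝑑₀(𝑥′, 𝑦′); hence 𝑑₀(𝑥′, 𝑦′) = 𝑑₀(𝑥, 𝑦). Moreover, 𝑑 is a metric: 𝑑([𝑥], [𝑦]) = 0 means 𝑥 ∼₀ 𝑦, i.e. [𝑥] = [𝑦].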
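The quotient construction can be sketched concretely for the case-insensitive edit distance. In this minimal Python sketch (the helper names are hypothetical), each equivalence class is a list of its members, 𝑑 is computed from the first member of each class, and the code then spot-checks on a small sample that every other choice of representatives gives the same value and that distinct classes are at nonzero distance.

```python
from itertools import product

def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]

def d0(x: str, y: str) -> int:
    """The pseudo-metric d0: edit distance ignoring case."""
    return edit_distance(x.lower(), y.lower())

def equivalence_classes(points):
    """Partition points by x ~0 y iff d0(x, y) == 0. Comparing against one
    member per class suffices because ~0 is transitive."""
    classes = []
    for p in points:
        for cls in classes:
            if d0(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes

def d(cls_a, cls_b) -> int:
    """Quotient metric d([x], [y]) = d0(x, y), using first members as representatives."""
    return d0(cls_a[0], cls_b[0])

classes = equivalence_classes(["ant", "ANT", "Ant", "aint", "AiNT", "candle"])
print(classes)  # [['ant', 'ANT', 'Ant'], ['aint', 'AiNT'], ['candle']]

# Well-definedness: every choice of representatives gives the same distance.
well_defined = all(d0(x, y) == d(A, B)
                   for A, B in product(classes, repeat=2) for x in A for y in B)
# Separation: distinct classes are at nonzero distance.
separated = all(d(A, B) > 0 for A, B in product(classes, repeat=2) if A is not B)
print(well_defined, separated)  # True True
```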

Turning it around, we can think of the elements of 𝑆₀ as concrete representations of more abstract entities, with 𝑑₀(𝑥, 𝑦) defined to be the distance, under some appropriate metric, between the abstract entities represented by 𝑥 and 𝑦.

When we define our pseudo-metric on premises, the corresponding equivalence relation will be such that 𝑋 and 𝑌 are equivalent iff 𝘗𝘳(𝐴 | 𝑋) = 𝘗𝘳(𝐴 | 𝑌) for every query 𝐴.

Isometries

If (𝑆₁, 𝑑₁) and (𝑆₂, 𝑑₂) are pseudo-metric spaces, an isometry of 𝑆₁ into 𝑆₂ is a function 𝑓 : 𝑆₁ → 𝑆₂ that preserves distances:

\(d_{2}\left(f\left(x\right),f\left(y\right)\right)=d_{1}\left(x,y\right)\)

for all 𝑥,𝑦 ∈ 𝑆₁. We can think of this as saying that 𝑆₁ is in some sense embedded in 𝑆₂, or is a subset of 𝑆₂, or that 𝑆₂ is an extension of 𝑆₁. The last interpretation will be important when we talk about completing a pseudo-metric space, which is central to implementing Jaynes’ finite sets policy. For an isometry 𝑓, if 𝑓(𝑥) = 𝑓(𝑦) then 𝑑₁(𝑥, 𝑦) = 𝑑₂(𝑓(𝑥), 𝑓(𝑦)) = 0, i.e. 𝑥 and 𝑦 are equivalent; and if 𝑑₁ is a metric then 𝑥 = 𝑦. Thus an isometry of a metric space is always one-to-one: distinct elements of the domain always map to distinct elements of the range.

If 𝑓 is an onto function—every element of 𝑆₂ can be expressed as 𝑓(𝑥) for some 𝑥 ∈ 𝑆₁—then we say that 𝑓 is an isometry of 𝑆₁ onto 𝑆₂. If, additionally, 𝑑₁ and 𝑑₂ are metrics, then 𝑓 is both one-to-one and onto (a bijection), and we say that the two metric spaces are isometric to each other. In this case we can think of the two metric spaces as being essentially identical except for renaming of elements.

Here are some geometric examples of these concepts:

Let 𝑆₁ be the real line ℝ with 𝑑₁(𝑥, 𝑦) the one-dimensional Euclidean distance |𝑥-𝑦|, and let 𝑆₂ be the plane ℝ² with 𝑑₂(𝑥, 𝑦) the two-dimensional Euclidean distance ∥𝑥-𝑦∥₂. If we define 𝑓 : ℝ → ℝ² by 𝑓(𝑥) ≜ (𝑥, 3), that is, 𝑓 maps the real line to the horizontal line in the plane with 𝑦-intercept 3, then 𝑓 is an isometry of ℝ into ℝ².
Likewise, if we define 𝑔 : ℝ → ℝ² by 𝑔(𝑥) ≜ (𝑥/√2, 𝑥/√2), that is, 𝑔 maps the real line to the line through the origin at 45 degrees to the horizontal, then 𝑔 also is an isometry of ℝ into ℝ². Note that we needed the factor of 1/√2 to make the distances match: ∥𝑔(𝑥)-𝑔(𝑦)∥₂ = √((𝑥-𝑦)²/2 + (𝑥-𝑦)²/2) = |𝑥-𝑦|, whereas without it every distance would be stretched by a factor of √2.
Let 𝑆₁ be the plane ℝ² and 𝑑₁ the Euclidean distance. Let 𝑆₂ = 𝑆₁ and 𝑑₂ = 𝑑₁. If 𝑓 : ℝ² → ℝ² is a rigid translation such as 𝑓(𝑥, 𝑦) = (𝑥+3, 𝑦-7) then 𝑓 is an isometry of ℝ² onto itself. Likewise, if 𝑓 is a rigid rotation such as 𝑓(𝑥, 𝑦) = (-𝑦, 𝑥) then 𝑓 is again an isometry of ℝ² onto itself.
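The geometric examples above can be spot-checked numerically. This minimal Python sketch compares 𝑑₂(𝑓(𝑝), 𝑓(𝑞)) with 𝑑₁(𝑝, 𝑞) over all pairs from a random sample, using `math.dist` (Python 3.8+) for Euclidean distance and representing points of ℝ as 1-tuples. The helper `looks_isometric` and the map `g_unscaled` are hypothetical names of ours; a sampled check is evidence, not a proof.

```python
import math
import random

def looks_isometric(f, points, tol=1e-9):
    """Spot-check d2(f(p), f(q)) == d1(p, q) for all pairs from a finite sample."""
    return all(abs(math.dist(f(p), f(q)) - math.dist(p, q)) <= tol
               for p in points for q in points)

rng = random.Random(0)
line = [(rng.uniform(-10, 10),) for _ in range(50)]                   # points of R
plane = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(50)]

f = lambda p: (p[0], 3.0)                                  # onto the line y = 3
g = lambda p: (p[0] / math.sqrt(2), p[0] / math.sqrt(2))   # onto the 45-degree line
g_unscaled = lambda p: (p[0], p[0])                        # same line, stretched by sqrt(2)
translate = lambda p: (p[0] + 3, p[1] - 7)
rotate = lambda p: (-p[1], p[0])                           # rotation by 90 degrees

print(looks_isometric(f, line), looks_isometric(g, line))                 # True True
print(looks_isometric(g_unscaled, line))                                  # False
print(looks_isometric(translate, plane), looks_isometric(rotate, plane))  # True True
```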

Continuity and pseudo-metric spaces

Discussing well-defined and well-behaved limiting processes naturally brings us to the topic of continuity, which also generalizes to arbitrary pseudo-metric spaces. For a function between two pseudo-metric spaces the definition of continuity at a point is the same as it is for functions on the real line, except that again we replace |𝑥-𝑦| with 𝑑(𝑥, 𝑦). Let 𝑓 : 𝑆₁ → 𝑆₂, where (𝑆₁, 𝑑₁) and (𝑆₂, 𝑑₂) are pseudo-metric spaces, and let 𝑥 ∈ 𝑆₁; we say that 𝑓 is continuous at 𝑥 if

\(\begin{flalign*} & \mbox{for all }\epsilon>0\\ & \quad\mbox{there exists }\delta>0\\ & \quad\quad\mbox{such that }d_{2}\left(f\left(x\right),f\left(y\right)\right)<\epsilon\\ & \quad\quad\quad\mbox{whenever }d_{1}\left(x,y\right)<\delta. \end{flalign*}\)

That is, we can make 𝑓(𝑦) arbitrarily close to 𝑓(𝑥) by choosing 𝑦 sufficiently close to 𝑥. We say simply that 𝑓 is continuous if it is continuous at every point of its domain.

For examples in the familiar case of real numbers and Euclidean distance, consider the real-valued functions

\(\begin{align*} \theta(x) & =\begin{cases} 0 & \mbox{if }x<0\\ 1 & \mbox{if }x\geq0 \end{cases}\\ f(x) & =\begin{cases} 1/x & \mbox{if }x\neq0\\ 0 & \mbox{if }x=0 \end{cases} \end{align*} \)

whose domain is all of ℝ. Both of these functions are continuous at every point in ℝ except for 𝑥 = 0.

We can make either of these functions continuous by restricting its domain: if 𝜓 is the restriction of 𝜃 to the nonnegative real numbers, then 𝜓 is a continuous function; and if 𝑔 is the restriction of 𝑓 to the positive real numbers, then 𝑔 also is a continuous function.

Now consider a less familiar example: a function

\(f\colon\mathbb{B}^{\omega}\to\mathbb{B}^{\omega}\)

on the Cantor space of infinite bit strings (here 𝔹 = {0, 1}), using the prefix metric defined earlier. Plugging the definition of the prefix metric into our generalized definition of continuity for metric spaces yields the following:

𝑓 is continuous at 𝑥 iff, for any 𝑛 ∈ ℕ, there exists some 𝑚 ∈ ℕ such that the first 𝑚 bits of 𝑥 determine the first 𝑛 bits of 𝑓(𝑥).

(More precisely: the first 𝑛 bits of 𝑓(𝑦) are the same as the first 𝑛 bits of 𝑓(𝑥) whenever the first 𝑚 bits of 𝑦 are the same as the first 𝑚 bits of 𝑥.)
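To make this concrete, here is a minimal Python sketch with an illustrative function of our own choosing (not from the text): bit 𝑖 of 𝑓(𝑥) is 𝑥ᵢ XOR 𝑥ᵢ₊₁. Infinite bit strings are represented as functions from positions to bits. Since each output bit depends only on two adjacent input bits, 𝑚 = 𝑛 + 1 input bits always fix the first 𝑛 output bits, so this 𝑓 is continuous at every 𝑥, while 𝑚 = 𝑛 does not suffice. The check tries every 𝑚-bit prefix with an all-0 and an all-1 tail, which is enough for this particular 𝑓 but is not a general test.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = Callable[[int], int]  # an infinite bit string: position -> 0 or 1

def f(x: Bits) -> Bits:
    """Hypothetical example map: bit i of f(x) is x(i) XOR x(i + 1)."""
    return lambda i: x(i) ^ x(i + 1)

def from_prefix(p: Tuple[int, ...], tail: int) -> Bits:
    """Extend a finite prefix with a constant tail of 0s or 1s."""
    return lambda i: p[i] if i < len(p) else tail

def first(x: Bits, n: int) -> List[int]:
    """The first n bits of x."""
    return [x(i) for i in range(n)]

def prefix_determines(m: int, n: int) -> bool:
    """Do the first m input bits fix the first n output bits of f?
    Differing outputs prove 'no'; agreement is evidence of 'yes'."""
    return all(first(f(from_prefix(p, 0)), n) == first(f(from_prefix(p, 1)), n)
               for p in product((0, 1), repeat=m))

for n in range(1, 6):
    print(n, prefix_determines(n, n), prefix_determines(n + 1, n))  # each line: n False True
```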

Or consider a real-valued function

\(f\colon\mathbb{B}^{\omega}\to\mathbb{R}\)

on the Cantor space. Then

𝑓 is continuous at 𝑥 iff, for any 𝜀 > 0, there exists 𝑚 ∈ ℕ such that the first 𝑚 bits of 𝑥 determine 𝑓(𝑥) to within an accuracy of 𝜀.
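As an illustration (our choice, not from the text), take 𝑓(𝑥) = Σᵢ 𝑥ᵢ 2⁻⁽ⁱ⁺¹⁾, which reads 𝑥 as the binary expansion of a number in [0, 1]. The first 𝑚 bits pin 𝑓(𝑥) down to an interval of width 2⁻ᵐ, so choosing 𝑚 with 2⁻ᵐ < 𝜀 gives accuracy 𝜀 at every 𝑥. A minimal Python sketch using exact rational arithmetic (the helper names are hypothetical):

```python
from fractions import Fraction
from typing import List, Tuple

def bounds_from_prefix(bits: List[int]) -> Tuple[Fraction, Fraction]:
    """Interval [lo, hi] containing f(x) = sum_i x_i / 2**(i+1) for every x
    whose first len(bits) bits are `bits`; its width is 2**-m."""
    lo = sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))
    return lo, lo + Fraction(1, 2 ** len(bits))

def bits_needed(eps: Fraction) -> int:
    """Smallest m with 2**-m < eps; then any point of the interval is within eps of f(x)."""
    m = 0
    while Fraction(1, 2 ** m) >= eps:
        m += 1
    return m

eps = Fraction(1, 10)
m = bits_needed(eps)               # 1/16 < 1/10 <= 1/8
x = [i % 2 for i in range(m)]      # prefix of 0101..., whose value is 1/3
lo, hi = bounds_from_prefix(x)
print(m, lo, hi)                   # 4 5/16 3/8
```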

Uniform continuity

We say that 𝑓 is uniformly continuous if, in the definition of continuity discussed above, for any given 𝜀 we can use the same 𝛿 for all 𝑥 ∈ 𝑆₁; that is,

\(\begin{flalign*} & \mbox{for all }\epsilon>0\\ & \quad\mbox{there exists }\delta>0\\ & \quad\quad\mbox{such that for all }x\in S_{1}\\ & \quad\quad\quad d_{2}\left(f\left(x\right),f\left(y\right)\right)<\epsilon\\ & \quad\quad\quad\quad\mbox{whenever }d_{1}\left(x,y\right)<\delta. \end{flalign*}\)

Going back to the function 𝑔(𝑥) = 1/𝑥 defined for 𝑥 > 0: although 𝑔 is continuous, it is not uniformly continuous, because for any fixed 𝜀 the required 𝛿 gets arbitrarily small as 𝑥 gets closer and closer to 0 and 𝑔(𝑥) gets arbitrarily large. If, however, we restrict the domain further, defining ℎ(𝑥) = 1/𝑥 for 𝑥 ≥ 1, then ℎ is uniformly continuous: for any given 𝜀 > 0, the 𝛿 > 0 that works for 𝑥 = 1 works for every 𝑥 > 1 as well. (In fact, for 𝑥, 𝑦 ≥ 1 we have |1/𝑥 - 1/𝑦| = |𝑥-𝑦|/(𝑥𝑦) ≤ |𝑥-𝑦|, so 𝛿 = 𝜀 suffices everywhere.)
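The shrinking 𝛿 can be seen numerically. Solving |1/𝑥 - 1/𝑦| < 𝜀 by hand (the binding side is 𝑦 < 𝑥) gives the largest workable 𝛿 at 𝑥 as 𝜀𝑥²/(1 + 𝜀𝑥), which tends to 0 as 𝑥 → 0. The minimal Python sketch below encodes that formula, which is our own derivation, together with a random spot-check (the helper names are hypothetical). It shows that the 𝛿 that works at 𝑥 = 1 fails near 0 for 𝑔, while 𝛿 = 𝜀 works across the domain of ℎ.

```python
import random

def largest_delta(x: float, eps: float) -> float:
    """Largest delta that works at x for g(y) = 1/y on y > 0 (derived by hand:
    for y < x, 1/y - 1/x < eps iff x - y < eps*x*x / (1 + eps*x))."""
    return eps * x * x / (1 + eps * x)

def delta_works(x, delta, eps, in_domain, trials=2000, seed=0):
    """Spot-check that |1/x - 1/y| < eps for random y in the domain with |x - y| < delta."""
    rng = random.Random(seed)
    for _ in range(trials):
        y = x + delta * rng.uniform(-0.999, 0.999)  # stay strictly inside the interval
        if in_domain(y) and abs(1 / x - 1 / y) >= eps:
            return False
    return True

eps = 0.1
for x in (1.0, 0.1, 0.01):
    print(x, largest_delta(x, eps))  # shrinks roughly like eps * x**2 as x -> 0

positive = lambda y: y > 0       # domain of g
at_least_one = lambda y: y >= 1  # domain of h
print(delta_works(0.01, largest_delta(1.0, eps), eps, positive))             # False
print(all(delta_works(x, eps, eps, at_least_one) for x in (1.0, 2.0, 1e3)))  # True
```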

Equivalence of pseudo-metrics

Sometimes we don’t care too much about the exact value of the pseudo-metric, being interested only in its implications for continuity. Let 𝑑₁ and 𝑑₂ be pseudo-metrics on the same set 𝑆.

We say that 𝑑₁ and 𝑑₂ are topologically equivalent iff the identity function 𝑓(𝑥) = 𝑥 is continuous when considered either as a function from (𝑆, 𝑑₁) to (𝑆, 𝑑₂) or as a function from (𝑆, 𝑑₂) to (𝑆, 𝑑₁). This means that small distances according to 𝑑₁ are also small distances according to 𝑑₂, and vice versa.
- For readers who know what a topology is: pseudo-metrics can equivalently be defined to be topologically equivalent iff they induce the same topology.
We say that 𝑑₁ and 𝑑₂ are uniformly equivalent iff the identity function 𝑓(𝑥) = 𝑥 is uniformly continuous when considered either as a function from (𝑆, 𝑑₁) to (𝑆, 𝑑₂) or as a function from (𝑆, 𝑑₂) to (𝑆, 𝑑₁).

It’s easy to see that if 𝑑₁ and 𝑑₂ are topologically equivalent, then any continuous function from (𝑆, 𝑑₁) to another pseudo-metric space (𝑇, 𝑒) is also a continuous function from (𝑆, 𝑑₂) to (𝑇, 𝑒). Likewise, any continuous function from (𝑇, 𝑒) to (𝑆, 𝑑₁) is also a continuous function from (𝑇, 𝑒) to (𝑆, 𝑑₂). The same holds for uniform equivalence, with “uniformly continuous” in place of “continuous”.
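Two illustrative pseudo-metrics on ℝ, our own choices rather than ones used later in this series, help separate the two notions. The truncated distance 𝑑₂(𝑥, 𝑦) = min(1, |𝑥-𝑦|) agrees with 𝑑₁(𝑥, 𝑦) = |𝑥-𝑦| on all distances below 1, so the two are uniformly equivalent. The distance 𝑑₃(𝑥, 𝑦) = |arctan 𝑥 - arctan 𝑦| is topologically equivalent to 𝑑₁ (arctan and its inverse tan are continuous), but not uniformly equivalent: the points 𝑛 and 𝑛 + 1 stay at 𝑑₁-distance 1 while their 𝑑₃-distance tends to 0. A minimal Python sketch:

```python
import math

def d1(x: float, y: float) -> float:
    """Usual distance on the real line."""
    return abs(x - y)

def d2(x: float, y: float) -> float:
    """Truncated distance: agrees with d1 whenever d1 < 1."""
    return min(1.0, abs(x - y))

def d3(x: float, y: float) -> float:
    """Distance after squashing R into (-pi/2, pi/2) with atan."""
    return abs(math.atan(x) - math.atan(y))

# Small distances coincide under d1 and d2.
print(d1(3, 3.25) == d2(3, 3.25))  # True
# n and n + 1 stay 1 apart under d1 but get arbitrarily close under d3,
# so the identity from (R, d3) to (R, d1) is not uniformly continuous.
for n in (1, 10, 100, 1000):
    print(n, d1(n, n + 1), d3(n, n + 1))
```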

Coming Up

In the next article we’ll talk about limiting processes in the general context of pseudo-metric spaces. This includes Cauchy sequences, limits, the notion of completeness, and how to complete a pseudo-metric space, analogously to how we obtain the real numbers by completing the rationals.

Epistemic Probability
