Let's look at another example of deriving a standard probability distribution by starting with a finite premise expressing a finite approximation of our available information, then taking the limit as the amount of detail increases to infinity.
Events of some sort (say, requests arriving at a web server, or emails arriving in some mailbox) occur at an average rate of 𝜆 per unit time. Let 𝑁 be the number of events that occur during some given time interval of duration 𝑡, which we'll call the query interval. What probabilities should we assign to each of the possible values of 𝑁?
Defining the premise and query
We analyze this problem as follows:
We interpret “average rate of 𝜆” to mean that 𝜆𝑇 events occur over some large time interval of duration 𝑇, which we'll call the reference interval to distinguish it from the smaller query interval contained within.
We divide the reference interval into 𝑇/𝛿 small time slices of duration 𝛿, which is chosen to be small enough that we can assume at most one event occurs in any given time slice.
Let 𝑦ᵢ, 0 ≤ 𝑖 < 𝑇/𝛿, be a propositional symbol intended to mean “an event occurs during time slice 𝑖” (which begins at time 𝑖𝛿 and ends at time (𝑖+1)𝛿).
Let 𝑑ₖ, 𝑘 ≥ 0, be a propositional symbol intended to mean that 𝑘 of the 𝑦ᵢ, for 𝑖 in the query interval, are true.
Let 𝑠 be the start of the query interval.
We assume that 𝜆, 𝑠, 𝑡 ≥ 0, 𝑠 + 𝑡 ≤ 𝑇, and 𝑇, 𝛿 > 0.
After analyzing the case for finite 𝑇 and nonzero 𝛿, we let 𝑇 → ∞ and 𝛿 → 0.
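A technical complication is that 𝜆𝑇, 𝑇/𝛿, 𝑠/𝛿, and (𝑠+𝑡)/𝛿 are not necessarily integers, although they are used as such. To address this we define:
\[m = \operatorname{round}(\lambda T),\qquad n = \operatorname{round}(T/\delta),\qquad q = \operatorname{round}\big((s+t)/\delta\big) - \operatorname{round}(s/\delta),\]
so that the query interval consists of the 𝑞 time slices 𝑖 with round(𝑠/𝛿) ≤ 𝑖 < round((𝑠+𝑡)/𝛿).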
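The integer quantities can be computed by a small helper. This is a minimal Python sketch, not the author's code. The name `finite_params` is hypothetical. It computes 𝑚 = round(𝜆𝑇) events, 𝑛 = round(𝑇/𝛿) reference slices, and 𝑞 query slices (those 𝑖 with round(𝑠/𝛿) ≤ 𝑖 < round((𝑠+𝑡)/𝛿)). It assumes Python's built-in `round`, which rounds exact halves to even; the text does not specify a rounding rule.

```python
def finite_params(lam: float, T: float, delta: float, s: float, t: float) -> tuple[int, int, int]:
    """Return (m, n, q) for the finite premise (hypothetical helper).

    m: events in the reference interval, round(lam * T)
    n: time slices in the reference interval, round(T / delta)
    q: time slices in the query interval [s, s + t), i.e. the slices i
       with round(s / delta) <= i < round((s + t) / delta)
    """
    # Assumption: Python's round() (ties to even) stands in for "round".
    m = round(lam * T)
    n = round(T / delta)
    q = round((s + t) / delta) - round(s / delta)
    return m, n, q


print(finite_params(0.5, 10.0, 0.25, 2.0, 3.0))  # (5, 40, 12)
```

Rounding each endpoint separately, rather than rounding 𝑡/𝛿, keeps the query slices aligned with the same slice boundaries used for the reference interval.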
We also define these:
𝑌 is a propositional formula stating that exactly 𝑚 of the 𝑦ᵢ, 0 ≤ 𝑖 < 𝑛, are true.
For each 𝑘 ≥ 0, 𝐷ₖ is a propositional formula stating that exactly 𝑘 of the 𝑦ᵢ in the query interval are true.
Our premise 𝑋 is the propositional formula \(Y\wedge\bigwedge_{k=0}^{m}\left(d_{k}\leftrightarrow D_{k}\right)\), which adds to 𝑌 definitions of the symbols 𝑑ₖ.
Our potential queries are the 𝑑ₖ.
Probabilities in the finite case
There are
\[\binom{n}{m}\]
ways of satisfying 𝑋: there are 𝑛 time slices, 𝑚 of which must contain an event. Furthermore, there are
\[\binom{q}{k}\binom{n-q}{m-k}\]
ways of satisfying 𝑋 ∧ 𝑑ₖ: there are 𝑞 time slices in the query interval, 𝑘 of which must contain an event, and there are 𝑛−𝑞 remaining time slices in the reference interval, 𝑚−𝑘 of which must contain an event. Therefore, by the EPL Theorem,
\[P(d_k \mid X) = \frac{\binom{q}{k}\binom{n-q}{m-k}}{\binom{n}{m}}.\]
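The counting argument can be checked by brute force for a small instance. The Python sketch below is illustrative only. The helper `count_models` and the choices 𝑛 = 10, 𝑚 = 4, and query slices 3 to 6 are hypothetical. The sketch enumerates every assignment to the 𝑦ᵢ with exactly 𝑚 true and groups them by 𝑘, the number of true 𝑦ᵢ in the query interval. These assignments are exactly the models of the premise because each 𝑑ₖ is fixed by its definition 𝑑ₖ ↔ 𝐷ₖ.

```python
from itertools import combinations
from math import comb


def count_models(n: int, m: int, query: range) -> dict[int, int]:
    """Count assignments to y_0..y_{n-1} with exactly m true (premise Y),
    grouped by k, the number of true y_i with i in the query interval (D_k).
    Each d_k is fixed by d_k <-> D_k, so these are exactly the models of X."""
    counts: dict[int, int] = {}
    for true_slices in combinations(range(n), m):
        k = sum(1 for i in true_slices if i in query)
        counts[k] = counts.get(k, 0) + 1
    return counts


n, m = 10, 4
query = range(3, 7)  # slices 3..6, so q = 4
q = len(query)
counts = count_models(n, m, query)
total = sum(counts.values())
for k in sorted(counts):
    formula = comb(q, k) * comb(n - q, m - k)
    # enumerated count, closed-form count, and the resulting probability
    print(k, counts[k], formula, counts[k] / total)
```

The enumeration grows as C(𝑛, 𝑚), so it is only useful as a sanity check on tiny cases. The closed form is what carries over to the limit.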
This, it turns out, is just the hypergeometric probability of 𝑘 successes when doing 𝑚 draws without replacement from a population of size 𝑛 containing 𝑞 success cases. To see why, think of the 𝑚 events as random draws, the 𝑛 time slices in the reference interval as the population from which we draw, and the 𝑞 time slices in the query interval as the success cases.
Technical note: for the above to make sense, we need 0 ≤ 𝑘 ≤ 𝑞 and 0 ≤ 𝑚−𝑘 ≤ 𝑛−𝑞. For any fixed 𝑘, these conditions hold once 𝑇 is sufficiently large and 𝛿 is sufficiently small.
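The closed form translates directly into code, with the support conditions from the technical note handled explicitly. This is a minimal Python sketch; `hypergeom_pmf` is a hypothetical helper name, and the parameter values are arbitrary. Summed over 𝑘, the probabilities should total 1 by Vandermonde's identity, and the mean should be the hypergeometric mean 𝑚𝑞/𝑛.

```python
from math import comb


def hypergeom_pmf(k: int, n: int, q: int, m: int) -> float:
    """P(d_k | X) for the finite premise (hypothetical helper).

    n: time slices in the reference interval (population size)
    q: time slices in the query interval (success cases)
    m: events in the reference interval (draws)
    """
    # Outside 0 <= k <= q and 0 <= m - k <= n - q no model of X has exactly
    # k events in the query interval, so the probability is 0.
    if not (0 <= k <= q and 0 <= m - k <= n - q):
        return 0.0
    return comb(q, k) * comb(n - q, m - k) / comb(n, m)


n, q, m = 40, 12, 5
probs = [hypergeom_pmf(k, n, q, m) for k in range(m + 1)]
print(sum(probs))                               # 1.0 up to rounding
print(sum(k * p for k, p in enumerate(probs)))  # mean m*q/n = 1.5 up to rounding
```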
Take it to the limit
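Now let 𝑇, 1/𝛿 → ∞. We have 𝑚, 𝑛, 𝑞 → ∞ and furthermore that
\[m \sim \lambda T,\qquad n \sim \frac{T}{\delta},\qquad q \sim \frac{t}{\delta},\qquad \frac{mq}{n} \sim \lambda t,\]
where 𝑎 ∼ 𝑏 means 𝑎/𝑏 → 1 (𝑎 and 𝑏 are asymptotically equal). As proven here,
the hypergeometric distribution converges to the Poisson distribution under the above conditions:
\[\lim_{T,\,1/\delta\to\infty} P(d_k \mid X) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}.\]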
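The limit can be seen numerically by comparing the finite-premise probabilities with Poisson probabilities as 𝑇 grows and 𝛿 shrinks together. This Python illustration uses assumptions not in the text: 𝜆 = 2, 𝑡 = 1, the query interval starts at 𝑠 = 0, and the specific (𝑇, 𝛿) pairs are arbitrary. The helper names are hypothetical. Binomial coefficients are evaluated in log space with `math.lgamma` so that large 𝑛 does not produce enormous integers.

```python
from math import exp, lgamma, log


def log_comb(a: int, b: int) -> float:
    # log C(a, b) via log-gamma, avoiding enormous integers for large a
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(k: int, n: int, q: int, m: int) -> float:
    if not (0 <= k <= q and 0 <= m - k <= n - q):
        return 0.0
    return exp(log_comb(q, k) + log_comb(n - q, m - k) - log_comb(n, m))


def poisson_pmf(k: int, mu: float) -> float:
    # e^{-mu} mu^k / k!, computed in log space (requires mu > 0)
    return exp(-mu + k * log(mu) - lgamma(k + 1))


lam, t = 2.0, 1.0
gaps = []
for T, delta in [(10.0, 0.1), (100.0, 0.01), (1000.0, 0.001)]:
    m, n, q = round(lam * T), round(T / delta), round(t / delta)
    # largest pointwise difference over k = 0..10
    gap = max(abs(hypergeom_pmf(k, n, q, m) - poisson_pmf(k, lam * t)) for k in range(11))
    gaps.append(gap)
    print(f"T={T:g}, delta={delta:g}: m*q/n={m * q / n:.3f}, max gap={gap:.2e}")
```

In each row 𝑚𝑞/𝑛 equals 𝜆𝑡. The shrinking gap reflects 𝑚/𝑛 and 𝑞/𝑛 going to zero, which is what lets the hypergeometric probabilities approach the Poisson ones.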
Alternative: average over possible worlds instead of time
The preceding analysis used the notion of the long-run rate at which events occur. This may be inappropriate: events may be limited to a time interval too short for taking 𝑇 → ∞ to make sense. Let's look at another approach based on averaging over possible states of the world:
There are 𝑆 possible states of the world.
We interpret “average rate of 𝜆” to mean that the number of events occurring in the reference interval, averaged over the 𝑆 possible world states, is 𝜆𝑇.
Let 𝑦(𝑖,𝑗), 0 ≤ 𝑖 < 𝑇/𝛿 and 1 ≤ 𝑗 ≤ 𝑆, be a propositional symbol intended to mean “in possible world 𝑗 an event occurs during time slice 𝑖”.
Let 𝑤(𝑗), 1 ≤ 𝑗 ≤ 𝑆, be a propositional symbol intended to mean “possible world 𝑗 is the actual world.”
Let 𝑑ₖ, 𝑘 ≥ 0, be a propositional symbol intended to mean that there are 𝑘 indices 𝑖 in the query interval for which 𝑦(𝑖,𝑗) is true, where 𝑗 is the actual world.
We assume that 𝜆, 𝑠, 𝑡 ≥ 0, 𝑠 + 𝑡 ≤ 𝑇, and 𝑇, 𝛿, 𝑆 > 0.
After analyzing the case for finite 𝑆 and nonzero 𝛿, we let 𝑆 → ∞ and 𝛿 → 0.
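We define 𝑚 and 𝑛 similarly to before, but larger by a factor of (about) 𝑆; we define 𝑞 identically to before:
\[m = \operatorname{round}(S\lambda T),\qquad n = S\cdot\operatorname{round}(T/\delta),\qquad q = \operatorname{round}\big((s+t)/\delta\big) - \operatorname{round}(s/\delta).\]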
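In code, the possible-worlds version differs from the time-averaged one only in 𝑚 and 𝑛. The Python sketch below uses a hypothetical helper name and arbitrary parameter values, with Python's `round` standing in for the unspecified rounding rule. It computes 𝑚 = round(𝑆𝜆𝑇), 𝑛 = 𝑆 · round(𝑇/𝛿), and the unchanged 𝑞. Dividing 𝑚 by 𝑆 recovers roughly 𝜆𝑇 events per world, which is the averaged-rate interpretation.

```python
def alt_params(lam: float, T: float, delta: float, s: float, t: float, S: int) -> tuple[int, int, int]:
    """Return (m, n, q) for the possible-worlds premise (hypothetical helper)."""
    m = round(S * lam * T)        # total events summed over all S worlds
    n = S * round(T / delta)      # symbols y(i, j): round(T/delta) slices per world
    q = round((s + t) / delta) - round(s / delta)  # query slices, as before
    return m, n, q


for S in (1, 10, 1000):
    m, n, q = alt_params(1.3, 2.0, 0.01, 0.5, 0.25, S)
    print(S, m, n, q, m / S)  # m / S is the average number of events per world
```

With 𝑆 = 1 the rounding error in 𝑚/𝑆 can be large. As 𝑆 grows, 𝑚/𝑆 settles on 𝜆𝑇, which is why the text says “about” a factor of 𝑆.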
The revised definitions of 𝑌 and 𝐷ₖ are these:
𝑌 is a propositional formula stating that exactly 𝑚 of the 𝑦(𝑖,𝑗), 0 ≤ 𝑖 < round(𝑇/𝛿) and 1 ≤ 𝑗 ≤ 𝑆, are true.
For each 𝑘 ≥ 0, 𝐷ₖ is a propositional formula stating that 𝑤(𝑗) ∧ 𝑦(𝑖,𝑗) is true for exactly 𝑘 of the index pairs (𝑖, 𝑗) with round(𝑠/𝛿) ≤ 𝑖 < round((𝑠+𝑡)/𝛿) and 1 ≤ 𝑗 ≤ 𝑆.
We also define
𝑊 as ⟨𝑤(1), …, 𝑤(𝑆)⟩, i.e. “𝑤(𝑗) is true for exactly one index 𝑗.”
Our revised premise 𝑋 is then the propositional formula
\[Y\wedge W\wedge\bigwedge_{k=0}^{m}\left(d_{k}\leftrightarrow D_{k}\right).\]
Probabilities for the alternative analysis
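The revised definitions for 𝑚, 𝑛, and 𝑞 were chosen so that
\[P(d_k \mid X) = \frac{\binom{q}{k}\binom{n-q}{m-k}}{\binom{n}{m}}\]
as before, by identical reasoning: both the number of ways of satisfying 𝑋 and the number of ways of satisfying 𝑋 ∧ 𝑑ₖ pick up the same extra factor of 𝑆 for the choice of actual world, which cancels. Furthermore, as 𝑆, 1/𝛿 → ∞ we have
\[m \sim S\lambda T,\qquad n \sim \frac{ST}{\delta},\qquad q \sim \frac{t}{\delta},\]
yielding
\[m,\,n,\,q \to \infty,\qquad \frac{mq}{n} \sim \lambda t,\]
and so the conditions for
\[\lim_{S,\,1/\delta\to\infty} P(d_k \mid X) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}\]
still hold.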
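The “identical reasoning” can be checked by brute force on a tiny instance. The Python sketch below uses hypothetical names and arbitrary illustrative choices: 𝑆 = 3 worlds, 4 slices per world, 𝑚 = 5, and query slices 1 and 2. It enumerates the models of the revised premise, meaning one actual world together with exactly 𝑚 true 𝑦(𝑖,𝑗). It then confirms that each count is 𝑆 times its counterpart from the first analysis, so the probabilities are unchanged.

```python
from itertools import combinations
from math import comb


def count_alt_models(S: int, slices: int, m: int, query: range) -> dict[int, int]:
    """Enumerate models of Y and W (one actual world, exactly m true y(i, j)),
    grouped by k = number of true y(i, j*) with i in the query interval,
    where j* is the actual world (the formula D_k)."""
    symbols = [(i, j) for j in range(1, S + 1) for i in range(slices)]
    counts: dict[int, int] = {}
    for actual in range(1, S + 1):  # the single true w(j)
        for true_ys in combinations(symbols, m):
            k = sum(1 for (i, j) in true_ys if j == actual and i in query)
            counts[k] = counts.get(k, 0) + 1
    return counts


S, slices, m = 3, 4, 5
query = range(1, 3)  # slices 1 and 2, so q = 2
n, q = S * slices, len(query)
counts = count_alt_models(S, slices, m, query)
total = sum(counts.values())
for k in sorted(counts):
    # enumerated probability vs. the hypergeometric formula with n = S * slices
    print(k, counts[k] / total, comb(q, k) * comb(n - q, m - k) / comb(n, m))
```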
Commentary
This derivation contrasts with our previous derivation of the uniform distribution over the unit interval in two ways:
It makes use of latent symbols: the propositional symbols 𝑦ᵢ (or 𝑦(𝑖,𝑗) and 𝑤(𝑗)), which appear only in the premise and not in any query of interest. (The term “latent symbol” corresponds to the term “latent variable” in statistics.) They may be thought of as unobserved variables that are nonetheless important in the description of the situation under consideration. With the approach of averaging over possible worlds, it is obvious that only the actual world can be observed. With the approach of averaging over time, it might be, for example, that only the total number of web server requests in each defined time period is logged, not the individual times at which they arrived.
As 𝑇 (or 𝑆) and 1/𝛿 increase, we do not simply add additional conjuncts to the premise, as occurred in the previous derivation. Instead, we get a sequence of different premises that give closer and closer approximations to the desired ideal.
We also found that whether we characterized our knowledge as a long-run average over time or as an average over a limited time and a large number of possible worlds, we got the same distribution for the number of events 𝑁.