Montmort's Gamble: Thinking about Increasingly Frequent yet Less Likely Events

The Gamble

Pierre Raymond de Montmort, in his treatise on games of chance “Essay d'analyse sur les jeux de hazard”, described the following game of chance (some paraphrasing liberties taken):

A deck of N cards is labeled 1, ..., N and shuffled. The player counts upward from 1 as the dealer flips the shuffled deck over one card at a time. If at any point the flipped card matches the count (e.g., on the 2nd flip the card is card number 2), the player wins. Otherwise the player loses.

I have built a simulation of the game for 10 cards on my coding page. What is the chance of winning this game? The key to answering this is the inclusion-exclusion principle.
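
The full simulation lives on my coding page; a minimal Python sketch of the same idea (not the exact code there) looks like this:

```python
import random

def play_montmort(n=10):
    """Play one round of Montmort's matching game with n cards.

    Returns True if any flipped card matches the running count."""
    deck = list(range(1, n + 1))
    random.shuffle(deck)
    # Position i (counting from 1) matches if the card flipped there is card i.
    return any(card == position for position, card in enumerate(deck, start=1))

def estimate_win_probability(n=10, trials=100_000):
    """Estimate the chance of winning by Monte Carlo simulation."""
    wins = sum(play_montmort(n) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print(estimate_win_probability(10))  # roughly 0.632, i.e. about 1 - 1/e
```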

A Solution

As with all mathematical probability questions, a difficult starting point is naming the sample space and the relevant events. The sample space here must be thought of broadly as the set of all possible arrangements of the deck, of which there are N! = N(N−1)(N−2)…(2)(1). Considering each flip on its own runs into issues with independence when trying to extrapolate to the entire run of cards. Consider the event E_j = “card j lands in position j”, so that the chance to win (getting at least one match) is P(E_1 ∪ E_2 ∪ ⋯ ∪ E_N).
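
Written out, the inclusion-exclusion principle expands this union in terms of intersections:

P(E_1 \cup E_2 \cup \dots \cup E_N) = \sum_{i} P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - \dots + (-1)^{N+1} P(E_1 \cap E_2 \cap \dots \cap E_N)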

In words: add the probabilities of all the single events, subtract off the 2-way intersections, add back the 3-way intersections (which were subtracted off one time too many), subtract the 4-way intersections, and so on. We then need to examine the intersection events themselves, starting with just one event:

P(E_j)

Imagine not showing any cards and wanting to pick, out of the N cards, your current number. Assuming each card is equally likely (intrinsic to Prof. Blitzstein’s so-called naive definition of probability), this reduces to the chance of picking one exact card out of N, leading to:

P(E_j) = \frac{(N-1)!}{N!} = \frac{1}{N}

Consider now the 2-way intersection E_1 ∩ E_2. This amounts to counting how many ways there are to arrange the rest of the deck once the first two cards match, therefore:

P(E_1 \cap E_2) = \frac{(N-2)!}{N!}

Extend this to the k-way intersection E_1 ∩ E_2 ∩ ⋯ ∩ E_k. Notice that if k = N this is 1/N!, as we are asking for a complete match: only 1 possible shuffling out of the total.

P(E_1 \cap \dots \cap E_k) = \frac{(N-k)!}{N!}

Start by adding all the single events (one match), then subtract the double events (2 matches) due to overcounting, and so on, to get:

P(E_1 \cup E_2 \cup \dots \cup E_N) = \binom{N}{1} \frac{(N-1)!}{N!} - \binom{N}{2} \frac{(N-2)!}{N!} + \binom{N}{3} \frac{(N-3)!}{N!} - \dots + (-1)^{N+1} \binom{N}{N} \frac{0!}{N!}

= 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \dots + \frac{(-1)^{N+1}}{N!} \approx 1 - e^{-1}

since each term simplifies as \binom{N}{k} \frac{(N-k)!}{N!} = \frac{1}{k!}.

Notice that the finite sum is exactly an N-term Taylor series approximation to 1 − e^{−1}! The convergence is uniform, as e^z is a quintessential entire function, and incredibly fast: the error of the approximation is O(1/N!) (seen below).

Orange = 1-exp(-1) with the blue being truncated Taylor series approximations
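
The claimed speed can be checked numerically; here is a small Python sketch (separate from the simulation on my coding page) comparing the truncated sums with 1 − 1/e:

```python
import math

def win_probability(n):
    """Exact win probability for an n-card deck: the truncated alternating series."""
    return sum((-1) ** (k + 1) / math.factorial(k) for k in range(1, n + 1))

limit = 1 - math.exp(-1)
for n in (1, 2, 3, 5, 10):
    p = win_probability(n)
    print(f"N={n:2d}  P(win)={p:.10f}  error={abs(p - limit):.2e}")
# The error shrinks factorially; N = 10 already agrees to about seven decimal places.
```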

A Solution to Particular Cases

What about the chance that you get exactly m matches in a game? It is hard to count the arrangements that satisfy strict requirements like exactly m matches (at least for difficult values of m), though appealing to another probability trick (and general life strategy) we can avoid this responsibility by doing the complete opposite. The number of arrangements with no matches (a derangement; the count is denoted D) of N cards is (the total number of possible arrangements) − (the number of arrangements with at least one match, counted by inclusion-exclusion just as above):

D = N! - \sum_{j=1}^{N} (-1)^{j+1} \binom{N}{j} (N-j)!

= N! - N! \sum_{j=1}^{N} \frac{(-1)^{j+1}}{j!}

= N! \left( 1 + \sum_{j=1}^{N} \frac{(-1)^{j}}{j!} \right)

D = N! \sum_{j=0}^{N} \frac{(-1)^{j}}{j!}

To grab an arrangement that has, say, m matches, just pick which numbers match. Of the N cards we are choosing m without replacement and order does not matter, so there are \binom{N}{m} ways to do this. However, we now need the rest of the cards to NOT match! How many derangements can be made from the remaining N − m cards? The formula above (with N − m in place of N) says (N-m)! \sum_{j=0}^{N-m} \frac{(-1)^j}{j!}. Putting this together:

\text{Number of arrangements with } m \text{ matches} = \binom{N}{m} (N-m)! \sum_{j=0}^{N-m} \frac{(-1)^j}{j!}

The probability of exactly m matches is then obtained by dividing this count by the total number of arrangements N!:

P(m \text{ matches}) = \frac{1}{N!} \binom{N}{m} (N-m)! \sum_{j=0}^{N-m} \frac{(-1)^j}{j!} = \frac{1}{m!} \sum_{j=0}^{N-m} \frac{(-1)^j}{j!}
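
As a sanity check, here is a brute-force Python sketch (feasible only for small N) comparing this formula against direct enumeration of all orderings:

```python
import itertools
import math

def match_distribution_bruteforce(n):
    """Probability of exactly m matches, computed over all n! orderings (small n only)."""
    counts = [0] * (n + 1)
    for perm in itertools.permutations(range(1, n + 1)):
        matches = sum(card == pos for pos, card in enumerate(perm, start=1))
        counts[matches] += 1
    return [c / math.factorial(n) for c in counts]

def match_probability_formula(n, m):
    """P(exactly m matches) = C(n, m) * (n - m)! * sum_{j=0}^{n-m} (-1)^j / j!  /  n!"""
    tail = sum((-1) ** j / math.factorial(j) for j in range(n - m + 1))
    return math.comb(n, m) * math.factorial(n - m) * tail / math.factorial(n)

n = 7
brute = match_distribution_bruteforce(n)
for m in range(n + 1):
    print(m, round(brute[m], 6), round(match_probability_formula(n, m), 6))
# The printed row for m = n - 1 illustrates the curiosity noted just below.
```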

For N = 10, the probability of getting 9 matches is zero but 10 matches is not! Why do you think that is?


Philosophical Implications

If one considers the deck of cards getting huge (N → ∞), a typical conjecture is that the chance to win is 100%, since of course sooner or later a number would match, by some infinite-monkey thought-experiment justification. Another usual suspicion is that the chance to win vanishes to 0%, since the deck is just so massive: how could one possibly hope to get one exact matching number on any one flip? There is a fundamental struggle between each match becoming more unlikely (more cards, so less chance per flip) and the fact that you get more cards to try to match (more cards, so more chances). In the face of these human intuitions, mathematics then comes in and says the answer is 1 − e^{−1}, a seemingly preposterous value synonymous with continuous processes, in a discrete counting situation no less. I was able to show with Taylor theory that not only are those intuitions incorrect, they diverge from the truth at a uniform speed, which is a level of wrongness nearly nonexistent in the real world.

When tragedy befalls people, the thoughts that usually come up are “Why me?” or “Why not someone else?”. When someone wins the lottery there seem to be far fewer questions and more statements about luck, or rehashing of the “You can’t win if you don’t play” sentiment. Rigorous decision making or emotion setting with only qualitative estimates of probabilities seems impossible, as humans share some supernatural or superstitious relationship with an undefined concept called “chance”.

In war, which is an intense form of life, Chance casts aside all veils and disguises and presents herself nakedly from moment to moment as the direct arbiter over all persons and events
— Churchill, Thoughts and Adventures

It is certainly easy to poke at the psychology of probability and how flawed it is, but assuming a perfect understanding of the mathematical argument, does that understanding construct some bridge toward a more grounded intuition and circumspection in other problems? The answer evidently is no: a perfect understanding of this argument and its techniques makes similar questions of chance easier to tackle but not easier to intuit. There is no objective connection to be made between the numerical answer and the problem statement at large, not even in retrospect. The counting aspect of this problem is, I think, the main culprit; even while problems remain finite, it is possible to convolute them so that the counting becomes extremely difficult (though not theoretically impossible, given that cases are clearly defined). Naturally, if we assume there is an omni-counter (someone who can perfectly count all finite cases of any finite chance problem), would this omni-counter have transferable intuition about the apparent limiting clash between the number of cards creating more chances to be right (more cards = more chance) and the chance of being right about any one card diminishing (more cards = less chance)?

Defining "badness" in heuristic teaching

At every level of mathematics, particularly in elementary and high schools, teachers use heuristics to get concepts across to students. Even before the dot-com boom, we knew that all learning is distance learning, with only the proximity to the students changing; the fact remains that concepts need to travel from a teacher’s brain to a student’s. It is entirely the case that concepts are whittled of subtleties or interesting extremal examples in favor of raising test scores by substituting critical thinking with pure “mechanics”. In this new tagged series, Bad Heuristics, inspired by this MathOverflow question, I philosophize on some of these teaching practices.

What is meant by a “bad” heuristic?

A tentative first definition of a bad heuristic is drawn up with regard to two main offenders: under-emphasis of mathematical detail and over-emphasis of mechanical writing:

A method of explanation or methodology itself which forgoes details of a mathematical concept knowingly or otherwise in favor of mechanical solutions.
— Definition v1.0: Bad Heuristic

First, what is meant by “mechanical solutions” is an instruction method that often has built into it a literal instruction of what to do with one’s pencil when working on mathematical problems, as if the teacher, perhaps out of pressure to teach to a certain class of questions, is trying to force the hand of the test taker manually. Here are some heuristics guilty of this, each of which I hope to unpack in more detail in future posts:

  • First-Outside-Inside-Last (FOIL) methodology for binomial term multiplication.

  • A continuous function is one in which when you draw it, you “don’t pick up your pencil”.

  • Function evaluation is the same thing as “plug n’ chug” in a given formula.

  • The quadratic formula is memorized through song and not properly motivated or explained, simply the mechanics of writing it on paper are committed to memory.

The motif of these is that mathematics is reduced to algebraic manipulation. If a course exam is legislated to a small enough set of question types, a bad heuristic is one invariably drawn up to cut conceptual corners and have students instead memorize pencil movements or disconnected facts, not unlike teaching how to perform a piano song by skipping music theory in favor of explaining what buttons to press and when.

Why this definition does not include “incomplete” heuristics

Intuitively it seems a heuristic focusing purely on memorization of hand movements or sounds is “worse” than heuristics that do point toward conceptual understanding but lack some details or fail on some extremal examples. I elect to call such cases “incomplete” heuristics, and so a “bad” heuristic can be thought of as an extension of these:

  • Multiplication is just repeated addition (also exponentiation is just repeated multiplication).

  • We “cannot” take the square root of a negative number.

  • Integration of a polynomial is a “reverse power rule” where you add one to the power and divide by the new power.

Each of these is just trying to facilitate an understanding of a concept, though perhaps on a level below what will be required later in a student’s career, or overlooks a particular case (as with the natural logarithm in the last example). In any case, expounding on the concept or delving into the cases can get students up to speed, whereas in a “bad” heuristic conceptual understanding is purposefully replaced by structural routine.

The Philosophy of Probability I

The interpretation, objectivity, and subjectivity of “probability” or “chance” form a vast field, and an especially new one to this author. Using the Whitepages of philosophers, PhilPeople, over 888 modern researchers confess their interest in the “Philosophy of Probability”, with some focus on the three characteristics above. My goal from here, to benefit my own elucidation, is to briefly summarize and expound upon traditional schools of thought in the philosophy of probability as seen in The Stanford Encyclopedia of Philosophy and other sources listed below. Ultimately, this post is successful if the reader learns along with me.

My own purely mathematical training rests on a probability-calculating structure involving “measures” on certain event spaces thought of as “sets”, created in Kolmogorov's pivotal work [2]. However, this abstract structure is left dusty if only briefly explained even for baccalaureate holders in mathematics, and it is certainly unapproachable for everyone else. Notwithstanding, the news reports a certain chance of rain (e.g., a 70% chance of rain today) and people nod in agreement at the idea. What is this idea? Can some rigorous argumentation perhaps clarify such a statement for ordinary people? As far as science goes, this system provides adequate explanation and usefulness; philosophy, however, will attempt to answer what exactly probabilities are.

The Philosophical Problem Statement

  • Can an all-knowing being predict all stochastic outcomes?

  • Are “probabilities” agent-independent? That is, do they exist in reality, or are they only human-based facts of ignorance? [6]

  • If probabilities exist in reality, what exactly do they represent?

Kolmogorov's Probability Calculus

if this calculus be condemned, then the whole of the sciences must also be condemned.
— Henri Poincaré

Andrey Kolmogorov pioneered a body of thinking which allows for quantification of and calculation with probability [1]; this is the aforementioned sets-and-measures construction. We think of events (outcomes of a stochastic scenario) as individual elements contained in a big set S. As events may happen concurrently, we make a big bin containing possible subsets of S. The technical name for this bin is a field (also called an algebra), notated F; it is constructed with the following axioms:

  • If A is a set of event(s) in F, then the complement of A is also in F; that is, A not happening is sensible.

  • If A,B are sets of event(s) in F, then their union is also in F; that is, concurrent events happening is sensible.

The structural logic is then that F is a field containing all possibilities (subsets of S) such that combinations of the events (union, intersection, and complementation) are themselves possible events in the space. Furthermore, a function P called the “probability measure” exists so that we can assign a numerical chance (from 0 to 1) to the events of F. There are some natural restrictions on P:

  • P(A) must be non-negative; that is, something having a negative probability is nonsense.

  • P(S) = 1; that is, the probability of at least something from the set of all possibilities happening is 1 (100% chance).

  • P(A union B) = P(A) + P(B) for any events A, B in F such that A and B are disjoint; that is, the chance that one of two mutually exclusive events occurs can be obtained by adding their individual chances of occurring.

The hierarchy of this structure is drawn below:

A,B are events contained in S (the sample space of all events) which is sitting in the field F containing all of S as well as the possible combinations of events in S.

It is important to note the order of definition. First F is defined, then P is defined on it; that is, F need not be the set of all subsets. For instance, if we are tossing a die, we could be interested only in the event of an odd-numbered face. The definition of F could then include the odd and even results while missing specific results like 2 or 5, and still satisfy the axioms above. Interestingly, as noted in [2], this is not the only viable option for a so-called coherent axiom structure, and so the question of the “best” axiomatization becomes entirely philosophical.

Unfortunately, F also need not be finite; consider picking any decimal number between 0 and 1, where we have infinitely many possibilities. In such a case, a fourth axiom (countable additivity) becomes necessary:

  • P(countably infinite union of pairwise disjoint events) = sum of all the individual event probabilities

[6] mentions that this is the most controversial axiom, drawing on several interesting paradoxical thought experiments. The point philosophically is that, in settings like the example above, the axioms force individual outcomes in the infinite collection to have probability 0, leading immediately to counterintuitive results.
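
For instance, under the uniform pick from the interval [0, 1] mentioned above, every individual outcome already carries zero probability even though some outcome certainly occurs:

P(\{x\}) = 0 \text{ for every single point } x \in [0,1], \qquad \text{yet} \qquad P([0,1]) = 1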

An Explicit Example

The triple (S, F, P) altogether forms a probability space. Consider the experiment of flipping a fair coin once and recording the result. Then the above structure can be written explicitly (a small code sketch follows the list):

  • The possible events are contained in S = {H,T} where T = tails and H = heads.

  • The field containing the events and their set combinations is F = {S, {H}, {T}, {}} where {} is the empty set.

  • Then P(S) = 1 (the chance of flipping heads or tails is 100%), P({H}) = 50% = P({T}), and P({}) = 0, as the coin must land on either heads or tails.
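
The same triple can be spelled out programmatically. Below is a toy Python sketch (my own illustration, using the naive equally-likely measure) that builds F as the power set of S and checks the axioms:

```python
from itertools import chain, combinations

S = frozenset({"H", "T"})   # sample space of a single fair coin flip

def power_set(s):
    """All subsets of s; here the field F is simply the full power set of S."""
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = power_set(S)            # {}, {H}, {T}, {H, T}

# Naive (equally likely) probability measure: P(A) = |A| / |S|.
P = {A: len(A) / len(S) for A in F}

assert P[frozenset()] == 0                                   # impossible event
assert P[S] == 1                                             # something must happen
assert P[frozenset({"H"})] + P[frozenset({"T"})] == P[S]     # additivity for disjoint events
```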

It is possible to explicitly write down these constructions in such a simple example and other slightly more complicated examples, but it quickly gets out of hand in more practical scenarios.  

A Less Explicit Example

Consider Benford’s Law. In short, it is a counter-intuitive result that the leading significant digits of many real-world data sets are not uniformly random. In fact they follow a right-skewed distribution favoring low digits like 1, 2, 3 well over higher digits like 8, 9. This result then becomes a viable tool for detecting fraud, as people who are faking numbers will create data that is “too uniformly random”. A related effect can be seen with students taking multiple-choice exams: students often prefer to change an answer if the ones before it are the same answer.

The machinery of Kolmogorov can be implemented here as well. Consider the sample space as possible leading digits:

  • S = {1,2,3,4,5,6,7,8,9}

  • F is the collection of all subsets of S, generated by the singleton events {1}, {2}, …, {9} (a number has exactly one leading digit, so the singletons are the basic events).

  • P(1) = ?, P(2) = ?, …

How can we assign the probabilities? Empirical frequency evidence for the use of Benford’s Law is displayed in many sources, and even a theoretical limiting justification has been derived to do the job.
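
For concreteness, the theoretical assignment is Benford’s law itself, P(d) = \log_{10}(1 + 1/d); a quick Python sketch tabulates these values:

```python
import math

# Benford's law: leading digit d occurs with probability log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(d, f"{p:.3f}")   # 1 -> 0.301, 2 -> 0.176, ..., 9 -> 0.046
assert abs(sum(benford.values()) - 1) < 1e-12   # the nine probabilities sum to 1
```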

The interesting psychological question is, how is it that it works so well at catching fraud? If people are faking data and consequently appeal to a false yet seemingly true sense of randomness, what other misconceptions are commonly held?

On the philosophical side, the subjective credence held by fakers of data is hugely disparate from empirical notions of this probability function P. Is there some fundamental axiom violation committed by people, or are they abiding by coherent axioms that are being misused? That is, can some formal structure be defined so that non-omniscient beings can act “rationally”? Brian Weatherson in 2003 seems to tackle this problem.

Interpreting the word "Probability" 

When the word “probability” was used above, it was in reference to the output of the mathematical function P. This function does not have a closed form in general, and it changes greatly depending on the stochastic scenario at hand and on what events are in F. In our coin flip example, the output of P was deduced by recognizing that the event H had 1 way to happen and was among 2 possibilities, thus 1/2 = 50% is the chance, as it is also understood that each of the 2 outcomes was equally likely. Irrespective of the agent observing this, the chance remains 50%, and so this chance exists fundamentally outside of subjective perception. Other uses of the word may not include this quantification and only reference some subjective degree of belief (which does depend on the agent believing it). Some great minds have explored this, so here I explore some of that thought.

Classical Interpretation

the most important questions of life, are indeed for the most part only problems of probability
— Pierre Laplace

To make this post of reasonable length, I will pick up here by expanding upon [5] in a future post.

 

References

  1. The Stanford Encyclopedia of Philosophy

  2. Foundations of the Theory of Probability (1933) - Andrey Kolmogorov

  3. Adam Caulton's blog (philosophy professor)

  4. Philosophy of Probability: Contemporary Readings (2011) - Anthony Eagle

  5. A Philosophical Essay on Probabilities - Pierre Simon Laplace

  6. Aiden Lyon’s Intro to the Philosophy of Probability.