MMM vs MUM: First Step Analysis

1. Introduction

In the previous chapter we developed several algorithms to use multiple throws of a die to give a biscuit to one child out of 19 such that each child has the same probability to win. One algorithm to accomplish this worked like this. Continue throwing the die until the last two outcomes lie for the first time somewhere in the set \(\{(1,1), (1,2),\ldots, (1,6), (2, 1), \ldots, (2,6), (3,1),\ldots, (3,6), (4,1)\}\). (Note that this set has 19 elements.) Child \(1\) wins when the last two outcomes are \( (1,1)\), child \(2\) wins in case when \( (1, 2) \) , and so on. This idea, while promising, proved wrong: some children are more likely to win than others.

In this section we will develop the tools to understand why this algorithm is not fair, as it based on the famous monkey typing problem, which reads like this. A monkey types with equal probability, and independent of previous key strokes, any key on a keyboard with 26 letters. The monkey problem is find out which of the two words \(mmm\) and \(mum\) has the largest probability to be typed first. What is the relation between the monkey and the cookie problem? Well, the monkey keeps on typing until the last three key strokes lie in the set \(\{mum, mmm\}\), and it turns out that the word \(mum\) has a larger probability to get hit first. In a similar way, the sequences related to the children do not have equal expected hitting times, hence unequal winning probabilities.

To understand this interesting problem, we will develop a few useful concepts: first-step analysis, stopping times and hitting times.

In the typing monkey case, the probability to hit a certain key is \(p=1/26\). Below we will see that using such small probabilities in simulation requires long simulation times. For this reason, we simplify our alphabet often to just three symbols \(m, u, n\). Consequently, the success probability becomes \(p=1/3\), and \(q=2/3\).

2. First-step analysis

We first develop a model for the \(mmm\) sequence. We start in some state \(0\); let \(X_{i}\) be the \(i\)th letter typed. If \(X_{1}\neq m\), which happens with probability \(q=25/26\), the monkey doesn't make any progress and it remain in state \(0\). If, on the other hand, \(X_{1}=m\), which occurs with probability \(p=1-q=1/26\), the monkey moves to state \(m\). When in state \(m\), and \(X_{2}\neq m\), it returns to state \(0\); if \(X_{2}=m\), it moves to state \(mm\). Finally, in state \(mm\), if \(X_{3}\neq m\), it returns to state \(0\); otherwise it moves to state \(mmm\) and the game stops. Thus, more generally, the monkey is interested in the stopping time \(\tau = \inf\{n \geq 3 : X_{n-2} = X_{n-1} = X_n = m\}\).

As we only have to check the last three key strokes, it suffices to take as state space \(S = \{0, m, mm, mmm\}\). For notational ease, we often label these states as \(\{0, 1, 2, 3\}\), where the number represents the number of consecutive \(m\)'s, and when the monkey reaches state \(3\) it achieves its goal.

When we are in state \(i\), \(i\in \{0, 1, 2, 3\}\), let \(T(i)\) be the expected time to hit the final state state. Obviously, \(T(3)=0\) since state \(3\) corresponds to \(mmm\). What can we say about \(T(0), T(1)\) and \(T(2)\)? 1

Explain that the expected times have to satisfy the system of equations:

\begin{align*} T(0) &= 1 + q T(0) + p T(1), \\ T(1) &= 1 + q T(0) + p T(2),\\ T(2) &= 1 + q T(0) + p T(3). \end{align*}

To move from one state to the next, the monkey always types one key. Then, depending on the outcome \(X\), we move to a certain state. Consider the first equation. When in state \(0\), and the monkey does not hit \(m\) we have to start again, hence, we remain in state \(0\). Otherwise, when the monkey types the good key, i.e., \(X=m\), we progress with one extra \(m\). Similar reasoning gives the other expressions for the other states. Once state \(3\) is hit, we stop.

Clearly, we obtain a set of equations by considering for each state what can happen in the next step; this process is called first-step analysis. Once we have this system, it remains to solve it by hand or numerically. Now this system can be solved by hand with a little effort, but knowing how to tackle this with numerical tools is much more useful.

Explain that the above system can be written in matrix form as \(x = b + P x\), with

\begin{align*} P &= \begin{pmatrix} q & p & 0\\ q & 0 & p\\ q & 0 & 0\\ \end{pmatrix}, & b &= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, & x &= \begin{pmatrix} T(0) \\ T(1) \\ T(2) \end{pmatrix}. \end{align*}

The matrix \(P\) is known as the transition matrix.


Since \(T(3)=0\), we don't have to include that in the system of equations. The other equations are a one-to-one copy of the the system.

Numerical solvers work with linear systems of the type \(A x = b\) for a matrix \(A\) and a vector \(b\). Thus, we rewrite \(x = b + P x\) to \((I-P) x = b\), where \(I\) is the identity matrix.

import numpy as np
from numpy.linalg import solve

p = 1.0 / 3  # Probility to hit m
q = 1 - p  # Probability to hit u of n

b = np.ones(3)
P = np.array([[q, p, 0], [q, 0, p], [q, 0, 0]])
A = np.eye(3) - P  # Eye gives the identity matrix
T = solve(A, b)
T=array([39., 36., 27.])

We conclude that it takes \(T(0)=39\) keys for the monkey to arrive at \(mmm\) when \(p=1/3\).

We follow the same line of reasoning for the \(mum\) case.

Now the states are \(\{0, m, mu, mum\}\). Why?


We move from \(0\) to \(m\), just like earlier. In state \(m\), if the next key hit gives a \(u\), we move to \(mu\). Once in \(mu\), hitting \(m\) brings us to \(mum\).

We use the same notation for the expected times \(T(0), T(1)\) and \(T(2)\) as in the \(mmm\) case; \(T(3)=0\) is both problems.

Explain that for the \(mum\) sequence the expected times have to satisfy:

\begin{align*} T(0) &= 1 + q T(0) + pT(1),\\ T(1) &= 1 + (1-2p)T(0) + p T(1) + p T(2),\\ T(2) &= 1 + q T(0). \end{align*}

Consider the second equation, corresponding to state \(0\), i.e., state \(m\). Hitting the key \(m\) brings us back to state \(m\), key \(u\) brings us to \(mu\); other keys make us return to \(0\). The probability to hit \(m\) or \(u\) is \(p\), thus the probability to return to \(0\) is \(1-2p\). In state \(mu\), if the monkey hits \(m\), we move to \(mum\), otherwise, we go back to \(0\) again.

While the vector \(b\) is the same for both problems, the (transition) matrix \(P\) for the \(mum\) case becomes:

\begin{align*} P &= \begin{pmatrix} q & p & 0\\ 1-2p & p & p\\ q & 0 & 0\\ \end{pmatrix}. \end{align*}
P = np.array([[q, p, 0], [1 - 2 * p, p, p], [q, 0, 0]])
A = np.eye(3) - P
T = solve(np.eye(3) - P, np.ones(3))
T=array([30., 27., 21.])

Apparently the number of keys is \(T(0)=30\) to get to \(mum\). This is quite a bit smaller than the \(39\) expected strokes to reach \(mmm\).

Compare the two matrices \(P\) between the two cases to provide an intuitive explanation for this difference, i.e., \(30\) versus \(39\).


For the \(mum\) case, when in state \(m\) we can move on to \(mu\), but also move to state \(m\) again (by typing \(m\)). Thus, with probability \(p\) we don't fall back entirely to state~\(0\). In the \(mmm\) case, we either return to \(0\) or move on.

In the numerical example, our monkey had just three keys to hit. Adapt the code to show that \(T(0)=17602\) when the monkey has the original keyboard with 26 keys at its disposal.


Just change \(p=1/3\) to \(p=1/26\) in the code and run it.

Here is a related fun problem to analyze with first-step analyis. A mouse sits on one corner of a cube whose edges are made of wire. There is some chese diagonally opposite the mouse. One any corner, the mouse chooses at uniformly any edge, and moves to the next corner. When traversing an edge takes one minute, what is the expected time for the mouse to reach the cheese? 2

Show that the expected time is \(10\) minutes.


The mouse starts at the far side of the cheese. Thus, it takes at minimum three edges to get to the cheese.. Let the expected time to hit the cheese be given by \(T(3)\). Now the mouse takes one edge, and then it has at least two edges to travel, which takes expected time \(T(2)\). \(T(1)\) is defined similarly. A tiny bit of though shows that the times should satisfy the relations:

\begin{align*} T(3) &= 1 + T(2), & T(2) &= 1 + T(1) \frac 2 3 + T(3)\frac 13, & T(1) &= 1 + T(2)\frac 23. \end{align*}

Solving gives \(T(1)= 7, T(2)=9, T(3) = 10\).

3. Estimating hitting times with simulation

We can also use simulation to estimate the hitting times for words \(mmm\) and \(mum\). We do this in two steps. First we consider some ideas that seem correct, but turn out to be wrong. The explanation of why this is so, is interesting in itself, and provides another view on why it takes more keys to get to \(mmm\) than to \(mum\). Once we figured out what is wrong, we'll develop the correct code. Note from the last exercise above that it takes \(17602\) keys on average to reach \(mmm\) when \(p=1/26\). As this requires much more simulation effort than to estimate the same hitting time with \(p=1/3\), we consider just the keys \(m, n, u\)$ instead of the full alphabet.

Here is an initial idea to estimate \(T(0)\). Generate a large string of \(N\) iid rvs on \(\{0, 1, 2\}\). (Once again, we use just three outcomes to speed up the computation.) Then, count how often the sequences \((0,0,0)\), which codes for \(mmm\), and \((0, 1, 0)\), which codes for \(mum\), occur. If \(N_{m}\) counts \(mmm\) and \(N_{u}\) counts \(mum\), then it might seems that \(N/N_{m}\) must be a good estimator for \(T(0)\) for the \(mmm\) problem, and likewise, \(N/N_{u}\) for the \(mum\) problem.

It is not hard to write code to count the number occurrences of sub string within a string. However, this is such a common problem in computer science that people wrote very efficient algorithms for this. So we use regular expressions instead of doing the work ourselves. (I used ChatGPT to construct the regular expressions.)

import re

gen = np.random.default_rng(1)

N = int(1e6)
L = ''.join(gen.choice(list("mnu"), size=N))
Nm = len(re.findall(r'(?=mmm)', L))
Nu = len(re.findall(r'(?=mum)', L))
print(f"{N/Nm=:2.2f}, {N/Nu=:2.2f}")
N/Nm=27.03, N/Nu=26.89

This results is strange at first sight. Why are both results nearly equal to 27, while from the above we know that \(T(0)\) is 39 for the \(mmm\) case and \(30\) for the \(mum\) case?

Why is doesn't the above counting algorithm provide an estimator for \(T(0)\) for the \(mmm\) case?


The probability to see \(mmm\) at an arbitrary location in a string with characters \(m\), \(n\) and \(n\) is \((1/3)^3=1/27\); and this is the same for \(mum\). Note that this explains the simulation results above: when success occurs with probability \(1/27\), the expected time between two successes is \(27\).

But then, if the probabilities to see \(mmm\) and \(mum\) in a long strong of letters are the same, why then does it take longer to see \(mmm\) than \(mum\) when we start from state \(0\)? This is somewhat puzzling; we need a new idea to make progress.

Suppose that in the long array of letters, we saw the pattern \(mmm\), at positions \(i-2, i-1, i\). Let the monkey type the key for the \(i+1\)th time. When \(X_{i+1}\neq m\), we have to start all over again, while if \(X_{i+1}=m\) we get the string \(mmmm\), i.e., \(4\) times an \(m\) in row, of which the last three also forms\(mmm\). Hence, when \(X_{i+1}=m\) we immediately have another occurrence of the pattern \(mmm\).

Write \(T\) for the expected time between two occurrences of the pattern \(mmm\). Why does\(T\) satisfy the relation

\begin{equation} \label{org31730ec} T = 1 + q T(0), \end{equation}

where \(T(0)\) is the expected time to hit \(mmm\)?


Suppose we are in state \(mmm\), and the monkey hits a key \(X\). Clearly \(\E{T|X=m} = 1\), that is, if the next key is an \(m\), the time \(T\) to see \(mmm\) again is just this one key. However, \(\E{T | X \neq m} = 1 + T(0)\). To see this last equation, if \(X_{i}\neq m\) we start anew and it takes an expected number of keys \(T(0)\) keys to hit \(mmm\) again, and the 1 counts the key hit by the monkey. Combining this with the law of total expectation, \(T = p\E{T|X=m} + q \E{T|X\neq m}\), gives the result.

Check the numerical results we obtained above to see that \eqref{org31730ec} indeed is satisfied for \(mmm\).


Recall that \(T=27\), \(q=2/3\) and \(T(0)= 39\) for \(mmm\) string. Indeed, \(27 = 1 + 39\cdot 2/3 = 1+ 26\).

We next consider the \(mum\) case.

For the \(mum\) sequence, explain that \(T = 1+(1-2p) T(0) + p T(1) + p T(2)\), and check the numbers for the example.


Starting from \(mum\), when \(X_{}=m\) we move to string \(mumm\) which corresponds to state \(m\). When \(X_{}=u\), we get string \(mumu\) which is state \(mu\). Only after \(X_{i} = n\) we return to \(0\).

To check, note that \(T=27, p = 1/3, T(0)=30, T(1)=27, T(2)=21\). Filling it in: \(27=1+30/3 + 27/3 + 21/3 = 1 + 10+9+7\).

And now it is time to reconcile all the above.

Explain why stopping to throw a die as soon as a winning pattern emerges does not necessarily lead to equal winning chances?


The pattern \(mum\) has more potential to build up from previous outcomes. Therefore it can appear sooner. However, once we have seen \(mmm\), it can appear within one extra throw again, while \(mum\) can never recur after one key stroke. These two effects balance out for \(mum\) and \(mmm\), so that \(T=27\) for both sequences.

Now we know this, we also understand why the children need not have the same winning probabilities if we use hitting times of different types of outcomes (compare \((1,6)\) to \((1,1)\) for example) to determine which child should win the cookie.

4. Summary

By studying the above toy problem we covered many nice and useful ideas. With the typing monkey, we observed the subtle fact that the expected time to first hit a pattern need not be equal to the average time between such a pattern occuring in the long run. The model and methods of this section can be formulated in terms Markov chains, which form a very useful and powerful framework to analyze probability problems.



There is a subtle point. For the argument to work the expected times \(T_{i}\) should be finite. We will assume that this is the case.


We assume that the cheese does not move. A really interesting extension is to suppose that the cheese is also moving.