Finite State Machines (FSMs) or finite state automata (FSA) are models of computation that describe systems with a finite number of states and transitions between them, reacting to inputs and changing their state accordingly.
As technology advances, finite state machines will evolve alongside emerging fields such as artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT). FSMs will be integrated with AI and ML algorithms to create more intelligent and adaptive systems.
Additionally, with the widespread adoption of IoT devices, FSMs will be crucial in coordinating and controlling interconnected systems, optimizing resource allocation, and enabling seamless interactions among devices and services.
Finite State Automata
A finite state automaton (plural: automata; FSA for short) or finite-state machine (FSM) is a discrete model of a computation.
There are several known types of finite-state automata and all
share the following characteristics.
The automaton has a finite set
of states, one of which is distinguished as an initial state;
FSA can "consume" input strings, and has a transition function that
determines how the state of the automaton changes on reading each
letter of the input. Finally, it has the
ability to decide, once the last letter was read (and based on the final
state), whether the input string is admissible or not.
The collection of all the strings that are admissible is the language
recognized by the automaton.
Despite its simplicity, it is surprising how many questions (and
algorithms) can be adequately modeled by finite state automata.
Examples abound in texts on the theory of computation
(see the books by Hopcroft et al. and by Sudkamp).
We shall not discuss them here since our application of the automata is
for combinatorial purposes only.
Finite-state machines are of two types—deterministic finite-state machines and non-deterministic finite-state machines.
Most of our needs here are satisfied by the simplest type of automaton,
the deterministic finite automaton, or DFA.
The word deterministic signifies that the result of the DFA
reading a letter 𝑎 ∈ Σ is a change to one known state. Another type
of automaton, the NFA (for nondeterministic), allows multiple
alternatives. Both varieties have the same descriptive power, and each
NFA with n states can be ``reduced'' to an equivalent
DFA (in the worst case, however, the smallest equivalent
DFA can have 2ⁿ states).
A deterministic finite automaton (DFA)
is a quintuple
M = (V, Σ, δ, s, F), where
V --- a finite set of states;
Σ --- an alphabet;
δ --- transition function δ : V × Σ → V;
s --- the initial state (before input is read);
F ⊆ V --- the set of accepting states (also called final states).
A nondeterministic finite-state automaton (NFA)
is a quintuple M = (V, Σ, δ, s, F) that consists of the same
elements as a DFA. The only difference between an NFA and a
DFA is in the type of value that δ returns:
δ : V × Σ → 2^V (the set of all subsets of V).
So δ returns a subset of V (which may be empty) for every
input state and input symbol.
It is convenient to consider the extended transition function
δ✶ : V × Σ✶ → V. The second argument of
δ✶ is a string, rather than a single symbol, and its value
gives the state the automaton will be in after reading that
string. This function can be defined recursively: δ✶(q, ε) = q for any state q. Suppose w is a string of the
form w = x𝑎, where 𝑎 is the last symbol of w, and x is its
prefix. Then δ✶(q, x𝑎) = δ(δ✶(q, x), 𝑎).
An analogous definition holds for an NFA, where δ✶ returns a set of states.
It is customary to represent finite automata either as
tables or as figures with circles for states and arcs for transitions.
A string x ∈ Σ✶ is accepted by a DFA (or an NFA)
M = (V, Σ, δ, s, F) if and only if
δ✶(s, x) = p for some state p ∈ F. The language
accepted or recognized by M is denoted as L(M), and it is given by
\[
L(M) = \{ \,x \in \Sigma^{*}\ | \ \delta^{*} (s,x) \in F\,\}.
\]
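As a small illustration of these definitions (not part of the original examples), the extended transition function δ✶ is simply a fold of δ over the letters of the input, and acceptance amounts to checking whether the resulting state lies in F. The following Mathematica sketch uses a hypothetical two-state DFA that accepts binary strings containing an even number of 1s.
(* hypothetical DFA: states "even"/"odd", start and accepting state "even" *)
delta[{"even", "0"}] = "even";  delta[{"even", "1"}] = "odd";
delta[{"odd", "0"}]  = "odd";   delta[{"odd", "1"}]  = "even";
deltaStar[q_, w_String] := Fold[delta[{#1, #2}] &, q, Characters[w]];   (* δ✶ *)
acceptQ[w_String] := MemberQ[{"even"}, deltaStar["even", w]];
acceptQ["1001"]    (* True: two 1s *)
acceptQ["1011"]    (* False: three 1s *)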
Example 0:
Let’s consider an elevator system with four floors: Ground, First, Second, and Third. When a user presses a button inside the elevator, the desired floor can be reached in different ways. For instance, if the current state is ‘Ground’ and the user presses the button for the third floor, the elevator can take different paths, such as stopping at the first floor and then on the second floor before finally reaching the third floor. This example illustrates a non-deterministic FSM since multiple transitions are possible for the same input and state.
The transition graph of such an NFA highlights its nondeterministic nature: from the state qG the machine can move to several different states on the same input. This is characteristic of NFAs, where multiple transitions may leave a state on the same input symbol.
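A possible Mathematica rendering of such a graph is sketched below; the state names qG, q1, q2, q3 and the particular transition set are our assumptions for illustration, not taken from the original figure. Pressing the button ``3'' in state qG may move the machine to any of several states, which is exactly the nondeterminism described above.
(* hypothetical elevator NFA: several transitions leave qG on the same input "3" *)
elevatorNFA = Graph[{qG, q1, q2, q3},
   {Labeled[qG -> q1, "3"], Labeled[qG -> q2, "3"], Labeled[qG -> q3, "3"],
    Labeled[q1 -> q2, "3"], Labeled[q1 -> q3, "3"], Labeled[q2 -> q3, "3"]},
   VertexLabels -> "Name"]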
■
End of Example 0
Waiting Time Probabilistic Problems
Finite state automata have multiple applications in string matching.
For example, UNIX/Linux operating systems provide the command grep. Grep, short for ``global regular expression print,'' is a command used to search files for text patterns specified as regular expressions. Furthermore, the command comes pre-installed in virtually every Linux distribution.
Example 1:
Let M be the deterministic finite automaton (V, Σ, δ, s, F) defined by its transition diagram.
Such automata can be characterized as ``language recognition
devices,'' or models. They have the ability, mentioned above,
to determine whether a string is in or out of the language recognized
by the automaton.
Note that in the diagram, states v₁ and v₂ are the accepting
states. If this DFA is presented with the input string 011010011, it
will go through the following states:
v₁, v₂, v₁, v₁, v₂, v₁, v₂, v₃,
v₃, v₃.
Hence, the string is not admissible because it is not in the
language accepted by this automaton. The language it accepts, as a
careful look at the diagram reveals, is the set of all binary strings
that contain no successive zeroes.
Now we employ Mathematica to define and plot the corresponding automaton. First we list the states:
states = {v1, v2, v3};
Next, we define the transition function as a list of directed edges with labels.
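One possible completion of this sketch is shown below; the edge list is read off the transition structure described above (δ(v₁,0)=v₂, δ(v₁,1)=v₁, δ(v₂,0)=v₃, δ(v₂,1)=v₁, and v₃ is a sink), while the styling options are merely our choice.
transitions = {Labeled[v1 -> v2, "0"], Labeled[v1 -> v1, "1"],
   Labeled[v2 -> v3, "0"], Labeled[v2 -> v1, "1"], Labeled[v3 -> v3, "0, 1"]};
Graph[states, transitions, VertexLabels -> "Name",
  VertexStyle -> {v1 -> LightGreen, v2 -> LightGreen, v3 -> LightGray}]   (* accepting states shown in green *)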
In this section, we concentrate our attention on probabilistic problems and show by examples how some problems can
be solved in a straightforward way if we apply the finite state machine
approach. As a side effect, since we directly obtain the probability generating
functions of the random variables involved, we have direct access to the moments without
actual knowledge of the distributions. The demonstrated
technique is especially beneficial for solving probabilistic waiting time
problems in which the object of interest is a specific sequence of outcomes, called
a pattern.
In the second part of this section, we consider problems with
two stopping patterns, which could be extended for multiple choices.
Such problems are usually solved using advanced techniques---the
methods of Markov chains (see next section), martingales, or multiple gambling
teams (see Feller's book and Pozdnyakov's article). Instead, we show that techniques
based on the finite state machine approach are adequate for solving such
problems.
Example 2:
Let us consider a classical problem: flip a coin repeatedly
until three (this problem can be generalized and solved for
any r ≥ 2, r ∈ ℕ) consecutive heads are obtained. Let X be
the number of flips required to achieve three consecutive heads; we
want to find its expected value, that is, the average number of tosses
needed to see three heads in a row.
To solve this problem, we need a mathematical model---a sample space that
consists of all possible outcomes. We list the outcomes in rows according to
the number of tosses needed to get three consecutive heads, after which the
process is terminated. For instance,
HTHHH is such an outcome for five
flips, where H stands for heads and T for tails. Since
there is no (fixed) number that guarantees that the number of trials does not exceed
it, the sample space is infinite---it contains all finite strings over the
alphabet { T, H } ending
with three heads. There are infinitely many outcomes, but there are no
outcomes that represent infinitely many trials.
A finite state automaton (FSA), or machine, is an excellent tool for handling such
problems. We consider the transition diagram for the FSA
accepting all strings with three consecutive heads. Let us denote by
p the probability of getting heads, and by q = 1 - p that of getting tails.
Finite state machine
Note that the state v₃ is the accepting state: the game is over once
we obtain the required number of heads. Let dᵢ (i = 0, 1, 2, 3) be
the expected duration of the game from state i, and let d = d₀ be the
expected value of X, which corresponds to the duration of the game
from the start vertex v₀. We denote by pᵢₖ the probability of
getting from state i to state k in one step. Then for every
i = 0, 1, 2 we get the equation
\begin{equation} \label{E923.1}
d_i = 1 + \sum_k p_{ik}\, d_k ,
\end{equation}
subject to the boundary condition d₃ = 0.
Setting i = 0, we obtain d = 1 + qd + pd₁, so pd₁ = d(1-q) - 1 =
pd - 1. For i = 1 we get d₁ = 1 + qd + pd₂, so pd₂ = d₁ - 1 - qd =
d - 1/p - 1 - qd = pd - 1 - 1/p. Similarly, for i = 2 we
have d₂ = 1 + qd + pd₃ = 1 + qd (since d₃ = 0, and a tail returns the
automaton to the start state v₀). Hence pd₂ = p + pqd. Equating the two
expressions for pd₂, we obtain the expected value to be
\[
d = \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} \,,
\]
and for a fair coin we expect 14 flips on average to see three
consecutive heads. In general, the expected number of flips required
to see r consecutive heads is
\[
\sum_{k=1}^{r} p^{-k} = \frac{1-p^r}{p^r (1-p)} \,,
\]
which equals \( 2^{r+1} - 2 \) for a fair coin. You are asked to prove this in Exercise 1.
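Alternatively, the three equations above can be handed to Mathematica directly; the following sketch (the variable names are ours) solves them symbolically and confirms the value 14 for a fair coin.
sol = Solve[{d == 1 + q d + p d1, d1 == 1 + q d + p d2, d2 == 1 + q d},
            {d, d1, d2}] /. q -> 1 - p;
Simplify[d /. First[sol]]    (* -> (1 + p + p^2)/p^3 *)
% /. p -> 1/2                (* -> 14 *)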
The finite state machine approach also helps us to find the enumerator
(generating function) for this problem. We can arrive at
the start state v₀ either initially (the empty string contributes the term 1), or
from v₀ itself on flip T (with probability q), or from the state v₁
on flip T (with probability q), or from the state v₂ on flip
T. We can come to the state v₁ only from the state v₀ on flip
H with probability p. Likewise, the states v₂ and v₃ can be reached only
from the states v₁ and v₂, respectively, on a flip H with probability p. Symbolically,
we can write this in the following form:
\[
v_0 = 1 + T\,v_0 + T\,v_1 + T\,v_2 , \qquad v_1 = H\,v_0 , \qquad v_2 = H\,v_1 , \qquad v_3 = H\,v_2 .
\]
These equations can be rewritten as a single equation
\[
v_0 = 1 +Tv_0 + THv_0 + THHv_0 ,
\]
from which we find \( v_0 = (1 - T - TH - THH)^{-1} \). Now the final state, v₃ =
HHHv₀, is expressed (formally) as \( \displaystyle \quad v_3 =
\frac{HHH}{1-T-TH-THH} . \) Remembering that the event H occurs with
probability p and the event T with probability q = 1 − p, we replace every H by pz and every T by qz (each flip contributes a factor z); the generating function for the event v₃ is then
\[
\phi (z) = \frac{p^3 z^3}{1 - qz - pq\,z^2 - p^2 q\, z^3} .
\]
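For readers who wish to check this with Mathematica, here is a short sketch (the function names phi and meanX are ours): differentiating the generating function at z = 1 recovers the mean, and the series coefficients give the probabilities Pr[X = i].
phi[z_, p_] := With[{q = 1 - p}, (p z)^3/(1 - q z - p q z^2 - p^2 q z^3)];
meanX[p_] := Simplify[D[phi[z, p], z] /. z -> 1];
meanX[1/2]                        (* -> 14 flips for a fair coin *)
Series[phi[z, 1/2], {z, 0, 5}]    (* -> z^3/8 + z^4/16 + z^5/16 + ... *)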
We show another approach to waiting time problems based on the analysis of
pattern overlapping. Our exposition follows Blom and Thorburn's article, with the objective of
showing another derivation of the probability generating function for the waiting
time of a stopping pattern.
Let us choose letters at random from an alphabet of n letters, one at a
time. We continue sampling until the last m letters form a given pattern or
stopping string. The number of trials to see the pattern is called the waiting time. For instance, we flip a coin until we see the pattern P = HHH; or we roll a die repeatedly until we
get (1,2,3). More generally, how many times do you need to invoke a random
number generator to observe the given sequence of numbers?
Let P be a pattern---a given sequence of letters. We say that there
is an overlap of length r
if the last r letters of P match its first r letters. We set
εr = 1 in this case and 0 otherwise. For example, the pattern
HHH has 3 overlaps (so ε₁ = ε₂ =
ε₃ = 1), but the stopping sequence (1,2,3) has no overlaps except
itself (ε₁ = ε₂ = 0, ε₃ = 1). Let m be the length
of the pattern, and n be the number of letters in the alphabet. In what
follows, we assume that every letter is equally likely to be chosen. Next, we
introduce the conditional probabilities
\[
q_r = \epsilon_{m-r}\, n^{-r} , \qquad r = 1, 2, \ldots , m-1 ,
\]
that the pattern is between positions r+1 and r+m, given that the sample
starts from the stopping sequence. Let X be the waiting time for the
pattern. Its probability generating function is
\[
\phi (z) = \frac{1}{1 + (1-z)\, d(n/z)} \,,
\eqno{(3)}
\]
where \( \displaystyle \quad d(x) = \sum_{r=1}^m \epsilon_r \,x^r \ \) is the generating function of the
overlaps. Let \( p_i = \Pr[X = i] \) denote the probability that P occurs for
the first time after exactly i letters were sampled (i ≥ m). We
consider the event that P ends at position i of the sample, but
not necessarily for the first time. This event is the union of the following
three mutually exclusive events: (1) P occurs for the first
time after exactly i letters, with probability \( p_i \). (2) P
occurs for the first time after j letters and also after i letters, the
distance (= the number of letters) between the two occurrences being at least m. The
two occurrences are then independent and happen with probability \( p_j\, n^{-m} \). (3) P occurs for the first time after j letters and
also after i letters, the distance between the two occurrences being less than m. The probability is now \( p_j\, q_{i-j} \). This decomposition leads to the
relation
\[
n^{-m} = p_i + n^{-m} \sum_{j=m}^{i-m} p_j + \sum_{j=i-m+1}^{i-1} p_j q_{i-j}
.
\]
Multiplying both sides of the last equation by \( n^{m} z^i \) and summing over i,
we get
\[
\sum_i z^i = n^{m} \sum_i p_i \,z^i + \sum_i z^i \sum_{j=m}^{i-m} p_j + n^{m}
\sum_i z^i \sum_{j=i-m+1}^{i-1} p_j q_{i-j} .
\]
Rearranging the terms, we rewrite it as
\[
\frac{z^m}{1-z} = \frac{z^m}{1-z}\, \phi (z) + z^m\, d(n/z)\, \phi (z) ,
\]
where \( \phi (z) = \sum_{i} p_i z^i \); solving for φ(z) recovers Eq.(3).
Example 3:
To determine the average number of flips of a fair coin to see three
consecutive heads (the pattern HHH), we
note that the leading numbers (overlaps) are all 1's: ε₁ = ε₂ = ε₃ = 1. In this case d(x) = x + x² + x³, and from Eq.(3) the probability generating function is
\[
\phi (z) = \frac{1}{1+(1-z)\left( \frac{2}{z} + \frac{4}{z^2} + \frac{8}{z^3}
\right)} = \frac{z^3}{8-4z-2z^2-z^3} .
\]
■
End of Example 3
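The computation in Example 3 is easy to automate. The sketch below (the helper names overlaps, pgf, and meanWait are ours) builds the generating function of Eq.(3) from the leading numbers of an arbitrary pattern over an n-letter alphabet, assuming equally likely letters.
overlaps[pat_List] := Table[Boole[Take[pat, -r] === Take[pat, r]], {r, Length[pat]}];
pgf[pat_List, n_Integer, z_] := Module[{eps = overlaps[pat], m = Length[pat]},
   Together[1/(1 + (1 - z) Sum[eps[[r]] (n/z)^r, {r, m}])]];
meanWait[pat_, n_] := Simplify[D[pgf[pat, n, z], z] /. z -> 1];
pgf[{"H", "H", "H"}, 2, z]       (* -> z^3/(8 - 4 z - 2 z^2 - z^3) *)
meanWait[{"H", "H", "H"}, 2]     (* -> 14 *)
meanWait[{1, 2, 3}, 6]           (* -> 216 rolls of a die to see (1,2,3) *)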
So far we considered problems with one final state. In what follows, we
discuss a wide
class of problems about waiting times with multiple stopping patterns.
The family of waiting time distributions generated by compound patterns is rather large
and includes, for example, the geometric and negative binomial
distributions. Waiting time distributions have been used in various areas of
statistics, with applications in reliability, sampling inspection, quality
control, DNA sequencing, and others.
Consider an experiment with only countably many outcomes that is
performed repeatedly. For instance, flipping a coin n times (n can be any
positive integer), we obtain a sequence over the two-letter alphabet---H and
T. We
are looking for the expected number of tosses until one of two
given patterns is observed first, and for the probability that each pattern wins. Since our goal is not to present
the general theory of waiting time distributions (you can trace this
subject in the article by Pozdnyakov et al.), but rather to give an exposition of
available tools, we encourage the reader to go with us over the
examples and then solve the exercises.
Now we analyze the case when there are two or more stopping sequences,
denoted by P₁, P₂, … , Pₖ. For any pair of patterns, Pᵢ and Pⱼ, we introduce the so-called leading numbers:
\[
\epsilon_r (i,j)= \begin{cases}
1, & \mbox{if there is an overlap of length} \ r, \\
0, & \mbox{if there is no overlap of length} \ r.
\end{cases}
\]
The pair of integers (i,j) in εᵣ(i,j) indicates that we compare the
last r letters of the pattern Pᵢ with the first r letters of the
pattern Pⱼ, i,j = 1, 2, … , k. Note that εᵣ(i,i) is
the leading number for the single pattern Pᵢ.
Theorem 1:
For given k patterns of the same length m from an alphabet with n letters, the mean waiting time and the stopping probabilities are
\begin{equation} \label{E87.4}
\mu = \left( \sum_{i=1}^k x_i \right)^{-1} , \qquad \pi_j = x_j \left(
\sum_{i=1}^k x_i \right)^{-1} , \quad j=1, \ldots , k,
\end{equation}
where x₁, x₂, … , xk satisfy the linear system of equations
\begin{equation} \label{E87.5}
\sum_{i=1}^k e_{i,j} x_i = n^{-m} , \qquad \mbox{with} \quad e_{i,j}
\stackrel{\tiny def}{=} \sum_{r=1}^m \epsilon_r (i,j) n^{r-m} , \quad i,j=1,2,\ldots , k.
\end{equation}
Its proof is based on constructing a renewal process; see the article by Blom and Thorburn for details.
Corollary 1:
The stopping probabilities πᵢ are equal if and only if
\( \displaystyle \quad \sum_{i=1}^k e_{i,j} =c \ \) or \( \displaystyle \quad
\sum_{j=1}^k e_{i,j} =c \ \) for some constant
c. In this case, the mean waiting time is \( \mu = c\, n^{m} /k \).
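The linear system of Theorem 1 is also easy to set up with a CAS. The following Mathematica sketch (the helpers lead and stopData are ours) computes the mean waiting time and the stopping probabilities for patterns of equal length over a uniform n-letter alphabet; for HHH against THH it returns the values ⅛ and ⅞ appearing in the table of Example 4 below.
lead[pi_List, pj_List, r_] := Boole[Take[pi, -r] === Take[pj, r]];
stopData[pats_List, n_] := Module[{k = Length[pats], m = Length[First[pats]], e, x, xs},
   e[i_, j_] := Sum[lead[pats[[i]], pats[[j]], r] n^(r - m), {r, m}];
   xs = Array[x, k] /. First@Solve[Table[Sum[e[i, j] x[i], {i, k}] == n^-m, {j, k}], Array[x, k]];
   {1/Total[xs], xs/Total[xs]}];   (* {mean waiting time, stopping probabilities} *)
stopData[{{"H", "H", "H"}, {"T", "H", "H"}}, 2]    (* -> {7, {1/8, 7/8}} *)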
Example 4:
We flip a fair coin repeatedly and consider the number of times it takes
for particular triplets (TTH, or HHH etc.) to
appear. If we flip it just three times then each of the eight triplets is
equally likely to show up. But if we persist, and ask questions about
time to first appearance of each pattern, or about the likelihood of
seeing one pattern before another, then inequalities emerge.
By symmetry, neither three heads nor three tails is favored to
appear before the other, but it turns out
that any of the other triplets is likely to appear before either of these two.
The (i, j)th entry in the following 8×8 matrix contains the
probability that triplet i occurs in a sequence of
coin flips before triplet j. Exercise 3 asks you to verify these
probabilities.
         HHH     HHT     HTT     HTH     THT     THH     TTH     TTT
HHH      ---     1/2     2/5     2/5     5/12    1/8     3/10    1/2
HHT      1/2     ---     2/3     2/3     5/8     1/4     1/2     7/10
HTT      3/5     1/3     ---     1/2     1/2     1/2     3/4     7/8
HTH      3/5     1/3     1/2     ---     1/2     1/2     3/8     7/12
THT      7/12    3/8     1/2     1/2     ---     1/2     1/3     3/5
THH      7/8     3/4     1/2     1/2     1/2     ---     1/3     3/5
TTH      7/10    1/2     1/4     5/8     2/3     2/3     ---     1/2
TTT      1/2     3/10    1/8     5/12    2/5     2/5     1/2     ---
■
End of Example 4
Example 5:
On average, how many times do you need to roll a fair die to see a sequence of
n identical numbers? To answer this question, we draw an appropriate
DFA with n+1 states that counts the number of identical digits in a
row, that is, the length of the current run.
The DFA starts from state 0, to which it never returns.
From state k, the DFA goes either to the next state k+1, with
probability ⅙, if the roll shows the same number as the previous one, or back to
state 1, with probability ⅚, if the roll is different. This leads to the
following difference equation:
\[
d_k = 1 + \frac{1}{6}\,d_{k+1} + \frac{5}{6}\,d_{1} , \quad k=1,2,\ldots ,
n-1,
\]
subject to the boundary condition dₙ = 0. Here we used Eq.~\eqref{E923.1}, where
dₖ is the expected time (= number of rolls) to reach the absorbing state
n from state k. Solving for d₁, we obtain \( \displaystyle \quad d_1 = \frac{1}{5}
\left( 6^n -6 \right) , \) so d₀ = d₁ + 1.
■
End of Example 5
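For a concrete value of n the recurrence is easily solved with a CAS; a brief sketch for a run of length 4 (the symbol names are ours) follows.
nRun = 4;   (* length of the run sought *)
eqs = Join[Table[d[k] == 1 + (1/6) d[k + 1] + (5/6) d[1], {k, 1, nRun - 1}], {d[nRun] == 0}];
Solve[eqs, Table[d[k], {k, 1, nRun}]]
(* d[1] -> 258 = (6^4 - 6)/5, hence d[0] = 259 rolls on average *)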
Example 6:
A while ago, pirates captured a man, whose fate was determined by
their captain. The pirate was an avid gambler, and liked to use chance
in all aspects of his life. The captive was an excellent computer
scientist, and knew a lot about probability. The captain was
going to gamble with the prisoner's life! He had a die with six
blank faces. He told the captive that he was going to write L, I,
V, E, and K on five of the faces and leave the sixth one blank. He
would force the prisoner to roll the die until either LIVE or KILL
came up as a run of consecutive rolls.
Luckily, the captured man had a laptop with a computer algebra system, and
the prisoner was able to make use of it. The computer scientist
spent a few minutes on the problem and asked the pirate to use
letters on all six faces (which would make the die more attractive). He
suggested, and the captain agreed, to use the letters L, I, V, E, D, and A
and to let the deciding words be LIVE or DEAD. What is the expected
number of rolls for each of the three words?
Solution.
First, we draw a DFA that tracks the appearance of the five
letters. Note that solid lines indicate transitions with probability ⅙ and
dotted lines show transitions with different probabilities (mostly ½ by
default, but some with ⅔). There are two
accepting states.
LIVE vs KILL
Then we calculate the expected number of rolls to reach one of the final
states. Denoting this number by d, we derive the following equation
\[
d=1+ \frac{2}{3}\,d + \frac{1}{6}\, d_l +\frac{1}{6}\,d_k ,
\]
where dₗ and dₖ are the expected numbers of rolls to reach a final state from
the states ``L'' and ``K,'' respectively. These expected values are in turn expressed
via the expected values from the other states:
\[
d_l = 1 +\frac{1}{6} \,d_l + \frac{1}{6} \,d_k + \frac{1}{6} \,d_{li} +
\frac{1}{2} \,d , \qquad
d_k = 1 +\frac{1}{6} \,d_l + \frac{1}{6} \,d_{k} + \frac{1}{6} \,d_{ki} +
\frac{1}{2} \,d .
\]
Similarly, we obtain
\begin{align*}
d_{li} &= 1 +\frac{1}{6} \,d_l + \frac{1}{6} \,d_k + \frac{1}{6} \,d_{liv} +
\frac{1}{2} \,d , \quad
&d_{ki} &= 1 +\frac{1}{6} \,d_{kil} + \frac{1}{6} \,d_k + \frac{2}{3} \,d , \\
d_{liv} &= 1 +\frac{1}{6} \,d_l + \frac{1}{6} \,d_k + \frac{1}{2} \,d , \quad
&d_{kil} &= 1 +\frac{1}{6} \,d_{li} + \frac{1}{6} \,d_k + \frac{1}{2} \,d . \\
\end{align*}
Solving this system of equations, we obtain the expected number of rolls to
finish the game:
\[
d=\frac{279936}{431} \approx 649.50348 \quad\mbox{or about 650 rolls.}
\]
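The value above can be reproduced in Mathematica by solving the seven expected-duration equations directly (a sketch; the variable names mirror the states):
sol = Solve[{d == 1 + 2/3 d + 1/6 dl + 1/6 dk,
    dl == 1 + 1/6 dl + 1/6 dk + 1/6 dli + 1/2 d,
    dk == 1 + 1/6 dl + 1/6 dk + 1/6 dki + 1/2 d,
    dli == 1 + 1/6 dl + 1/6 dk + 1/6 dliv + 1/2 d,
    dki == 1 + 1/6 dkil + 1/6 dk + 2/3 d,
    dliv == 1 + 1/6 dl + 1/6 dk + 1/2 d,
    dkil == 1 + 1/6 dli + 1/6 dk + 1/2 d},
   {d, dl, dk, dli, dki, dliv, dkil}];
d /. First[sol]    (* -> 279936/431, about 649.5 rolls *)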
Now we consider the case when there is only one stopping pattern---either KILL
or LIVE. Since they are similar, we draw only one DFA:
LIVE vs KILL
Solving the following system of equations
\begin{align*}
d&= 1+\frac{5}{6}\, d +\frac{1}{6}\, d_{l} , \quad &d_{l} &= 1 +\frac{1}{6}\,
d_{l} +\frac{1}{6}\, d_{li} + \frac{2}{3}\, d , \\
d_{li} &= 1+ \frac{1}{6}\, d_{liv} + \frac{1}{6}\, d_{l} + \frac{2}{3}\, d ,
\quad &d_{liv} &= 1 + \frac{1}{6}\, d_{l} + \frac{2}{3}\, d ,
\end{align*}
for d, the expected value to reach the final state LIVE, we obtain
\[
d=1296 .
\]
The probability for the machine to be at a particular state (there are nine
of them) depends on the number of rolls. However, their numerical values are
not our immediate concern. Instead, we let the
number of trials increase without bound and evaluate the ``ultimate''
probabilities for an infinite sequence of letters (pretending the process is not
terminated by the two absorbing states). We denote by
\( p_k \), \( p_l \), \( p_{ki} \), \( p_{li} \), \( p_{kil} \), and \( p_{liv} \) the
limiting ``probabilities'' of being at the states ``K,'' ``L,'' ``KI,''
``LI,'' ``KIL,'' and ``LIV,'' respectively.
We can arrive at state ``K'' either from the starting state (with
probability ⅙), or from the same state ``K'' (with probability ⅙),
or from the state ``L'' (with probability ⅙), or from the state ``LI''
(with probability ⅙), or from the state ``LIV'' (with probability
⅙), or from the state ``KI'' (with probability ⅙), or from the
state ``KIL'' (with probability ⅙). So we have the equation
\[
p_k = \frac{1}{6} + \frac{1}{6} \, p_k + \frac{1}{6} \, p_l + \frac{1}{6} \,
p_{ki} + \frac{1}{6} \, p_{kil} + \frac{1}{6} \, p_{li} + \frac{1}{6} \,
p_{liv} .
\]
Similarly, we can reach the state ``L'' from the starting state and from the states
``K,'' ``L,'' ``LI,'' and ``LIV,'' in each case with probability ⅙. This
observation translates into the equation
\[
p_l = \frac{1}{6} + \frac{1}{6} \, p_k + \frac{1}{6} \, p_l + \frac{1}{6} \,
p_{li} + \frac{1}{6} \, p_{liv} .
\]
For the other states we similarly get
\[
p_{li} = \frac{1}{6} \, p_l + \frac{1}{6} \,p_{kil} , \quad p_{liv} =
\frac{1}{6} \, p_{li} , \quad p_{ki} = \frac{1}{6} \, p_{k} , \quad p_{kil} =
\frac{1}{6} \, p_{ki} .
\]
Hence, we need to find \( p_{kil} \) and \( p_{liv} \) from this system of algebraic
equations, because the probabilities of reaching the final states are just ⅙
times these probabilities. When a CAS is available, it is not hard to
solve the system and obtain
\[
p_{kil} = \frac{216}{28339} , \quad p_{liv}
=\frac{215}{28339} .
\]
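Here is one way to do it in Mathematica (a sketch with our variable names):
limits = Solve[{pk == 1/6 + 1/6 (pk + pl + pki + pkil + pli + pliv),
    pl == 1/6 + 1/6 (pk + pl + pli + pliv),
    pli == 1/6 pl + 1/6 pkil, pliv == 1/6 pli,
    pki == 1/6 pk, pkil == 1/6 pki},
   {pk, pl, pki, pli, pkil, pliv}];
{pkil, pliv} /. First[limits]    (* -> {216/28339, 215/28339} *)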
Since the odds of the states KILL vs LIVE are 216 : 215, we get the following
probabilities:
\[
\Pr [\mbox{KILL}] =\frac{216}{215+216} = \frac{216}{431} \approx 0.5011600928,
\quad \Pr [\mbox{LIVE}] =\frac{215}{431} \approx 0.4988399072 .
\]
Note that here KILL (respectively, LIVE) denotes the event that a run of consecutive letters KILL
(respectively, LIVE) occurs before a run of consecutive letters LIVE (respectively, KILL).
The case LIVE vs DEAD can be treated in a similar way. However, we
present its solution in the following example, using another technique.
The prisoner was able to convince the pirate to use his die idea by
telling him that it would be much more elegant to have 6 full letters
on the die, rather than 5 and a blank. The pirate agreed, and he
started rolling.
The result of 474 rolls was:
Even with a slight advantage, the computer scientist will still lose
in the long run. Being an enterprising man,
the pirate offered him one last chance to save his life. He noticed
that the prisoner was an excellent computer scientist and he had some
ideas for a start-up company. If the prisoner would work for 12
hours a day, 7 days a week, he would be spared and if the company was
successful, he would be rewarded with 5% of all profits. The
computer scientist was shocked! This was a better offer than where
he was working before!
■
End of Example 6
Example 7:
Now we turn to the case LIVE vs DEAD and draw the diagram, which has 2
transitions with probability ⅔ (START ↦ START and DEA ↦
START), 5 transitions with probability ½ (L ↦ START, LI ↦
START, LIV ↦ START, D ↦ START, DE ↦ START), and 19
transitions with probability ⅙ (drawn with solid lines to make
them distinguishable from the seven others drawn with dotted lines). From the
DFA, we derive the following equations for the expected values (similar
to \eqref{E923.1}):
LIVE vs DEAD
\begin{align*}
d&= 1 +\frac{2}{3}\,d + \frac{1}{6}\,d_l + \frac{1}{6}\,d_d , \quad
&d_d &= 1 + \frac{1}{6}\,d_l + \frac{1}{6}\,d_d + \frac{1}{6}\,d_{de} +
\frac{1}{2}\,d , \\
d_l &= 1 + \frac{1}{6}\,d_l + \frac{1}{6}\,d_d + \frac{1}{6}\,d_{li} +
\frac{1}{2}\,d , \quad
&d_{de} &= 1 + \frac{1}{6}\,d_{dea} + \frac{1}{6}\,d_l + \frac{1}{6}\,d_{d} +
\frac{1}{2}\,d , \\
d_{li} &= 1 + \frac{1}{6}\,d_{liv} + \frac{1}{6}\,d_l + \frac{1}{6}\,d_{d} +
\frac{1}{2}\,d , \quad
&d_{dea} &= 1 + \frac{1}{6}\,d_{l} + \frac{2}{3}\,d , \\
d_{liv} &= 1 + \frac{1}{6}\,d_{l} + \frac{1}{6}\,d_d + \frac{1}{2}\,d .
\end{align*}
Solving this system of algebraic equations, we obtain
\[
d=\frac{281232}{433} \approx 649.4965358 \quad\mbox{or about 650 rolls.}
\]
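Again, the system is routine work for a CAS; a Mathematica sketch (our variable names) is shown below.
sol = Solve[{d == 1 + 2/3 d + 1/6 dl + 1/6 dd,
    dd == 1 + 1/6 dl + 1/6 dd + 1/6 dde + 1/2 d,
    dl == 1 + 1/6 dl + 1/6 dd + 1/6 dli + 1/2 d,
    dde == 1 + 1/6 ddea + 1/6 dl + 1/6 dd + 1/2 d,
    dli == 1 + 1/6 dliv + 1/6 dl + 1/6 dd + 1/2 d,
    ddea == 1 + 1/6 dl + 2/3 d,
    dliv == 1 + 1/6 dl + 1/6 dd + 1/2 d},
   {d, dl, dd, dli, dde, dliv, ddea}];
d /. First[sol]    (* -> 281232/433, about 649.5 rolls *)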
To calculate the limiting probabilities, we use the same approach as in the
LIVE vs KILL case. From the diagram, it follows that
\begin{align*}
p_l &= \frac{1}{6} + \frac{1}{6}\left[ 1+ \frac{1}{6} + \left( \frac{1}{6}
\right)^2 \right] (p_l + p_d ) , \\
p_d &= \frac{1}{6} + \frac{1}{6}\left[ 1+ \frac{1}{6} + \left( \frac{1}{6}
\right)^2 \right] p_l + \frac{1}{6}\,p_d + \left( \frac{1}{6}
\right)^2 p_d
\end{align*}
because \( p_{li} = \frac{1}{6}\,p_l \), \( \displaystyle \quad p_{liv} = \frac{1}{6}\,p_{li} = \left(
\frac{1}{6}\right)^2 p_l \quad \) and \( p_{de} = \frac{1}{6}\,p_d \), \( \displaystyle \quad p_{dea} =
\frac{1}{6}\,p_{de} = \left( \frac{1}{6}\right)^2 p_d . \quad \) Solving this system of
algebraic equations, we obtain
\[
p_l = \frac{7812}{28253} \quad \mbox{and} \quad p_d = \frac{7776}{28253} .
\]
Since \( p_{live} = p_l /36 \) and \( p_{dead} = p_d /36 \), we get
\begin{align*}
\Pr [\mbox{LIVE}] &= \frac{7812}{7812+7776} = \frac{7812}{15588} =
\frac{217}{433} \approx 0.5011547344, \\
\Pr [\mbox{DEAD}] &=
\frac{7776}{15588} =\frac{216}{433} \approx 0.4988452656 ,
\end{align*}
where LIVE is the event that a run of consecutive letters LIVE occurs
before a run of consecutive letters DEAD.
Now we calculate the expected number of rolls, denoted by d, to see the run
of DEAD by solving the following system of algebraic equations:
\begin{align*}
d&=1+\frac{5}{6}\,d +\frac{1}{6}\,d_{D} , \quad &d_{D} &=
1+\frac{1}{6}\,d_{D} +\frac{2}{3}\,d + \frac{1}{6}\,d_{DE} , \\
d_{DE} &=1+\frac{1}{6}\,d_{D} +\frac{2}{3}\,d + \frac{1}{6}\,d_{DEA} , \quad
&d_{DEA} &=1+\frac{5}{6}\,d .
\end{align*}
Solving this system, we obtain d = 1302: six rolls more than for the pattern LIVE, because DEAD overlaps itself (it begins and ends with the same letter).
■
End of Example 7
Exercises
Find the expected number of flips of a biased coin to see r (r ≥
3) consecutive heads by deriving a corresponding enumerator.
Let M = { V, Σ, δ, v₀, F } be a DFA, where V = { v₀, v₁, v₂, v₃ }, Σ = {0, 1}, F = { v₃ },
and the transition function is given by the following table:
State    0     1
v₀       v₀    v₁
v₁       v₁    v₂
v₂       v₁    v₃
v₃       v₃    v₁
Which of the following strings are in L(M)?
(a) ε,   (b) 001101,   (c) 1110,   (d) 11011.
Find the mean waiting time and verify the stopping probabilities for each pair of
the eight triplets when flipping a fair coin, as given in the table of Example 4.
A coin is tossed successively until either two heads in a row or
four tails in a row occur. What is the probability that the sequence
ends with two consecutive heads? What is the expected number of tosses for this event?
A coin is tossed successively until one of the following patterns is
observed: either a run of H's of length m or a run of T's
of length n. We know from Example~\ref{M582.3} that the expected waiting
time of the run of H's is \( 2^{m+1} - 2 \), while for the run of
T's it is \( 2^{n+1} - 2 \). Find the expected number of tosses until
either of these two runs happens for the first time and show that for a fair
coin this number is
\[
\left( \frac{1}{2^{m+1} -2} +\frac{1}{2^{n+1} -2} \right)^{-1} =
\frac{2^{n+m} - 2^{n} - 2^{m} +1}{2^{n-1} + 2^{m-1} -1} \,.
\]
Suppose you flip a fair coin without stopping. What is the average
waiting time until the first occurrence of each of the following patterns? Write
the enumerator for each event.
(a) HHHHH,   (b) HTHTH,   (c) HHHTH,   (d) HHHHT.
What is the probability that in a random sequence of heads (H)
and tails (T) the pattern
HHHHT appears earlier than
HHHTH? Assume that tails land
with probability q.
(a)
On average, how many times do you need to choose digits from the alphabet
Σ = { 0, 1, … , 9 } (taken randomly with replacement) to see the
four-long sequence (i, i, i, i), i ∈ Σ?
(b)
A pseudo-random generator produces r-tuples of integers (i₁, i₂, … , iᵣ) from the alphabet Σ = { 0, 1, … , 9 }. What is the
expected waiting time (measured in the number of calls of the random
generator) for the sequence (i, i, … , i)?
The goal of this exercise is to show that some waiting time problems
may have counterintuitive solutions. Given two sequences of patterns,
HTHT and THTT,
a coin is flipped repeatedly until one of the patterns appears as a run; which
pattern wins? Find the expected number of tosses needed to see a winner; also,
determine the expected running time (= number of flips) for each of the
patterns. As a check on your answers for a fair coin: the odds are 9
to 5 in favor of getting HTHT
first, though its expected number of tosses till first arrival is 20,
whereas the pattern
THTT
needs only 18 flips on the average.
Suppose that A flips a coin until she
observes either HHHTH or
HTHTH, while B flips
another coin until he observes either
HHHHT or
HHHTH. Here H stands for
head and T---for tail. It is assumed that both coins have the
same probability p of head on each trial. For what p the average
waiting times for A and for B will be the same?
Blom, Gunnar and Thorburn, Daniel, How many random digits are required
until given sequences are obtained?,
Journal of Applied Probability, 19, 518--531, 1982.
Feller, William, An Introduction to Probability Theory
and Its Applications, 3rd Ed., John Wiley & Sons, New York, 1968.
Hopcroft, John E., Motwani, Rajeev, and Ullman, Jeffrey D., Introduction to Automata Theory, Languages, and Computation, 3rd
Ed., Addison-Wesley, Boston, MA, 2007.
Pozdnyakov, Vladimir and Kulldorff, Martin, Waiting times for patterns and a
method of gambling teams,
The American Mathematical Monthly, 113, No. 2, 134--143, 2006.
Sudkamp, Thomas A., Languages and Machines: An Introduction to the
Theory of Computer Science, 3rd Ed., Addison-Wesley, Reading, MA, 2006.