Theoretical Background of Nyanten (Efficient Computation of Shanten/Deficiency Numbers)

Last updated at 2025-01-18Posted at 2025-01-18

This Qiita entry explains the theoretical background of the Nyanten library, which efficiently computes the Shanten/deficiency number (向聴数) of a given hand.

Theoretical Background

First, we introduce the concept of the replacement number (置換数) instead of the shanten number (向聴数) or deficiency number (欠損数). For a given hand of tiles, the shanten/deficiency number is defined as the minimum number of self-draws required to transform that hand into a tenpai (ready) shape. Similarly, for a given hand, the replacement number is defined as the minimum number of self-draws required to transform that hand into a winning shape. Clearly, $\text{(shanten number)} = \text{(replacement number)} − 1$, so the calculations for shanten number and replacement number can be treated as essentially the same. The following discussion focuses on an efficient method for computing this replacement number.

Hereinafter, we will represent a given hand as a sequence of 34 natural numbers. The first element indicates how many 1-Man tiles (🀇) are in the hand, the second element indicates how many 2-Man tiles (🀈) are in the hand, and so on. In other words, a hand $\mathbf{h}$ is represented as $\mathbf{h} = (h_0, \dots, h_{33}) \in \mathbb{N}^{34}$.

Next, let $\delta(\mathbf{g}, \mathbf{h})$ denote the number of self-draws required to transform a hand $\mathbf{h}$ into a hand $\mathbf{g}$, and let $\mathbf{W}$ be the set of all winning hands. Then, the replacement number $r(\mathbf{h})$ for a hand $\mathbf{h}$ can be expressed as $r(\mathbf{h}) = \min_{\mathbf{w} \in \mathbf{W}} \delta(\mathbf{w}, \mathbf{h})$.

This Qiita entry will first discuss the properties of $r$. However, the expression for $r$ given above is not a definition that can be used in a mathematically rigorous argument, because we have not yet provided mathematically precise definitions for $\mathbf{W}$ or $\delta$. Therefore, we will begin by giving mathematically rigorous definitions of $\mathbf{W}$ and $\delta$.

Below, we begin by defining the set $\mathbf{W}$ of all winning hands. When considering calls (melds), the pure hand (兵牌, the hand excluding melds) in a winning shape always consists of $m$ members and $1$ pair ($m \in \lbrace 0, 1, 2, 3, 4 \rbrace$). For a given $m$, how can we enumerate all such shapes of $m$ members and 1 pair? One way is to look at each tile type and decide: “How many threes-in-a-row starting with that tile do we take out—0, 1, 2, 3, or 4?”, “How many triplets of that tile—0 or 1?”, and “How many pairs of that tile—0 or 1?”. We do this for all tile types. The result of these choices will be a shape with some members and some pairs. We then pick out only those with exactly $m$ members and $1$ pair. Of course, if we carry out these selections without any restrictions, we would violate the basic rules of Mahjong—namely, that there are only four copies of each tile. Therefore, we must impose constraints on how we perform this selection so as to adhere to the fundamental rules of Mahjong.

For example, suppose we take out $x$ threes-in-a-row starting with 1-Man (🀇), $y$ triplets of 1-Man, and $z$ pairs of 1-Man. We denote this choice by $(x, y, z)$. Then, due to the Mahjong rule that only four copies of 1-Man exist in total, and from the fact that there can be at most one pair in a winning hand, there are only 10 possible combinations of $(x, y, z)$ as described below.

(x, y, z)	Corresponding tiles
(0, 0, 0)	(No tile)
(0, 0, 1)	1m1m (🀇🀇)
(0, 1, 0)	1m1m1m (🀇🀇🀇)
(1, 0, 0)	1m2m3m (🀇🀈🀉)
(1, 0, 1)	1m1m1m2m3m (🀇🀇🀇🀈🀉)
(1, 1, 0)	1m1m1m1m2m3m (🀇🀇🀇🀇🀈🀉)
(2, 0, 0)	1m1m2m2m3m3m (🀇🀇🀈🀈🀉🀉)
(2, 0, 1)	1m1m1m1m2m2m3m3m (🀇🀇🀇🀇🀈🀈🀉🀉)
(3, 0, 0)	1m1m1m2m2m2m3m3m3m (🀇🀇🀇🀈🀈🀈🀉🀉🀉)
(4, 0, 0)	1m1m1m1m2m2m2m2m3m3m3m3m (🀇🀇🀇🀇🀈🀈🀈🀈🀉🀉🀉🀉)

Likewise, for 2-Man, 3-Man, and so on, we can consider exactly the same $(x, y, z)$ assignments. By assigning $(x, y, z)$ to every tile type, it becomes clear that we can enumerate all possible combinations of tiles composed of some number of members and some number of pairs. In other words, every combination of $m$ members and $1$ pair can be represented by assigning one of the ten $(x, y, z)$ patterns described above to each of the 34 tile types.

The following Definitions 1 and 2 formalize this idea.

Definition 1 (Decomposition Elements)

We define decomposition elements $d^{(0)}$, $d^{(1)}$, $d^{(2)}$, $d^{(3)}$, $d^{(4)}$, $d^{(5)}$, $d^{(6)}$, $d^{(7)}$, $d^{(8)}$, $d^{(9)}$ as follows:

$d^{(0)} \equiv (0, 0, 0)$,
$d^{(1)} \equiv (0, 0, 1)$,
$d^{(2)} \equiv (0, 1, 0)$,
$d^{(3)} \equiv (1, 0, 0)$,
$d^{(4)} \equiv (1, 0, 1)$,
$d^{(5)} \equiv (1, 1, 0)$,
$d^{(6)} \equiv (2, 0, 0)$,
$d^{(7)} \equiv (2, 0, 1)$,
$d^{(8)} \equiv (3, 0, 0)$,
$d^{(9)} \equiv (4, 0, 0)$.

We denote the set of all the decomposition elements as $D$.

Definition 2 (Concatenation of Decomposition Elements)

For a sequence of decomposition elements $\mathbf{d} = (d_0, d_1, \dots, d_{n - 1}) \in D^n$ and a decomposition element $d \in D$, we define an infix operator $\mathbf{d} \cdot d$ as $\mathbf{d} \cdot d \equiv (d_0, d_1, \dots, d_{n - 1}, d) \in D^{n + 1}$.

A hand only consisting of some members and some pairs can always be represented as a sequence of 34 decomposition elements. For example, an winning hand "1m1m1m1m2m2m2m2m3m3m3m3m4m4m" can be expressed as $(d^{(9)}, d^{(0)}, d^{(0)}, d^{(1)}, d^{(0)}, \dots, d^{(0)})$.

However, not every sequence of 34 decomposition elements necessarily constitutes a winning hand. This is because it can violate the following constraints from the rules of Mahjong:

In a winning hand, the number of members is at most four.
In a winning hand, the number of pairs is at most one.
No three-in-a-row may start with 8, 9, or honor tiles.
The number of copies of each tile in the hand must not exceed four.

The following Definitions 3 and 4 exclude the sequences of decomposition elements that correspond to hands violating the Mahjong rules.

Definition 3 (Concatenation of a Set of Sequences of Decomposition Elements)

For a set of sequences of decomposition elements $\mathbf{D} \subset D^n$ and a decomposition element $d \in D$, we define an infix operator $\mathbf{D} \circ d$ as $\mathbf{D} \circ d \equiv \lbrace \mathbf{d} \cdot d \mid \mathbf{d} \in \mathbf{D} \rbrace$.

Definition 4 (Prefixes of the Standard Winning Decompositions)

We define prefixes of the standard winning decompositions $\mathbf{W}'_{i, m, p, a, b} \subset D^i$ $(i, m, p, a, b \in \mathbb{N})$ as follows:

$\mathbf{W}'_{0, 0, 0, 0, 0} \equiv \lbrace () \rbrace$.
$\mathbf{W}'_{i, m, p, a, b} \equiv \emptyset$ if either of the following conditions is met:
- $i < 0$ or $i > 34$,
- $m < 0$ or $m > 4$,
- $p < 0$ or $p > 1$,
- $a < 0$ or $a > 4$,
- $b < 0$ or $b > 4$,
- $i \in \lbrace 0, 8, 17, 26, 27, 28, 29, 30, 31, 32, 33 \rbrace$ and $a \neq 0$, or
- $i \in \lbrace 0, 7, 8, 16, 17, 25, 26, 27, 28, 29, 30, 31, 32, 33 \rbrace$ and $b \neq 0$.
Otherwise:
- $\mathbf{W}'_{i + 1, m, p, a, 0} \equiv \text{(the sum of all the following sets)}$:
  - $\bigcup_{t = 0, 1, 2, 3, 4} \mathbf{W}'_{i, m, p, t, a} \circ d^{(0)}$
  - $\bigcup_{t = 0, 1, 2} \mathbf{W}'_{i, m, p - 1, t, a} \circ d^{(1)}$
  - $\bigcup_{t = 0, 1} \mathbf{W}'_{i, m - 1, p - 1, t, a} \circ d^{(2)}$
- $\mathbf{W}'_{i + 1, m, p, a, 1} \equiv \text{(the sum of all the following sets)}$:
  - $\bigcup_{t = 0, 1, 2, 3} \mathbf{W}'_{i, m - 1, p, t, a - 1} \circ d^{(3)}$
  - $\bigcup_{t = 0, 1} \mathbf{W}'_{i, m - 1, p - 1, t, a - 1} \circ d^{(4)}$
  - $\mathbf{W}'_{i, m - 2, p, t, a - 1} \circ d^{(5)}$
- $\mathbf{W}'_{i + 1, m, p, a, 2} \equiv \text{(the sum of all the following sets)}$:
  - $\bigcup_{t = 0, 1, 2} \mathbf{W}'_{i, m - 2, p, t, a - 2} \circ d^{(6)}$
  - $\mathbf{W}'_{i, m - 2, p - 1, t, a - 2} \circ d^{(7)}$
- $\mathbf{W}'_ {i + 1, m, p, a, 3} \equiv \bigcup_ {t = 0, 1} \mathbf{W}'_ {i, m - 3, p, t, a - 3} \circ d^{(8)}$
- $\mathbf{W}'_ {i + 1, m, p, a, 4} \equiv \mathbf{W}'_ {i, m - 4, p, 0, a - 4} \circ d^{(9)}$

Definition 5 (Set of m-Member p-Head Standard Winning Decompositions)

For $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$ and $p \in \lbrace 0, 1 \rbrace$, we define the set of m-member p-head standard winning decompositions $\mathbf{W}'_ {m, p} \subset D^{34}$ as $\mathbf{W}'_ {m, p} \equiv \mathbf{W}'_ {34, m, p, 0, 0}$.

Definition 6 (Decomposition-to-Hand Function)

For a sequence of decomposition elements $\mathbf{d} = (d_0, d_1, \cdots, d_{33}) \in D^{34}$, we define the decomposition-to-hand function $f: D^{34} \to \mathbb{N}^{34}$ as $f(\mathbf{d}) \equiv (h_0, h_1, \cdots, h_{33})$, where:

if $i \in \lbrace 0, 9, 18, 27, 28, 29, 30, 31, 32, 33 \rbrace$ then $h_i \equiv d_{i, 0} + 3 d_{i, 1} + 2 d_{i, 2}$, otherwise
if $i \in \lbrace 1, 10, 19 \rbrace$ then $h_i \equiv d_{i - 1, 0} + d_{i, 0} + 3 d_{i, 1} + 2 d_{i, 2}$, otherwise
$h_i \equiv d_{i - 2, 0} + d_{i - 1, 0} + d_{i, 0} + 3 d_{i, 1} + 2 d_{i, 2}$.

Here, let us set $d_i = (d_{i, 0}, d_{i, 1}, d_{i, 2})$.

Definition 7 (Set of m-Member p-Head Standard Winning Hands)

For $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$ and $p \in \lbrace 0, 1 \rbrace$, we define the set of m-member p-head standard winning hands $\mathbf{W}_ {m, p} \subset \mathbb{N}^{34}$ as $\mathbf{W}_ {m, p} \equiv f(\mathbf{W}'_ {m, p}) = \lbrace f(\mathbf{w}) \mid \mathbf{w} \in \mathbf{W}'_ {m, p} \rbrace$.

Definition 8 (Hand Distance)

For two hands $\mathbf{g} = (g_0, g_1, \dots, g_{33}) \in \mathbb{N}^{34}$ and $\mathbf{h} = (h_0, h_1, \dots, h_{33}) \in \mathbb{N}^{34}$, we define the distance from $\mathbf{h}$ to $\mathbf{g}$, denoted by $\delta(\mathbf{g}, \mathbf{h}): \mathbb{N}^{34} \times \mathbb{N}^{34} \to \mathbb{N}$, as follows:

\delta(\mathbf{g}, \mathbf{h}) \equiv \sum_{i = 0}^{33} \max(g_i - h_i, 0) .

Definition 9 (m-Member 1-Head Replacement Number)

For a given hand $\mathbf{h} = (h_0, h_1, \dots, h_{33}) \in \mathbb{N}^{34}$ and $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$, we define the m-member 1-head replacement number of $\mathbf{h}$, denoted by $r_m(\mathbf{h})$, as follows:

r_m(\mathbf{h}) \equiv \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}, \mathbf{h}) .

Definition 10 (m-Member 1-Head Shanten (Deficiency) Number)

For a given hand $\mathbf{h} = (h_0, h_1, \dots, h_{33}) \in \mathbb{N}^{34}$ and $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$, we define the m-member 1-head shanten (deficiency) number of $\mathbf{h}$, denoted by $s_m(\mathbf{h})$, as follows:

s_m(\mathbf{h}) \equiv r_m(\mathbf{h}) - 1

Definition 11 (Color (Suit) Projection of a Hand)

For a given hand $\mathbf{h} = (h_0, h_1, \dots, h_{33}) \in \mathbb{N}^{34}$, we define the projection of $\mathbf{h}$ to each color (suit), denoted by $\mathbf{h}^{(0)}, \mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \mathbf{h}^{(3)} \in \mathbb{N}^{34}$, as follows:

$\mathbf{h}^{(0)} \equiv (h_0, \dots, h_8, 0, \cdots, 0)$,
$\mathbf{h}^{(1)} \equiv (0, \dots, 0, h_9, \cdots, h_{17}, 0, \cdots, 0)$,
$\mathbf{h}^{(2)} \equiv (0, \dots, 0, h_{18}, \cdots, h_{26}, 0, \cdots, 0)$, and
$\mathbf{h}^{(3)} \equiv (0, \dots, 0, h_{27}, \cdots, h_{33})$.

Definition 12 (Set of m-Member p-Head Full-Flush (All-One-Suit) Winning Hands)

For $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$ and $p \in \lbrace 0, 1 \rbrace$, we define the set of m-member p-head full-flush (all-one-suit) winning hands for the color c $(c = 0, 1, 2, 3)$, denoted by $\mathbf{W}_{m, p}^{(c)}$, as follows:

\mathbf{W}_{m, p}^{(c)} \equiv \lbrace \mathbf{w}^{(c)} \mid \mathbf{w} \in \mathbf{W}_{m, p} \rbrace.

Definition 13 (Partial Replacement Number)

For a given hand $\mathbf{h} \in \mathbb{N}^{34}$, $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$, $p \in \lbrace 0, 1 \rbrace$, and a color $c \in \lbrace 0, 1, 2, 3 \rbrace$, we define the m-member p-head partial replacement number of $\mathbf{h}$ w.r.t. $c$, denoted by $r_{m, p}^{(c)}(\mathbf{h})$, as follows:

r_{m, p}^{(c)}(\mathbf{h}) \equiv \min_{\mathbf{w} \in \mathbf{W}_{m, p}^{(c)}} \delta(\mathbf{w}, \mathbf{h}) .

Theorem 1 (Partial Replacement Number)

For any given hand $\mathbf{h} \in \mathbb{N}^{34}$ and $m \in \lbrace 0, 1, 2, 3, 4 \rbrace$, the following holds:

\begin{align}
r_m(\mathbf{h}) = \min \left( r_{m_0, p_0}^{(0)}(\mathbf{h}^{(0)}) + r_{m_1, p_1}^{(1)}(\mathbf{h}^{(1)}) + r_{m_2, p_2}^{(2)}(\mathbf{h}^{(2)}) + r_{m_3, p_3}^{(3)}(\mathbf{h}^{(3)}) \right) \\
(m_0 + m_1 + m_2 + m_3 = m,\ p_0 + p_1 + p_2 + p_3 = 1) .
\end{align}

Proof of Theorem 1

Let us set $\mathbf{h} = (h_0, \dots, h_{33})$. From the rules of Mahjong, if $\mathbf{w} = (w_0, \dots, w_{33}) \in \mathbf{W}_ {m, p}$, there exist $m_0, m_1, m_2, m_3, p_0, p_1, p_2, p_3 \in \mathbb{N}$ such that

$m_0 + m_1 + m_2 + m_3 = m$,
$p_0 + p_1 + p_2 + p_3 = 1$,
$\mathbf{w}^{(0)} \in \mathbf{W}_ {m_0, p_0}^{(0)}$,
$\mathbf{w}^{(1)} \in \mathbf{W}_ {m_1, p_1}^{(1)}$,
$\mathbf{w}^{(2)} \in \mathbf{W}_ {m_2, p_2}^{(2)}$, and
$\mathbf{w}^{(3)} \in \mathbf{W}_ {m_3, p_3}^{(3)}$.

Therefore,

\begin{align*}
r_m(\mathbf{h}) &= \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}, \mathbf{h}) \\
&= \min_{(w_0, \dots, w_{33}) \in \mathbf{W}_{m, 1}} \left( \sum_{i = 0}^{33} \max(w_i - h_i, 0) \right) \\
&= \min_{(w_0, \dots, w_8, 0, \dots, 0) \in \mathbf{W}_{m_0, p_0}^{(0)}} \left( \sum_{i = 0}^8 \max(w_i - h_i, 0) \right) + \min_{(0, \dots, 0, w_9, \dots, w_{17}, 0, \dots, 0) \in \mathbf{W}_{m_1, p_1}^{(1)}} \left( \sum_{i = 9}^{17} \max(w_i - h_i, 0) \right) + \min_{(0, \dots, 0, w_{18}, \dots, w_{26}, 0, \dots, 0) \in \mathbf{W}_{m_2, p_2}^{(2)}} \left( \sum_{i = 18}^{26} \max(w_i - h_i, 0) \right) + \min_{(0, \dots, 0, w_{27}, \dots, w_{33}) \in \mathbf{W}_{m_3, p_3}^{(3)}} \left( \sum_{i = 27}^{33} \max(w_i - h_i, 0) \right) \\
&= \min_{\mathbf{w} \in \mathbf{W}_{m_0, p_0}^{(0)}} \delta(\mathbf{w}, \mathbf{h}^{(0)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_1, p_1}^{(1)}} \delta(\mathbf{w}, \mathbf{h}^{(1)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_2, p_2}^{(2)}} \delta(\mathbf{w}, \mathbf{h}^{(2)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_3, p_3}^{(3)}} \delta(\mathbf{w}, \mathbf{h}^{(3)}) \\
&= r_{m_0, p_0}^{(0)}(\mathbf{h}^{(0)}) + r_{m_1, p_1}^{(1)}(\mathbf{h}^{(1)}) + r_{m_2, p_2}^{(2)}(\mathbf{h}^{(2)}) + r_{m_3, p_3}^{(3)}(\mathbf{h}^{(3)}) .
\end{align*}

Moreover, for all $m_0', m_1', m_2', m_3', p_0', p_1', p_2', p_3'$ such that $m_0' + m_1' + m_2' + m_3' = m$ and $p_0' + p_1' + p_2' + p_3' = 1$, we have

\begin{align*}
r_{m, 1}(\mathbf{h}) &= \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}, \mathbf{h}) \\
&= \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \left( \delta(\mathbf{w}^{(0)}, \mathbf{h}^{(0)}) + \delta(\mathbf{w}^{(1)}, \mathbf{h}^{(1)}) + \delta(\mathbf{w}^{(2)}, \mathbf{h}^{(2)}) + \delta(\mathbf{w}^{(3)}, \mathbf{h}^{(3)}) \right) \\
&\geq \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}^{(0)}, \mathbf{h}^{(0)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}^{(1)}, \mathbf{h}^{(1)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}^{(2)}, \mathbf{h}^{(2)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}} \delta(\mathbf{w}^{(3)}, \mathbf{h}^{(3)}) \\
&= \min_{\mathbf{w} \in \mathbf{W}_{m, 1}^{(0)}} \delta(\mathbf{w}^{(0)}, \mathbf{h}^{(0)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}^{(1)}} \delta(\mathbf{w}^{(1)}, \mathbf{h}^{(1)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}^{(2)}} \delta(\mathbf{w}^{(2)}, \mathbf{h}^{(2)}) + \min_{\mathbf{w} \in \mathbf{W}_{m, 1}^{(3)}} \delta(\mathbf{w}^{(3)}, \mathbf{h}^{(3)}) \\
&\geq \min_{\mathbf{w} \in \mathbf{W}_{m_0', p_0'}^{(0)}} \delta(\mathbf{w}^{(0)}, \mathbf{h}^{(0)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_1', p_1'}^{(1)}} \delta(\mathbf{w}^{(1)}, \mathbf{h}^{(1)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_2', p_2'}^{(2)}} \delta(\mathbf{w}^{(2)}, \mathbf{h}^{(2)}) + \min_{\mathbf{w} \in \mathbf{W}_{m_3', p_3'}^{(3)}} \delta(\mathbf{w}^{(3)}, \mathbf{h}^{(3)}) \\
&= r_{m'_0, p'_0}^{(0)}(\mathbf{h}^{(0)}) + r_{m'_1, p'_1}^{(1)}(\mathbf{h}^{(1)}) + r_{m'_2, p'_2}^{(2)}(\mathbf{h}^{(2)}) + r_{m'_3, p'_3}^{(3)}(\mathbf{h}^{(3)}) .
\end{align*}

Hence, the claim is proved. $\square$

Definition 14 (Number-Tile Word and Number-Tile Language)

Let $\Omega_{\mathrm{S}} \equiv \lbrace 0, 1, 2, 3, 4 \rbrace$ be an alphabet. A number-tile word is defined as a 9-letter word (formed from $\Omega_{\mathrm{S}}$) whose total sum does not exceed 14. We then define the number-tile language $L_{\mathrm{S}}$ as the set of all such number-tile words.

Definition 15 (Enumerating Function)

Let $\Omega$ be an alphabet, and let $L$ be a finite set of words over $\Omega$. For any word $u \in \Omega^\ast$ such that there exists some word $v \in \Omega^\ast$ with $uv \in L$, define the enumerating function with respect to $L$ by

E_L(u) \equiv \vert\lbrace uv \mid uv \in L \rbrace\vert .

Theorem 2 (Enumerative Coding)

Let $\Omega$ be an alphabet equipped with a total order, and let $L$ be a finite set of words over $\Omega$. In this setting, for each $w = c_0 c_1 \cdots c_{l - 1} \in L$ $(c_i \in \Omega)$, if we define the enumerative coding for $L$, denoted by $H: L \to \mathbb{N}$, as

H(w) \equiv \sum_{c' < c_0} E_L(c') + \sum_{c' < c_1} E_L(c_0 c') + \cdots + \sum_{c' < c_{l - 1}} E_L(c_0 c_1 \cdots c_{l - 2} c'),

then $H$ serves as the minimal perfect hash function for $L$.

Proof of Theorem 2

See, e.g., T. Cover, "Enumerative source encoding," in IEEE Transactions on Information Theory, vol. 19, no. 1, pp. 73-77, January 1973, doi: 10.1109/TIT.1973.1054929.　Note that this paper provides a proof only for the case where the alphabet is {0, 1} (binary sequences), but this result can be easily extended to the case of general alphabets. $\square$

Definition 16 (Prefix State of a Number-Tile Word)

For any $u \in {\Omega_{\mathrm{S}}}^\ast$ such that there exists some $v \in {\Omega_{\mathrm{S}}}^\ast$ with $uv \in L_{\mathrm{S}}$, we define its prefix state $\sigma$ recursively as follows:

$\sigma(\varepsilon) \equiv (0, 0)$
Suppose $\sigma(u) \equiv (i, n)$. For any $c \in \Omega$ such that there exists some $v \in {\Omega_{\mathrm{S}}}^\ast$ with $ucv \in L_{\mathrm{S}}$, define $\sigma(uc) = (i + 1, n + c)$.

Theorem 3 (Prefix State of Number-Tile Words)

For any $u, u' \in {\Omega_{\mathrm{S}}}^\ast$ such that there exist $v, v' \in {\Omega_{\mathrm{S}}}^\ast$ with $uv, u'v' \in L_{\mathrm{S}}$, if $\sigma(u) = \sigma(u')$, then $E_{L_{\mathrm{S}}}(u) = E_{L_{\mathrm{S}}}(u')$.

Proof of Theorem 3

Since $L_{\mathrm{S}}$ is finite, it suffices to verify the proposition for all possible $u, u' \in {\Omega_{\mathrm{S}}}^\ast$ such that there exists $v, v' \in {\Omega_{\mathrm{S}}}^\ast$ with $uv, u'v' \in L_{\mathrm{S}}$. $\square$

Definition 17 (Enumerating Function for the Number-Tile Language in Terms of Prefix States)

Define the enumerating function for $L_{\mathrm{S}}$ in terms of prefix states by

T'(i, n) \equiv E_{L_{\mathrm{S}}}(u) \quad \text{such that} \quad \sigma(u) = (i, n) .

From Theorem 3, this definition of $T'$ is well-defined.

Theorem 4 (Enumerative Coding for Number-Tile Language)

For any number-tile word $w = c_0 \cdots c_{l - 1} \in L_{\mathrm{S}}$ $(c_i \in \Omega)$, the following property for $H_{L_{\mathrm{S}}}$ holds:

\begin{align*}
H_{L_{\mathrm{S}}}(w) = T(1, 0, c_0) + T(2, c_0, c_1) + T(3, c_0 + c_1, c_2) + \cdots + T(9, c_0 + \cdots + c_7, c_8),
\end{align*}

where

T(i, n, c) \equiv \sum_{c' < c} T'(i, n + c') .

Proof of Theorem 4

\begin{align*}
H_{L_{\mathrm{S}}}(w)
&= \sum_{c' < c_0} E_{L_{\mathrm{S}}}(c') + \sum_{c' < c_1} E_{L_{\mathrm{S}}}(c_0 c') + \cdots + \sum_{c' < c_8} E_{L_{\mathrm{S}}}(c_0 c_1 \cdots c_7 c') \\
&= \sum_{c' < c_0} T'(1, c') + \sum_{c' < c_1} T'(2, c_0 + c') + \cdots + \sum_{c' < c_8} T'(9, c_0 + c_1 + \cdots + c_7 + c') \\
&= T(1, 0, c_0) + T(2, c_0, c_1) + \cdots T(9, c_0 + \cdots + c_7, c_8) .
\end{align*}

Hence, the claim is proved. $\square$

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up