Theory Huffman

Up to index of Isabelle/HOL/Huffman

theory Huffman
imports Main
(*  Title:       An Isabelle/HOL Formalization of the Textbook Proof of Huffman's Algorithm
Author: Jasmin Christian Blanchette <blanchette at in.tum.de>, 2008
Maintainer: Jasmin Christian Blanchette <blanchette at in.tum.de>
*)


(*<*)
theory Huffman
imports Main
begin
(*>*)

section {* Introduction *}

subsection {* Binary Codes
\label{binary-codes} *}


text {*
Suppose we want to encode strings over a finite source alphabet to sequences
of bits. The approach used by ASCII and most other charsets is to map each
source symbol to a distinct $k$-bit code word, where $k$ is fixed and is
typically 8 or 16. To encode a string of symbols, we simply encode each symbol
in turn. Decoding involves mapping each $k$-bit block back to the symbol it
represents.

Fixed-length codes are simple and fast, but they generally waste space. If
we know the frequency $w_a$ of each source symbol $a$, we can save space
by using shorter code words for the most frequent symbols. We
say that a (variable-length) code is {\sl optimum\/} if it minimizes the sum
$\sum_a w_a \vthinspace\delta_a$, where $\delta_a$ is the length of the binary
code word for $a$. Information theory tells us that a code is optimum if
for each source symbol $c$ the code word representing $c$ has length
$$\textstyle \delta_c = \log_2 {1 \over p_c}, \qquad
\hbox{where}\enskip p_c = {w_c \over \sum_a w_a}.$$
This number is generally not an integer, so we cannot use it directly.
Nonetheless, the above criterion is a useful yardstick and paves the way for
arithmetic coding \cite{rissanen-1976}, a generalization of the method
presented here.

\def\xabacabad{\xa\xb\xa\xc\xa\xb\xa\xd}%
As an example, consider the source string `$\xabacabad$'. We have
$$p_{\xa} = \tfrac{1}{2},\,\; p_{\xb} = \tfrac{1}{4},\,\;
p_{\xc} = \tfrac{1}{8},\,\; p_{\xd} = \tfrac{1}{8}.$$
The optimum lengths for the binary code words are all integers, namely
$$\delta_{\xa} = 1,\,\; \delta_{\xb} = 2,\,\; \delta_{\xc} = 3,\,\;
\delta_{\xd} = 3,$$
and they are realized by the code
$$C_1 = \{ \xa \mapsto 0,\, \xb \mapsto 10,\, \xc \mapsto 110,\,
\xd \mapsto 111 \}.$$
Encoding `$\xabacabad$' produces the 14-bit code word 01001100100111. The code
$C_1$ is optimum: No code that unambiguously encodes source symbols one at a
time could do better than $C_1$ on the input `$\xa\xb\xa\xc\xa\xb\xa\xd$'. In
particular, with a fixed-length code such as
$$C_2 = \{ \xa \mapsto 00,\, \xb \mapsto 01,\, \xc \mapsto 10,\,
\xd \mapsto 11 \}$$
we need at least 16~bits to encode `$\xabacabad$'.
*}


subsection {* Binary Trees *}

text {*
Inside a program, binary codes can be represented by binary trees. For example,
the trees\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-unbalanced.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-balanced.pdf}}}$$
correspond to $C_1$ and $C_2$. The code word for a given
symbol can be obtained as follows: Start at the root and descend toward the leaf
node associated with the symbol one node at a time; generate a 0 whenever the
left child of the current node is chosen and a 1 whenever the right child is
chosen. The generated sequence of 0s and 1s is the code word.

To avoid ambiguities, we require that only leaf nodes are labeled with symbols.
This ensures that no code word is a prefix of another, thereby eliminating the
source of all ambiguities.%
\footnote{Strictly speaking, there is another potential source of ambiguity.
If the alphabet consists of a single symbol $a$, that symbol could be mapped
to the empty code word, and then any string $aa\ldots a$ would map to the
empty bit sequence, giving the decoder no way to recover the original string's
length. This scenario can be ruled out by requiring that the alphabet has
cardinality 2 or more.}
Codes that have this property are called {\sl prefix codes}. As an example of a
code that doesn't have this property, consider the code associated with the
tree\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-non-prefix.pdf}}}$$
and observe that `$\xb\xb\xb$', `$\xb\xd$', and `$\xd\xb$' all map to the
code word 111.

Each node in a code tree is assigned a {\sl weight}. For a leaf node, the
weight is the frequency of its symbol; for an inner node, it is the sum of the
weights of its subtrees. Code trees can be annotated with their weights:\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-unbalanced-weighted.pdf}}}
\qquad\qquad
\vcenter{\hbox{\includegraphics[scale=1.25]%
{tree-abcd-balanced-weighted.pdf}}}$$
For our purposes, it is sufficient to consider only full binary trees (trees
whose inner nodes all have two children). This is because any inner node with
only one child can advantageously be eliminated; for example,\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-non-full.pdf}}}
\qquad \hbox{becomes} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-full.pdf}}}$$
*}


subsection {* Huffman's Algorithm *}

text {*
David Huffman \cite{huffman-1952} discovered a simple algorithm for
constructing an optimum code tree for specified symbol frequencies:
Create a forest consisting of only leaf nodes, one for each symbol in the
alphabet, taking the given symbol frequencies as initial weights for the nodes.
Then pick the two trees
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$

\noindent\strut
with the lowest weights and replace them with the tree
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2.pdf}}}$$
Repeat this process until only one tree is left.

As an illustration, executing the algorithm for the frequencies
$$f_{\xd} = 3,\,\; f_{\xe} = 11,\,\; f_{\xf} = 5,\,\; f_{\xs} = 7,\,\;
f_{\xz} = 2$$
gives rise to the following sequence of states:\strut

\def\myscale{1}%
\setbox\myboxi=\hbox{(9)\strut}%
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step1.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step2.pdf}}%
$$(1)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii} \qquad\qquad
(2)\enskip\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}$$

\vskip.5\smallskipamount

\noindent
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step3.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step4.pdf}}%
\setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]{tree-prime-step5.pdf}}%
(3)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}\hfill\quad
(4)\quad\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\hfill
(5)\enskip\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv\,}

\smallskip
\noindent
Tree~(5) is an optimum tree for the given frequencies.
*}


subsection {* The Textbook Proof *}

text {*
Why does the algorithm work? In his article, Huffman gave some motivation but
no real proof. For a proof sketch, we turn to Donald Knuth
\cite[p.~403--404]{knuth-1997}:

\begin{quote}
It is not hard to prove that this method does in fact minimize the weighted
path length [i.e., $\sum_a w_a \vthinspace\delta_a$], by induction on $m$.
Suppose we have $w_1 \le w_2 \le w_3 \le \cdots \le w_m$, where $m \ge 2$, and
suppose that we are given a tree that minimizes the weighted path length.
(Such a tree certainly exists, since only finitely many binary trees with $m$
terminal nodes are possible.) Let $V$ be an internal node of maximum distance
from the root. If $w_1$ and $w_2$ are not the weights already attached to the
children of $V$, we can interchange them with the values that are already
there; such an interchange does not increase the weighted path length. Thus
there is a tree that minimizes the weighted path length and contains the
subtree\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
Now it is easy to prove that the weighted path length of such a tree is
minimized if and only if the tree with
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
\qquad \hbox{replaced by} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
has minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
\end{quote}

\noindent
There is, however, a small oddity in this proof: It is not clear why we must
assert the existence of an optimum tree that contains the subtree
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
Indeed, the formalization works without it.

Cormen et al.\ \cite[p.~385--391]{cormen-et-al-2001} provide a very similar
proof, articulated around the following propositions:

\begin{quote}
\textsl{\textbf{Lemma 16.2}} \\
Let $C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
Let $x$ and $y$ be two characters in $C$ having the lowest frequencies. Then
there exists an optimal prefix code for $C$ in which the codewords for $x$ and
$y$ have the same length and differ only in the last bit.

\medskip

\textsl{\textbf{Lemma 16.3}} \\
Let $C$ be a given alphabet with frequency $f[c]$ defined for each character
$c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
$C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
$z$ added, so that $C' = C - \{x, y\} \cup {\{z\}}$; define $f$ for $C'$ as for
$C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
optimal prefix code for the alphabet $C'$. Then the tree $T$, obtained from
$T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
$y$ as children, represents an optimal prefix code for the alphabet $C$.

\medskip

\textsl{\textbf{Theorem 16.4}} \\
Procedure \textsc{Huffman} produces an optimal prefix code.
\end{quote}
*}


subsection {* Overview of the Formalization *}

text {*
This report presents a formalization of the proof of Huffman's algorithm
written using Isabelle/HOL \cite{nipkow-et-al-2008}. Our proof is based on the
informal proofs given by Knuth and Cormen et al. The development was done
independently of Laurent Th\'ery's Coq proof \cite{thery-2003,thery-2004},
which through its ``cover'' concept represents a considerable departure from
the textbook proof.

The development consists of 90 lemmas and 5 theorems. Most of them have very
short proofs thanks to the extensive use of simplification rules and custom
induction rules. The remaining proofs are written using the structured proof
format Isar \cite{wenzel-2008} and are accompanied by informal arguments and
diagrams.

The report is organized as follows. Section~\ref{trees-and-forests} defines
the datatypes for binary code trees and forests and develops a small library of
related functions. (Incidentally, there is nothing special about binary codes
and binary trees. Huffman's algorithm and its proof can be generalized to
$n$-ary trees \cite[p.~405 and 595]{knuth-1997}.) Section~\ref{implementation}
presents a functional implementation of the algorithm. Section~\ref{auxiliary}
defines several tree manipulation functions needed for the proof.
Section~\ref{formalization} presents three key lemmas and concludes with the
optimality theorem. Section~\ref{related-work} compares our work with Th\'ery's
Coq proof. Finally, Section~\ref{conclusion} concludes the report.
*}


subsection {* Overview of Isabelle's HOL Logic *}

text {*
This section presents a brief overview of the Isabelle/HOL logic, so that
readers not familiar with the system can at least understand the lemmas and
theorems, if not the proofs. Readers who already know Isabelle are encouraged
to skip this section.

Isabelle is a generic theorem prover whose built-in metalogic is an
intuitionistic fragment of higher-order logic
\cite{gordon-melham-1994,nipkow-et-al-2008}. The metalogical operators are
material implication, written
\vthinspace@{text "[|φ1; …; φn|] ==> ψ"}\vthinspace{} (``if @{term φ1} and
$\ldots$ and @{term φn}, then @{term ψ}''), universal quantification,
written \vthinspace@{text "!!x1 … xn. ψ"}\vthinspace{} (``for all $@{term x1},
\ldots, @{term xn}$ we have @{term ψ}''), and equality, written
\vthinspace@{text "t ≡ u"}.

The incarnation of Isabelle that we use in this development, Isabelle/HOL,
provides a more elaborate version of higher-order logic, complete with the
familiar connectives and quantifiers (@{text "¬"}, @{text "∧"}, @{text "∨"},
@{text "-->"}, @{text "∀"}, and @{text "∃"}) on terms of type @{typ bool}. In
addition, $=$ expresses equivalence. The formulas
\vthinspace@{text "!!x1 … xm. [|φ1; …; φn|] ==> ψ"}\vthinspace{} and
\vthinspace@{text "∀x1. … ∀xm. φ1 ∧"}$\,\cdots\,$@{text "∧ φn --> ψ"}%
\vthinspace{} are logically equivalent, but they interact differently with
Isabelle's proof tactics.

The term language consists of simply typed $\lambda$-terms written in an
ML-like syntax \cite{milner-et-al-1997}. Function application expects no
parentheses around the argument list and no commas between the arguments, as in
@{term "f x y"}. Syntactic sugar provides an infix syntax for common operators,
such as $x = y$ and $x + y$. Types are inferred automatically in most cases, but
they can always be supplied using an annotation @{text "t::τ"}, where @{term t}
is a term and @{text τ} is its type. The type of total functions from
@{typ 'a} to @{typ 'b} is written @{typ "'a => 'b"}. Variables may range
over functions.

The type of natural numbers is called @{typ nat}. The type of lists over type
@{text 'a}, written @{typ "'a list"}, features the empty list @{term "[]"},
the infix constructor @{term "x # xs"} (where @{term x} is an element of type
@{text 'a} and @{term xs} is a list over @{text 'a}), and the conversion
function @{term set} from lists to sets. The type of sets over @{text 'a} is
written @{typ "'a set"}. Operations on sets are written using traditional
mathematical notation.
*}


subsection {* Head of the Theory File *}

text {*
The Isabelle theory starts in the standard way.

\myskip

\noindent
\isacommand{theory} @{text "Huffman"} \\
\isacommand{imports} @{text "Main"} \\
\isacommand{begin}

\myskip

\noindent
We attach the @{text "simp"} attribute to some predefined lemmas to add them to
the default set of simplification rules.
*}


declare Int_Un_distrib [simp]
Int_Un_distrib2 [simp]
min_max.sup_absorb1 [simp]
min_max.sup_absorb2 [simp]

section {* Definition of Prefix Code Trees and Forests
\label{trees-and-forests} *}


subsection {* Tree Datatype *}

text {*
A {\sl prefix code tree\/} is a full binary tree in which leaf nodes are of the
form @{term "Leaf w a"}, where @{term a} is a symbol and @{term w} is the
frequency associated with @{term a}, and inner nodes are of the form
@{term "InnerNode w t1 t2"}, where @{term t1} and @{term t2} are the left and
right subtrees and @{term w} caches the sum of the weights of @{term t1} and
@{term t2}. Prefix code trees are polymorphic on the symbol datatype~@{typ 'a}.
*}


datatype 'a tree =
Leaf nat 'a |
InnerNode nat "('a tree)" "('a tree)"

subsection {* Forest Datatype *}

text {*
The intermediate steps of Huffman's algorithm involve a list of prefix code
trees, or {\sl prefix code forest}.
*}


type_synonym 'a forest = "'a tree list"

subsection {* Alphabet *}

text {*
The {\sl alphabet\/} of a code tree is the set of symbols appearing in the
tree's leaf nodes.
*}


primrec alphabet :: "'a tree => 'a set" where
"alphabet (Leaf w a) = {a}" |
"alphabet (InnerNode w t1 t2) = alphabet t1 ∪ alphabet t2"

text {*
For sets and predicates, Isabelle gives us the choice between inductive
definitions (\isakeyword{inductive\_set} and \isakeyword{inductive}) and
recursive functions (\isakeyword{primrec}, \isakeyword{fun}, and
\isakeyword{function}). In this development, we consistently favor recursion
over induction, for two reasons:

\begin{myitemize}
\item Recursion gives rise to simplification rules that greatly help automatic
proof tactics. In contrast, reasoning about inductively defined sets and
predicates involves introduction and elimination rules, which are more clumsy
than simplification rules.

\item Isabelle's counterexample generator \isakeyword{quickcheck}
\cite{berghofer-nipkow-2004}, which we used extensively during the top-down
development of the proof (together with \isakeyword{sorry}), has better support
for recursive definitions.
\end{myitemize}

The alphabet of a forest is defined as the union of the alphabets of the trees
that compose it. Although Isabelle supports overloading for non-overlapping
types, we avoid many type inference problems by attaching an
`\raise.3ex\hbox{@{text "F"}}' subscript to the forest generalizations of
functions defined on trees.
*}


primrec alphabetF :: "'a forest => 'a set" where
"alphabetF [] = {}" |
"alphabetF (t # ts) = alphabet t ∪ alphabetF ts"

text {*
Alphabets are central to our proofs, and we need the following basic facts
about them.
*}


lemma finite_alphabet [simp]:
"finite (alphabet t)"
by (induct t) auto

lemma exists_in_alphabet:
"∃a. a ∈ alphabet t"
by (induct t) auto

subsection {* Consistency *}

text {*
A tree is {\sl consistent\/} if for each inner node the alphabets of the two
subtrees are disjoint. Intuitively, this means that every symbol in the
alphabet occurs in exactly one leaf node. Consistency is a sufficient condition
for $\delta_a$ (the length of the {\sl unique\/} code word for $a$) to be
defined. Although this well\-formed\-ness property isn't mentioned in algorithms
textbooks \cite{aho-et-al-1983,cormen-et-al-2001,knuth-1997}, it is essential
and appears as an assumption in many of our lemmas.
*}


primrec consistent :: "'a tree => bool" where
"consistent (Leaf w a) = True" |
"consistent (InnerNode w t1 t2) =
(consistent t1 ∧ consistent t2 ∧ alphabet t1 ∩ alphabet t2 = {})"


primrec consistentF :: "'a forest => bool" where
"consistentF [] = True" |
"consistentF (t # ts) =
(consistent t ∧ consistentF ts ∧ alphabet t ∩ alphabetF ts = {})"


text {*
Several of our proofs are by structural induction on consistent trees $t$ and
involve one symbol $a$. These proofs typically distinguish the following cases.

\begin{myitemize}
\item[] {\sc Base case:}\enspace $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step:}\enspace $t = @{term "InnerNode w t1 t2"}$.
\item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t1} but not to @{term t2}.
\item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t2} but not to @{term t1}.
\item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t1} nor @{term t2}.
\end{myitemize}

\noindent
Thanks to the consistency assumption, we can rule out the subcase where $a$
belongs to both subtrees.

Instead of performing the above case distinction manually, we encode it in a
custom induction rule. This saves us from writing repetitive proof scripts and
helps Isabelle's automatic proof tactics.
*}


lemma tree_induct_consistent [consumes 1, case_names base step1 step2 step3]:
"[|consistent t;
!!wb b a. P (Leaf wb b) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
a ∈ alphabet t1; a ∉ alphabet t2; P t1 a; P t2 a|] ==>
P (InnerNode w t1 t2) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
a ∉ alphabet t1; a ∈ alphabet t2; P t1 a; P t2 a|] ==>
P (InnerNode w t1 t2) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
a ∉ alphabet t1; a ∉ alphabet t2; P t1 a; P t2 a|] ==>
P (InnerNode w t1 t2) a|] ==>
P t a"


txt {*
The proof relies on the \textit{induction\_schema} and
\textit{lexicographic\_order} tactics, which automate the most tedious
aspects of deriving induction rules. The alternative would have been to perform
a standard structural induction on @{term t} and proceed by cases, which is
straightforward but long-winded.
*}


apply rotate_tac
apply induction_schema
apply atomize_elim
apply (case_tac t)
apply fastforce
apply fastforce
by lexicographic_order

text {*
The \textit{induction\_schema} tactic reduces the putative induction rule to
simpler proof obligations.
Internally, it reuses the machinery that constructs the default induction
rules. The resulting proof obligations concern (a)~case completeness,
(b)~invariant preservation (in our case, tree consistency), and
(c)~wellfoundedness. For @{thm [source] tree_induct_consistent}, the obligations
(a)~and (b) can be discharged using
Isabelle's simplifier and classical reasoner, whereas (c) requires a single
invocation of \textit{lexicographic\_order}, a tactic that was originally
designed to prove termination of recursive functions
\cite{bulwahn-et-al-2007,krauss-2007,krauss-2009}.
*}


subsection {* Symbol Depths *}

text {*
The {\sl depth\/} of a symbol (which we denoted by $\delta_a$ in
Section~\ref{binary-codes}) is the length of the path from the root to the
leaf node labeled with that symbol, or equivalently the length of the code word
for the symbol. Symbols that don't occur in the tree or that occur at the root
of a one-node tree have depth 0. If a symbol occurs in several leaf nodes (which
may happen with inconsistent trees), the depth is arbitrarily defined in terms
of the leftmost node labeled with that symbol.
*}


primrec depth :: "'a tree => 'a => nat" where
"depth (Leaf w b) a = 0" |
"depth (InnerNode w t1 t2) a =
(if a ∈ alphabet t1 then depth t1 a + 1
else if a ∈ alphabet t2 then depth t2 a + 1
else 0)"


text {*
The definition may seem very inefficient from a functional programming
point of view, but it does not matter, because unlike Huffman's algorithm, the
@{const depth} function is merely a reasoning tool and is never actually
executed.
*}


subsection {* Height *}

text {*
The {\sl height\/} of a tree is the length of the longest path from the root to
a leaf node, or equivalently the length of the longest code word. This is
readily generalized to forests by taking the maximum of the trees' heights. Note
that a tree has height 0 if and only if it is a leaf node, and that a forest has
height 0 if and only if all its trees are leaf nodes.
*}


primrec height :: "'a tree => nat" where
"height (Leaf w a) = 0" |
"height (InnerNode w t1 t2) = max (height t1) (height t2) + 1"

primrec heightF :: "'a forest => nat" where
"heightF [] = 0" |
"heightF (t # ts) = max (height t) (heightF ts)"

text {*
The depth of any symbol in the tree is bounded by the tree's height, and there
exists a symbol with a depth equal to the height.
*}


lemma depth_le_height:
"depth t a ≤ height t"
by (induct t) auto

lemma exists_at_height:
"consistent t ==> ∃a ∈ alphabet t. depth t a = height t"
proof (induct t)
case Leaf thus ?case by simp
next
case (InnerNode w t1 t2)
note hyps = InnerNode
let ?t = "InnerNode w t1 t2"
from hyps obtain b where b: "b ∈ alphabet t1" "depth t1 b = height t1" by auto
from hyps obtain c where c: "c ∈ alphabet t2" "depth t2 c = height t2" by auto
let ?a = "if height t1 ≥ height t2 then b else c"
from b c have "?a ∈ alphabet ?t" "depth ?t ?a = height ?t"
using `consistent ?t` by auto
thus "∃a ∈ alphabet ?t. depth ?t a = height ?t" ..
qed

text {*
The following elimination rules help Isabelle's classical prover, notably the
\textit{auto} tactic. They are easy consequences of the inequation
@{thm depth_le_height [no_vars]}.
*}


lemma depth_max_heightE_left [elim!]:
"[|depth t1 a = max (height t1) (height t2);
[|depth t1 a = height t1; height t1 ≥ height t2|] ==> P|] ==>
P"

by (cut_tac t = t1 and a = a in depth_le_height) simp

lemma depth_max_heightE_right [elim!]:
"[|depth t2 a = max (height t1) (height t2);
[|depth t2 a = height t2; height t2 ≥ height t1|] ==> P|] ==>
P"

by (cut_tac t = t2 and a = a in depth_le_height) simp

text {*
We also need the following lemma.
*}


lemma height_gt_0_alphabet_eq_imp_height_gt_0:
assumes "height t > 0" "consistent t" "alphabet t = alphabet u"
shows "height u > 0"
proof (cases t)
case Leaf thus ?thesis using assms by simp
next
case (InnerNode w t1 t2)
note t = InnerNode
from exists_in_alphabet obtain b where b: "b ∈ alphabet t1" ..
from exists_in_alphabet obtain c where c: "c ∈ alphabet t2" ..
from b c have bc: "b ≠ c" using t `consistent t` by fastforce
show ?thesis
proof (cases u)
case Leaf thus ?thesis using b c bc t assms by auto
next
case InnerNode thus ?thesis by simp
qed
qed

subsection {* Symbol Frequencies *}

text {*
The {\sl frequency\/} of a symbol (which we denoted by $w_a$ in
Section~\ref{binary-codes}) is the sum of the weights attached to the
leaf nodes labeled with that symbol. If the tree is consistent, the sum
comprises at most one nonzero term. The frequency is then the weight of the leaf
node labeled with the symbol, or 0 if there is no such node. The generalization
to forests is straightforward.
*}


primrec freq :: "'a tree => 'a => nat" where
"freq (Leaf w a) = (λb. if b = a then w else 0)" |
"freq (InnerNode w t1 t2) = (λb. freq t1 b + freq t2 b)"

primrec freqF :: "'a forest => 'a => nat" where
"freqF [] = (λb. 0)" |
"freqF (t # ts) = (λb. freq t b + freqF ts b)"

text {*
Alphabet and symbol frequencies are intimately related. Simplification rules
ensure that sums of the form @{term "freq t1 a + freq t2 a"} collapse to a
single term when we know which tree @{term a} belongs to.
*}


lemma notin_alphabet_imp_freq_0 [simp]:
"a ∉ alphabet t ==> freq t a = 0"
by (induct t) simp+

lemma notin_alphabetF_imp_freqF_0 [simp]:
"a ∉ alphabetF ts ==> freqF ts a = 0"
by (induct ts) simp+

lemma freq_0_right [simp]:
"[|alphabet t1 ∩ alphabet t2 = {}; a ∈ alphabet t1|] ==> freq t2 a = 0"
by (auto intro: notin_alphabet_imp_freq_0 simp: disjoint_iff_not_equal)

lemma freq_0_left [simp]:
"[|alphabet t1 ∩ alphabet t2 = {}; a ∈ alphabet t2|] ==> freq t1 a = 0"
by (auto simp: disjoint_iff_not_equal)

text {*
Two trees are {\em comparable} if they have the same alphabet and symbol
frequencies. This is an important concept, because it allows us to state not
only that the tree constructed by Huffman's algorithm is optimal but also that
it has the expected alphabet and frequencies.

We close this section with a more technical lemma.
*}


lemma heightF_0_imp_Leaf_freqF_in_set:
"[|consistentF ts; heightF ts = 0; a ∈ alphabetF ts|] ==>
Leaf (freqF ts a) a ∈ set ts"

proof (induct ts)
case Nil thus ?case by simp
next
case (Cons t ts) show ?case using Cons
proof (cases t)
case Leaf thus ?thesis using Cons by clarsimp
next
case InnerNode thus ?thesis using Cons by clarsimp
qed
qed

subsection {* Weight *}

text {*
The @{term weight} function returns the weight of a tree. In the
@{const InnerNode} case, we ignore the weight cached in the node and instead
compute the tree's weight recursively. This makes reasoning simpler because we
can then avoid specifying cache correctness as an assumption in our lemmas.
*}


primrec weight :: "'a tree => nat" where
"weight (Leaf w a) = w" |
"weight (InnerNode w t1 t2) = weight t1 + weight t2"

text {*
The weight of a tree is the sum of the frequencies of its symbols.

\myskip

\noindent
\isacommand{lemma} @{text "weight_eq_Sum_freq"}: \\
{\isachardoublequoteopen}$\displaystyle @{text "consistent t ==> weight t"} =
\!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\! @{term "freq t a"}$%
{\isachardoublequoteclose}

\vskip-\myskipamount
*}


(*<*)
lemma weight_eq_Sum_freq:
"consistent t ==> weight t = (∑a ∈ alphabet t. freq t a)"
(*>*)
by (induct t) (auto simp: setsum_Un_disjoint)

text {*
The assumption @{term "consistent t"} is not necessary, but it simplifies the
proof by letting us invoke the lemma @{thm [source] setsum_Un_disjoint}:
$$@{text "[|finite A; finite B; A ∩ B = {}|] ==>"}~\!\sum_{x\in A} @{term "g x"}
\vthinspace \mathrel{+} \sum_{x\in B} @{term "g x"}\vthinspace = %
\!\!\sum_{x\in A \cup B}\! @{term "g x"}.$$
*}


subsection {* Cost *}

text {*
The {\sl cost\/} of a consistent tree, sometimes called the {\sl weighted path
length}, is given by the sum $\sum_{a \in @{term "alphabet t"}\,}
@{term "freq t a"} \mathbin{@{text "*"}} @{term "depth t a"}$
(which we denoted by $\sum_a w_a \vthinspace\delta_a$ in
Section~\ref{binary-codes}). It obeys a simple recursive law.
*}


primrec cost :: "'a tree => nat" where
"cost (Leaf w a) = 0" |
"cost (InnerNode w t1 t2) = weight t1 + cost t1 + weight t2 + cost t2"

text {*
One interpretation of this recursive law is that the cost of a tree is the sum
of the weights of its inner nodes \cite[p.~405]{knuth-1997}. (Recall that
$@{term "weight (InnerNode w t1 t2)"} = @{term "weight t1 + weight t2"}$.) Since
the cost of a tree is such a fundamental concept, it seems necessary to prove
that the above function definition is correct.

\myskip

\noindent
\isacommand{theorem} @{text "cost_eq_Sum_freq_mult_depth"}: \\
{\isachardoublequoteopen}$\displaystyle @{text "consistent t ==> cost t"} =
\!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\!
@{term "freq t a * depth t a"}$%
{\isachardoublequoteclose}

\myskip

\noindent
The proof is by structural induction on $t$. If $t = @{term "Leaf w b"}$, both
sides of the equation simplify to 0. This leaves the case $@{term t} =
@{term "InnerNode w t1 t2"}$. Let $A$, $A_1$, and $A_2$ stand for
@{term "alphabet t"}, @{term "alphabet t1"}, and @{term "alphabet t2"},
respectively. We have
%
$$\begin{tabularx}{\textwidth}{@%
{\hskip\leftmargin}cX@%
{}}
& @{term "cost t"} \\
\eq & \justif{definition of @{const cost}} \\
& $@{term "weight t1 + cost t1 + weight t2 + cost t2"}$ \\
\eq & \justif{induction hypothesis} \\
& $@{term "weight t1"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t1 a * depth t1 a"} \mathrel{+} {}$ \\
& $@{term "weight t2"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t2 a * depth t2 a"}$ \\
\eq & \justif{definition of @{const depth}, consistency} \\[\extrah]
& $@{term "weight t1"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t1 a * (depth t a - 1)"} \mathrel{+}
{}$ \\
& $@{term "weight t2"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t2 a * (depth t a - 1)"}$ \\[\extrah]
\eq & \justif{distributivity of @{text "*"} and $\sum$ over $-$} \\[\extrah]
& $@{term "weight t1"} \mathrel{+}
\sum_{a\in A_1\,} @{term "freq t1 a * depth t a"} \mathrel{-}
\sum_{a\in A_1\,} @{term "freq t1 a"} \mathrel{+} {}$ \\
& $@{term "weight t2"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t2 a * depth t a"} \mathrel{-}
\sum_{a\in A_2\,} @{term "freq t2 a"}$ \\[\extrah]
\eq & \justif{@{thm [source] weight_eq_Sum_freq}} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t1 a * depth t a"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t2 a * depth t a"}$ \\[\extrah]
\eq & \justif{definition of @{const freq}, consistency} \\[\extrah]
& $\sum_{a\in A_1\,} @{term "freq t a * depth t a"} \mathrel{+}
\sum_{a\in A_2\,} @{term "freq t a * depth t a"}$ \\[\extrah]
\eq & \justif{@{thm [source] setsum_Un_disjoint}, consistency} \\
& $\sum_{a\in A_1\cup A_2\,} @{term "freq t a * depth t a"}$ \\
\eq & \justif{definition of @{const alphabet}} \\
& $\sum_{a\in A\,} @{term "freq t a * depth t a"}$.
\end{tabularx}$$

\noindent
The structured proof closely follows this argument.
*}


(*<*)
theorem cost_eq_Sum_freq_mult_depth:
"consistent t ==> cost t = (∑a ∈ alphabet t. freq t a * depth t a)"
(*>*)
proof (induct t)
case Leaf thus ?case by simp
next
case (InnerNode w t1 t2)
let ?t = "InnerNode w t1 t2"
let ?A = "alphabet ?t" and ?A1 = "alphabet t1" and ?A2 = "alphabet t2"
note c = `consistent ?t`
note hyps = InnerNode
have d2: "!!a. [|?A1 ∩ ?A2 = {}; a ∈ ?A2|] ==> depth ?t a = depth t2 a + 1"
by auto
have "cost ?t = weight t1 + cost t1 + weight t2 + cost t2" by simp
also have "… = weight t1 + (∑a ∈ ?A1. freq t1 a * depth t1 a) +
weight t2 + (∑a ∈ ?A2. freq t2 a * depth t2 a)"

using hyps by simp
also have "… = weight t1 + (∑a ∈ ?A1. freq t1 a * (depth ?t a - 1)) +
weight t2 + (∑a ∈ ?A2. freq t2 a * (depth ?t a - 1))"

using c d2 by simp
also have "… = weight t1 + (∑a ∈ ?A1. freq t1 a * depth ?t a)
- (∑a ∈ ?A1. freq t1 a) +
weight t2 + (∑a ∈ ?A2. freq t2 a * depth ?t a)
- (∑a ∈ ?A2. freq t2 a)"

using c d2 by (simp add: setsum_addf)
also have "… = (∑a ∈ ?A1. freq t1 a * depth ?t a) +
(∑a ∈ ?A2. freq t2 a * depth ?t a)"

using c by (simp add: weight_eq_Sum_freq)
also have "… = (∑a ∈ ?A1. freq ?t a * depth ?t a) +
(∑a ∈ ?A2. freq ?t a * depth ?t a)"

using c by auto
also have "… = (∑a ∈ ?A1 ∪ ?A2. freq ?t a * depth ?t a)"
using c by (simp add: setsum_Un_disjoint)
also have "… = (∑a ∈ ?A. freq ?t a * depth ?t a)" by simp
finally show ?case .
qed

text {*
Finally, it should come as no surprise that trees with height 0 have cost 0.
*}


lemma height_0_imp_cost_0 [simp]:
"height t = 0 ==> cost t = 0"
by (case_tac t) simp+

subsection {* Optimality *}

text {*
A tree is optimum if and only if its cost is not greater than that of any
comparable tree. We can ignore inconsistent trees without loss of generality.
*}


definition optimum :: "'a tree => bool" where
"optimum t ≡
∀u. consistent u --> alphabet t = alphabet u --> freq t = freq u -->
cost t ≤ cost u"


section {* Functional Implementation of Huffman's Algorithm
\label{implementation} *}


subsection {* Cached Weight *}

text {*
The {\sl cached weight\/} of a node is the weight stored directly in the node.
Our arguments rely on the computed weight (embodied by the @{const weight}
function) rather than the cached weight, but the implementation of Huffman's
algorithm uses the cached weight for performance reasons.
*}


primrec cachedWeight :: "'a tree => nat" where
"cachedWeight (Leaf w a) = w" |
"cachedWeight (InnerNode w t1 t2) = w"

text {*
The cached weight of a leaf node is identical to its computed weight.
*}


lemma height_0_imp_cachedWeight_eq_weight [simp]:
"height t = 0 ==> cachedWeight t = weight t"
by (case_tac t) simp+

subsection {* Tree Union *}

text {*
The implementation of Huffman's algorithm builds on two additional auxiliary
functions. The first one, @{text uniteTrees}, takes two trees
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
\qquad \hbox{and} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$

\noindent
and returns the tree\strut
$$\includegraphics[scale=1.25]{tree-w1-w2.pdf}$$

\vskip-.5\myskipamount
*}


definition uniteTrees :: "'a tree => 'a tree => 'a tree" where
"uniteTrees t1 t2 ≡ InnerNode (cachedWeight t1 + cachedWeight t2) t1 t2"

text {*
The alphabet, consistency, and symbol frequencies of a united tree are easy to
connect to the homologous properties of the subtrees.
*}


lemma alphabet_uniteTrees [simp]:
"alphabet (uniteTrees t1 t2) = alphabet t1 ∪ alphabet t2"
by (simp add: uniteTrees_def)

lemma consistent_uniteTrees [simp]:
"[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {}|] ==>
consistent (uniteTrees t1 t2)"

by (simp add: uniteTrees_def)

lemma freq_uniteTrees [simp]:
"freq (uniteTrees t1 t2) = (λa. freq t1 a + freq t2 a)"
by (simp add: uniteTrees_def)

subsection {* Ordered Tree Insertion *}

text {*
The auxiliary function @{text insortTree} inserts a tree into a forest sorted
by cached weight, preserving the sort order.
*}


primrec insortTree :: "'a tree => 'a forest => 'a forest" where
"insortTree u [] = [u]" |
"insortTree u (t # ts) =
(if cachedWeight u ≤ cachedWeight t then u # t # ts
else t # insortTree u ts)"


text {*
The resulting forest contains one more tree than the original forest. Clearly,
it cannot be empty.
*}


lemma length_insortTree [simp]:
"length (insortTree t ts) = length ts + 1"
by (induct ts) simp+

lemma insortTree_ne_Nil [simp]:
"insortTree t ts ≠ []"
by (case_tac ts) simp+

text {*
The alphabet, consistency, symbol frequencies, and height of a forest after
insertion are easy to relate to the homologous properties of the original
forest and the inserted tree.
*}


lemma alphabetF_insortTree [simp]:
"alphabetF (insortTree t ts) = alphabet t ∪ alphabetF ts"
by (induct ts) auto

lemma consistentF_insortTree [simp]:
"consistentF (insortTree t ts) = consistentF (t # ts)"
by (induct ts) auto

lemma freqF_insortTree [simp]:
"freqF (insortTree t ts) = (λa. freq t a + freqF ts a)"
by (induct ts) (simp add: ext)+

lemma heightF_insortTree [simp]:
"heightF (insortTree t ts) = max (height t) (heightF ts)"
by (induct ts) auto

subsection {* The Main Algorithm *}

text {*
Huffman's algorithm repeatedly unites the first two trees of the forest it
receives as argument until a single tree is left. It should initially be
invoked with a list of leaf nodes sorted by weight. Note that it is not defined
for the empty list.
*}


fun huffman :: "'a forest => 'a tree" where
"huffman [t] = t" |
"huffman (t1 # t2 # ts) = huffman (insortTree (uniteTrees t1 t2) ts)"

text {*
The time complexity of the algorithm is quadratic in the size of the forest.
If we eliminated the inner node's cached weight component, and instead
recomputed the weight each time it is needed, the complexity would remain
quadratic, but with a larger constant. Using a binary search in @{const
insortTree}, the corresponding imperative algorithm is $O(n \log n)$ if we keep
the weight cache and $O(n^2)$ if we drop it. An $O(n)$ imperative implementation
is possible by maintaining two queues, one containing the unprocessed leaf nodes
and the other containing the combined trees \cite[p.~404]{knuth-1997}.

The tree returned by the algorithm preserves the alphabet, consistency, and
symbol frequencies of the original forest.
*}


theorem alphabet_huffman [simp]:
"ts ≠ [] ==> alphabet (huffman ts) = alphabetF ts"
by (induct ts rule: huffman.induct) auto

theorem consistent_huffman [simp]:
"[|consistentF ts; ts ≠ []|] ==> consistent (huffman ts)"
by (induct ts rule: huffman.induct) simp+

theorem freq_huffman [simp]:
"ts ≠ [] ==> freq (huffman ts) = freqF ts"
by (induct ts rule: huffman.induct) (auto simp: ext)

section {* Definition of Auxiliary Functions Used in the Proof
\label{auxiliary} *}


subsection {* Sibling of a Symbol *}

text {*
The {\sl sibling\/} of a symbol $a$ in a tree $t$ is the label of the node that
is the (left or right) sibling of the node labeled with $a$ in $t$. If the
symbol $a$ is not in $t$'s alphabet or it occurs in a node with no sibling
leaf, we define the sibling as being $a$ itself; this gives us the nice property
that if $t$ is consistent, then $@{term "sibling t a"} \not= a$ if and only if
$a$ has a sibling. As an illustration, we have
$@{term "sibling t a"} = b$,\vthinspace{} $@{term "sibling t b"} = a$,
and $@{term "sibling t c"} = c$ for the tree\strut
$$t \,= \vcenter{\hbox{\includegraphics[scale=1.25]{tree-sibling.pdf}}}$$
*}


fun sibling :: "'a tree => 'a => 'a" where
"sibling (Leaf wb b) a = a" |
"sibling (InnerNode w (Leaf wb b) (Leaf wc c)) a =
(if a = b then c else if a = c then b else a)"
|
"sibling (InnerNode w t1 t2) a =
(if a ∈ alphabet t1 then sibling t1 a
else if a ∈ alphabet t2 then sibling t2 a
else a)"


text {*
Because @{const sibling} is defined using sequential pattern matching
\cite{krauss-2007,krauss-2009}, reasoning about it can become tedious.
Simplification rules therefore play an important role.
*}


lemma notin_alphabet_imp_sibling_id [simp]:
"a ∉ alphabet t ==> sibling t a = a"
by (cases rule: sibling.cases [where x = "(t, a)"]) simp+

lemma height_0_imp_sibling_id [simp]:
"height t = 0 ==> sibling t a = a"
by (case_tac t) simp+

lemma height_gt_0_in_alphabet_imp_sibling_left [simp]:
"[|height t1 > 0; a ∈ alphabet t1|] ==>
sibling (InnerNode w t1 t2) a = sibling t1 a"

by (case_tac t1) simp+

lemma height_gt_0_in_alphabet_imp_sibling_right [simp]:
"[|height t2 > 0; a ∈ alphabet t1|] ==>
sibling (InnerNode w t1 t2) a = sibling t1 a"

by (case_tac t2) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_left [simp]:
"[|height t1 > 0; a ∉ alphabet t1|] ==>
sibling (InnerNode w t1 t2) a = sibling t2 a"

by (case_tac t1) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_right [simp]:
"[|height t2 > 0; a ∉ alphabet t1|] ==>
sibling (InnerNode w t1 t2) a = sibling t2 a"

by (case_tac t2) simp+

lemma either_height_gt_0_imp_sibling [simp]:
"height t1 > 0 ∨ height t2 > 0 ==>
sibling (InnerNode w t1 t2) a =
(if a ∈ alphabet t1 then sibling t1 a else sibling t2 a)"

by auto

text {*
The following rules are also useful for reasoning about siblings and alphabets.
*}


lemma in_alphabet_imp_sibling_in_alphabet:
"a ∈ alphabet t ==> sibling t a ∈ alphabet t"
by (induct t a rule: sibling.induct) auto

lemma sibling_ne_imp_sibling_in_alphabet:
"sibling t a ≠ a ==> sibling t a ∈ alphabet t"
by (metis notin_alphabet_imp_sibling_id in_alphabet_imp_sibling_in_alphabet)

text {*
The default induction rule for @{const sibling} distinguishes four cases.

\begin{myitemize}
\item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step 1:}\enskip
$t = @{term "InnerNode w (Leaf wb b) (Leaf wc c)"}$.
\item[] {\sc Induction step 2:}\enskip
$t = @{term "InnerNode w (InnerNode w1 t11 t12) t2"}$.
\item[] {\sc Induction step 3:}\enskip
$t = @{term "InnerNode w t1 (InnerNode w2 t21 t22)"}$.
\end{myitemize}

\noindent
This rule leaves much to be desired. First, the last two cases overlap and
can normally be handled the same way, so they should be combined. Second, the
nested @{text InnerNode} constructors in the last two cases reduce readability.
Third, under the assumption that $t$ is consistent, we would like to perform
the same case distinction on $a$ as we did for
@{thm [source] tree_induct_consistent}, with the same benefits for automation.
These observations lead us to develop a custom induction rule that
distinguishes the following cases.

\begin{myitemize}
\item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step 1:}\enskip
$t = @{term "InnerNode w (Leaf wb b) (Leaf wc c)"}$ with
@{prop "b ≠ c"}.
\item[] \begin{flushleft}
{\sc Induction step 2:}\enskip
$t = @{term "InnerNode w t1 t2"}$ and either @{term t1} or @{term t2}
has nonzero height.
\end{flushleft}
\item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
@{term t1} but not to @{term t2}.
\item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
@{term t2} but not to @{term t1}.
\item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
neither @{term t1} nor @{term t2}.
\end{myitemize}

The statement of the rule and its proof are similar to what we did for
consistent trees, the main difference being that we now have two induction
steps instead of one.
*}


lemma sibling_induct_consistent
[consumes 1, case_names base step1 step21 step22 step23]:
"[|consistent t;
!!w b a. P (Leaf w b) a;
!!w wb b wc c a. b ≠ c ==> P (InnerNode w (Leaf wb b) (Leaf wc c)) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
height t1 > 0 ∨ height t2 > 0; a ∈ alphabet t1;
sibling t1 a ∈ alphabet t1; a ∉ alphabet t2;
sibling t1 a ∉ alphabet t2; P t1 a|] ==>
P (InnerNode w t1 t2) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
height t1 > 0 ∨ height t2 > 0; a ∉ alphabet t1;
sibling t2 a ∉ alphabet t1; a ∈ alphabet t2;
sibling t2 a ∈ alphabet t2; P t2 a|] ==>
P (InnerNode w t1 t2) a;
!!w t1 t2 a.
[|consistent t1; consistent t2; alphabet t1 ∩ alphabet t2 = {};
height t1 > 0 ∨ height t2 > 0; a ∉ alphabet t1; a ∉ alphabet t2|] ==>
P (InnerNode w t1 t2) a|] ==>
P t a"

apply rotate_tac
apply induction_schema
apply atomize_elim
apply (case_tac t, simp)
apply clarsimp
apply (rename_tac a t1 t2)
apply (case_tac "height t1 = 0 ∧ height t2 = 0")
apply simp
apply (case_tac t1)
apply (case_tac t2)
apply fastforce
apply simp+
apply (auto intro: in_alphabet_imp_sibling_in_alphabet)[1]
by lexicographic_order

text {*
The custom induction rule allows us to prove new properties of @{const sibling}
with little effort.
*}


lemma sibling_sibling_id [simp]:
"consistent t ==> sibling t (sibling t a) = a"
by (induct t a rule: sibling_induct_consistent) simp+

lemma sibling_reciprocal:
"[|consistent t; sibling t a = b|] ==> sibling t b = a"
by auto

lemma depth_height_imp_sibling_ne:
"[|consistent t; depth t a = height t; height t > 0; a ∈ alphabet t|] ==>
sibling t a ≠ a"

by (induct t a rule: sibling_induct_consistent) auto

lemma depth_sibling [simp]:
"consistent t ==> depth t (sibling t a) = depth t a"
by (induct t a rule: sibling_induct_consistent) simp+

subsection {* Leaf Interchange *}

text {*
The @{text swapLeaves} function takes a tree $t$ together with two symbols
$a$, $b$ and their frequencies $@{term wa}$, $@{term wb}$, and returns the tree
$t$ in which the leaf nodes labeled with $a$ and $b$ are exchanged. When
invoking @{text swapLeaves}, we normally pass @{term "freq t a"} and
@{term "freq t b"} for @{term wa} and @{term wb}.

Note that we do not bother updating the cached weight of the ancestor nodes
when performing the interchange. The cached weight is used only in the
implementation of Huffman's algorithm, which doesn't invoke @{text swapLeaves}.
*}


primrec swapLeaves :: "'a tree => nat => 'a => nat => 'a => 'a tree" where
"swapLeaves (Leaf wc c) wa a wb b =
(if c = a then Leaf wb b else if c = b then Leaf wa a else Leaf wc c)"
|
"swapLeaves (InnerNode w t1 t2) wa a wb b =
InnerNode w (swapLeaves t1 wa a wb b) (swapLeaves t2 wa a wb b)"


text {*
Swapping a symbol~$a$ with itself leaves the tree $t$ unchanged if $a$ does not
belong to it or if the specified frequencies @{term wa} and @{term wb} equal
@{term "freq t a"}.
*}


lemma swapLeaves_id_when_notin_alphabet [simp]:
"a ∉ alphabet t ==> swapLeaves t w a w' a = t"
by (induct t) simp+

lemma swapLeaves_id [simp]:
"consistent t ==> swapLeaves t (freq t a) a (freq t a) a = t"
by (induct t a rule: tree_induct_consistent) simp+

text {*
The alphabet, consistency, symbol depths, height, and symbol frequencies of the
tree @{term "swapLeaves t wa a wb b"} can be related to the homologous
properties of $t$.
*}


lemma alphabet_swapLeaves:
"alphabet (swapLeaves t wa a wb b) =
(if a ∈ alphabet t then
if b ∈ alphabet t then alphabet t else (alphabet t - {a}) ∪ {b}
else
if b ∈ alphabet t then (alphabet t - {b}) ∪ {a} else alphabet t)"

by (induct t) auto

lemma consistent_swapLeaves [simp]:
"consistent t ==> consistent (swapLeaves t wa a wb b)"
by (induct t) (auto simp: alphabet_swapLeaves)

lemma depth_swapLeaves_neither [simp]:
"[|consistent t; c ≠ a; c ≠ b|] ==> depth (swapLeaves t wa a wb b) c = depth t c"
by (induct t a rule: tree_induct_consistent) (auto simp: alphabet_swapLeaves)

lemma height_swapLeaves [simp]:
"height (swapLeaves t wa a wb b) = height t"
by (induct t) simp+

lemma freq_swapLeaves [simp]:
"[|consistent t; a ≠ b|] ==>
freq (swapLeaves t wa a wb b) =
(λc. if c = a then if b ∈ alphabet t then wa else 0
else if c = b then if a ∈ alphabet t then wb else 0
else freq t c)"

apply (rule ext)
apply (induct t)
by auto

text {*
For the lemmas concerned with the resulting tree's weight and cost, we avoid
subtraction on natural numbers by rearranging terms. For example, we write
$$@{prop "weight (swapLeaves t wa a wb b) + freq t a = weight t + wb"}$$
\noindent
rather than the more conventional
$$@{prop "weight (swapLeaves t wa a wb b) = weight t + wb - freq t a"}.$$
In Isabelle/HOL, these two equations are not equivalent, because by definition
$m - n = 0$ if $n > m$. We could use the second equation and additionally
assert that @{prop "weight t ≥ freq t a"} (an easy consequence of
@{thm [source] weight_eq_Sum_freq}), and then apply the \textit{arith}
tactic, but it is much simpler to use the first equation and stay with
\textit{simp} and \textit{auto}. Another option would be to use
integers instead of natural numbers.
*}


lemma weight_swapLeaves:
"[|consistent t; a ≠ b|] ==>
if a ∈ alphabet t then
if b ∈ alphabet t then
weight (swapLeaves t wa a wb b) + freq t a + freq t b =
weight t + wa + wb
else
weight (swapLeaves t wa a wb b) + freq t a = weight t + wb
else
if b ∈ alphabet t then
weight (swapLeaves t wa a wb b) + freq t b = weight t + wa
else
weight (swapLeaves t wa a wb b) = weight t"

proof (induct t a rule: tree_induct_consistent)
-- {* {\sc Base case:}\enspace $t = @{term "Leaf w b"}$ *}
case base thus ?case by clarsimp
next
-- {* {\sc Induction step:}\enspace $t = @{term "InnerNode w t1 t2"}$ *}
-- {* {\sc Subcase 1:}\enspace $a$ belongs to @{term t1} but not to
@{term t2} *}

case (step1 w t1 t2 a) show ?case
proof cases
assume "b ∈ alphabet t1"
moreover hence "b ∉ alphabet t2" using step1 by auto
ultimately show ?case using step1 by simp
next
assume "b ∉ alphabet t1" thus ?case using step1 by auto
qed
next
-- {* {\sc Subcase 2:}\enspace $a$ belongs to @{term t2} but not to
@{term t1} *}

case (step2 w t1 t2 a) show ?case
proof cases
assume "b ∈ alphabet t1"
moreover hence "b ∉ alphabet t2" using step2 by auto
ultimately show ?case using step2 by simp
next
assume "b ∉ alphabet t1" thus ?case using step2 by auto
qed
next
-- {* {\sc Subcase 3:}\enspace $a$ belongs to neither @{term t1} nor
@{term t2} *}

case (step3 w t1 t2 a) show ?case
proof cases
assume "b ∈ alphabet t1"
moreover hence "b ∉ alphabet t2" using step3 by auto
ultimately show ?case using step3 by simp
next
assume "b ∉ alphabet t1" thus ?case using step3 by auto
qed
qed

lemma cost_swapLeaves:
"[|consistent t; a ≠ b|] ==>
if a ∈ alphabet t then
if b ∈ alphabet t then
cost (swapLeaves t wa a wb b) + freq t a * depth t a
+ freq t b * depth t b =
cost t + wa * depth t b + wb * depth t a
else
cost (swapLeaves t wa a wb b) + freq t a * depth t a =
cost t + wb * depth t a
else
if b ∈ alphabet t then
cost (swapLeaves t wa a wb b) + freq t b * depth t b =
cost t + wa * depth t b
else
cost (swapLeaves t wa a wb b) = cost t"

proof (induct t)
case Leaf show ?case by simp
next
case (InnerNode w t1 t2)
note c = `consistent (InnerNode w t1 t2)`
note hyps = InnerNode
have w1: "if a ∈ alphabet t1 then
if b ∈ alphabet t1 then
weight (swapLeaves t1 wa a wb b) + freq t1 a + freq t1 b =
weight t1 + wa + wb
else
weight (swapLeaves t1 wa a wb b) + freq t1 a = weight t1 + wb
else
if b ∈ alphabet t1 then
weight (swapLeaves t1 wa a wb b) + freq t1 b = weight t1 + wa
else
weight (swapLeaves t1 wa a wb b) = weight t1"
using hyps
by (simp add: weight_swapLeaves)
have w2: "if a ∈ alphabet t2 then
if b ∈ alphabet t2 then
weight (swapLeaves t2 wa a wb b) + freq t2 a + freq t2 b =
weight t2 + wa + wb
else
weight (swapLeaves t2 wa a wb b) + freq t2 a = weight t2 + wb
else
if b ∈ alphabet t2 then
weight (swapLeaves t2 wa a wb b) + freq t2 b = weight t2 + wa
else
weight (swapLeaves t2 wa a wb b) = weight t2"
using hyps
by (simp add: weight_swapLeaves)
show ?case
proof cases
assume a1: "a ∈ alphabet t1"
hence a2: "a ∉ alphabet t2" using c by auto
show ?case
proof cases
assume b1: "b ∈ alphabet t1"
hence "b ∉ alphabet t2" using c by auto
thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume b1: "b ∉ alphabet t1" show ?case
proof cases
assume "b ∈ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume "b ∉ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
qed
qed
next
assume a1: "a ∉ alphabet t1" show ?case
proof cases
assume a2: "a ∈ alphabet t2" show ?case
proof cases
assume b1: "b ∈ alphabet t1"
hence "b ∉ alphabet t2" using c by auto
thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume b1: "b ∉ alphabet t1" show ?case
proof cases
assume "b ∈ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume "b ∉ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
qed
qed
next
assume a2: "a ∉ alphabet t2" show ?case
proof cases
assume b1: "b ∈ alphabet t1"
hence "b ∉ alphabet t2" using c by auto
thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume b1: "b ∉ alphabet t1" show ?case
proof cases
assume "b ∈ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
next
assume "b ∉ alphabet t2" thus ?case using a1 a2 b1 w1 w2 hyps by simp
qed
qed
qed
qed
qed

text {*
Common sense tells us that the following statement is valid: ``If Astrid
exchanges her house with Bernard's neighbor, Bernard becomes Astrid's new
neighbor.'' A similar property holds for binary trees.
*}


lemma sibling_swapLeaves_sibling [simp]:
"[|consistent t; sibling t b ≠ b; a ≠ b|] ==>
sibling (swapLeaves t wa a ws (sibling t b)) a = b"

proof (induct t)
case Leaf thus ?case by simp
next
case (InnerNode w t1 t2)
note hyps = InnerNode
show ?case
proof (cases "height t1 = 0")
case True
note h1 = True
show ?thesis
proof (cases t1)
case (Leaf wc c)
note l1 = Leaf
show ?thesis
proof (cases "height t2 = 0")
case True
note h2 = True
show ?thesis
proof (cases t2)
case Leaf thus ?thesis using l1 hyps by auto metis+
next
case InnerNode thus ?thesis using h2 by simp
qed
next
case False
note h2 = False
show ?thesis
proof cases
assume "c = b" thus ?thesis using l1 h2 hyps by simp
next
assume "c ≠ b"
have "sibling t2 b ∈ alphabet t2" using `c ≠ b` l1 h2 hyps
by (simp add: sibling_ne_imp_sibling_in_alphabet)
thus ?thesis using `c ≠ b` l1 h2 hyps by auto
qed
qed
next
case InnerNode thus ?thesis using h1 by simp
qed
next
case False
note h1 = False
show ?thesis
proof (cases "height t2 = 0")
case True
note h2 = True
show ?thesis
proof (cases t2)
case (Leaf wd d)
note l2 = Leaf
show ?thesis
proof cases
assume "d = b" thus ?thesis using h1 l2 hyps by simp
next
assume "d ≠ b" show ?thesis
proof (cases "b ∈ alphabet t1")
case True
hence "sibling t1 b ∈ alphabet t1" using `d ≠ b` h1 l2 hyps
by (simp add: sibling_ne_imp_sibling_in_alphabet)
thus ?thesis using True `d ≠ b` h1 l2 hyps
by (simp add: alphabet_swapLeaves)
next
case False thus ?thesis using `d ≠ b` l2 hyps by simp
qed
qed
next
case InnerNode thus ?thesis using h2 by simp
qed
next
case False
note h2 = False
show ?thesis
proof (cases "b ∈ alphabet t1")
case True thus ?thesis using h1 h2 hyps by auto
next
case False
note b1 = False
show ?thesis
proof (cases "b ∈ alphabet t2")
case True thus ?thesis using b1 h1 h2 hyps
by (auto simp: in_alphabet_imp_sibling_in_alphabet
alphabet_swapLeaves)
next
case False thus ?thesis using b1 h1 h2 hyps by simp
qed
qed
qed
qed
qed

subsection {* Symbol Interchange *}

text {*
The @{text swapSyms} function provides a simpler interface to
@{const swapLeaves}, with @{term "freq t a"} and @{term "freq t b"} in place of
@{term "wa"} and @{term "wb"}. Most lemmas about @{text swapSyms} are directly
adapted from the homologous results about @{const swapLeaves}.
*}


definition swapSyms :: "'a tree => 'a => 'a => 'a tree" where
"swapSyms t a b ≡ swapLeaves t (freq t a) a (freq t b) b"

lemma swapSyms_id [simp]:
"consistent t ==> swapSyms t a a = t"
by (simp add: swapSyms_def)

lemma alphabet_swapSyms [simp]:
"[|a ∈ alphabet t; b ∈ alphabet t|] ==> alphabet (swapSyms t a b) = alphabet t"
by (simp add: swapSyms_def alphabet_swapLeaves)

lemma consistent_swapSyms [simp]:
"consistent t ==> consistent (swapSyms t a b)"
by (simp add: swapSyms_def)

lemma depth_swapSyms_neither [simp]:
"[|consistent t; c ≠ a; c ≠ b|] ==>
depth (swapSyms t a b) c = depth t c"

by (simp add: swapSyms_def)

lemma freq_swapSyms [simp]:
"[|consistent t; a ∈ alphabet t; b ∈ alphabet t|] ==>
freq (swapSyms t a b) = freq t"

by (case_tac "a = b") (simp add: swapSyms_def ext)+

lemma cost_swapSyms:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t"
shows "cost (swapSyms t a b) + freq t a * depth t a + freq t b * depth t b =
cost t + freq t a * depth t b + freq t b * depth t a"

proof cases
assume "a = b" thus ?thesis using assms by simp
next
assume "a ≠ b"
moreover hence "cost (swapLeaves t (freq t a) a (freq t b) b)
+ freq t a * depth t a + freq t b * depth t b =
cost t + freq t a * depth t b + freq t b * depth t a"

using assms by (simp add: cost_swapLeaves)
ultimately show ?thesis using assms by (simp add: swapSyms_def)
qed

text {*
If $a$'s frequency is lower than or equal to $b$'s, and $a$ is higher up in the
tree than $b$ or at the same level, then interchanging $a$ and $b$ does not
increase the tree's cost.
*}


lemma le_le_imp_sum_mult_le_sum_mult:
"[|i ≤ j; m ≤ (n::nat)|] ==> i * n + j * m ≤ i * m + j * n"
apply (subgoal_tac "i * m + i * (n - m) + j * m ≤ i * m + j * m + j * (n - m)")
apply (simp add: diff_mult_distrib2)
by simp

lemma cost_swapSyms_le:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "freq t a ≤ freq t b"
"depth t a ≤ depth t b"
shows "cost (swapSyms t a b) ≤ cost t"
proof -
let ?aabb = "freq t a * depth t a + freq t b * depth t b"
let ?abba = "freq t a * depth t b + freq t b * depth t a"
have "?abba ≤ ?aabb" using assms(4-5)
by (rule le_le_imp_sum_mult_le_sum_mult)
have "cost (swapSyms t a b) + ?aabb = cost t + ?abba" using assms(1-3)
by (simp add: cost_swapSyms nat_add_assoc [THEN sym])
also have "… ≤ cost t + ?aabb" using `?abba ≤ ?aabb` by simp
finally show ?thesis using assms(4-5) by simp
qed

text {*
As stated earlier, ``If Astrid exchanges her house with Bernard's neighbor,
Bernard becomes Astrid's new neighbor.''
*}


lemma sibling_swapSyms_sibling [simp]:
"[|consistent t; sibling t b ≠ b; a ≠ b|] ==>
sibling (swapSyms t a (sibling t b)) a = b"

by (simp add: swapSyms_def)

text {*
``If Astrid exchanges her house with Bernard, Astrid becomes Bernard's old
neighbor's new neighbor.''
*}


lemma sibling_swapSyms_other_sibling [simp]:
"[|consistent t; sibling t b ≠ a; sibling t b ≠ b; a ≠ b|] ==>
sibling (swapSyms t a b) (sibling t b) = a"

by (metis consistent_swapSyms sibling_swapSyms_sibling sibling_reciprocal)

subsection {* Four-Way Symbol Interchange
\label{four-way-symbol-interchange} *}


text {*
The @{const swapSyms} function exchanges two symbols $a$ and $b$. We use it
to define the four-way symbol interchange function @{text swapFourSyms}, which
takes four symbols $a$, $b$, $c$, $d$ with $a \ne b$ and $c \ne d$, and
exchanges them so that $a$ and $b$ occupy $c$~and~$d$'s positions.

A naive definition of this function would be
$$@{prop "swapFourSyms t a b c d ≡ swapSyms (swapSyms t a c) b d"}.$$
This definition fails in the face of aliasing: If $a = d$, but
$b \ne c$, then @{text "swapFourSyms a b c d"} would leave $a$ in $b$'s
position.%
\footnote{Cormen et al.\ \cite[p.~390]{cormen-et-al-2001} forgot to consider
this case in their proof. Thomas Cormen indicated in a personal communication
that this will be corrected in the next edition of the book.}
*}


definition swapFourSyms :: "'a tree => 'a => 'a => 'a => 'a => 'a tree" where
"swapFourSyms t a b c d ≡
if a = d then swapSyms t b c
else if b = c then swapSyms t a d
else swapSyms (swapSyms t a c) b d"


text {*
Lemmas about @{const swapFourSyms} are easy to prove by expanding its
definition.
*}


lemma alphabet_swapFourSyms [simp]:
"[|a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t; d ∈ alphabet t|] ==>
alphabet (swapFourSyms t a b c d) = alphabet t"

by (simp add: swapFourSyms_def)

lemma consistent_swapFourSyms [simp]:
"consistent t ==> consistent (swapFourSyms t a b c d)"
by (simp add: swapFourSyms_def)

lemma freq_swapFourSyms [simp]:
"[|consistent t; a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t;
d ∈ alphabet t|] ==>
freq (swapFourSyms t a b c d) = freq t"

by (auto simp: swapFourSyms_def)

text {*
More Astrid and Bernard insanity: ``If Astrid and Bernard exchange their houses
with Carmen and her neighbor, Astrid and Bernard will now be neighbors.''
*}


lemma sibling_swapFourSyms_when_4th_is_sibling:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "c ∈ alphabet t"
"a ≠ b" "sibling t c ≠ c"
shows "sibling (swapFourSyms t a b c (sibling t c)) a = b"
proof (cases "a ≠ sibling t c ∧ b ≠ c")
case True show ?thesis
proof -
let ?d = "sibling t c"
let ?ts = "swapFourSyms t a b c ?d"
have abba: "(sibling ?ts a = b) = (sibling ?ts b = a)" using `consistent t`
by (metis consistent_swapFourSyms sibling_reciprocal)
have s: "sibling t c = sibling (swapSyms t a c) a" using True assms
by (metis sibling_reciprocal sibling_swapSyms_sibling)
have "sibling ?ts b = sibling (swapSyms t a c) ?d" using s True assms
by (auto simp: swapFourSyms_def)
also have "… = a" using True assms
by (metis sibling_reciprocal sibling_swapSyms_other_sibling
swapLeaves_id swapSyms_def)
finally have "sibling ?ts b = a" .
with abba show ?thesis ..
qed
next
case False thus ?thesis using assms
by (auto intro: sibling_reciprocal simp: swapFourSyms_def)
qed

subsection {* Sibling Merge *}

text {*
Given a symbol $a$, the @{text mergeSibling} function transforms the tree
%
\setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
\mydimeni=\ht\myboxii
$$\vcenter{\box\myboxii}
\qquad \hbox{into} \qquad
\smash{\lower\ht\myboxi\hbox{\raise.5\mydimeni\box\myboxi}}$$
The frequency of $a$ in the result is the sum of the original frequencies of
$a$ and $b$, so as not to alter the tree's weight.
*}


fun mergeSibling :: "'a tree => 'a => 'a tree" where
"mergeSibling (Leaf wb b) a = Leaf wb b" |
"mergeSibling (InnerNode w (Leaf wb b) (Leaf wc c)) a =
(if a = b ∨ a = c then Leaf (wb + wc) a
else InnerNode w (Leaf wb b) (Leaf wc c))"
|
"mergeSibling (InnerNode w t1 t2) a =
InnerNode w (mergeSibling t1 a) (mergeSibling t2 a)"


text {*
The definition of @{const mergeSibling} has essentially the same structure as
that of @{const sibling}. As a result, the custom induction rule that we
derived for @{const sibling} works equally well for reasoning about
@{const mergeSibling}.
*}


lemmas mergeSibling_induct_consistent = sibling_induct_consistent

text {*
The properties of @{const mergeSibling} echo those of @{const sibling}. Like
with @{const sibling}, simplification rules are crucial.
*}


lemma notin_alphabet_imp_mergeSibling_id [simp]:
"a ∉ alphabet t ==> mergeSibling t a = t"
by (induct t a rule: mergeSibling.induct) simp+

lemma height_gt_0_imp_mergeSibling_left [simp]:
"height t1 > 0 ==>
mergeSibling (InnerNode w t1 t2) a =
InnerNode w (mergeSibling t1 a) (mergeSibling t2 a)"

by (case_tac t1) simp+

lemma height_gt_0_imp_mergeSibling_right [simp]:
"height t2 > 0 ==>
mergeSibling (InnerNode w t1 t2) a =
InnerNode w (mergeSibling t1 a) (mergeSibling t2 a)"

by (case_tac t2) simp+

lemma either_height_gt_0_imp_mergeSibling [simp]:
"height t1 > 0 ∨ height t2 > 0 ==>
mergeSibling (InnerNode w t1 t2) a =
InnerNode w (mergeSibling t1 a) (mergeSibling t2 a)"

by auto

lemma alphabet_mergeSibling [simp]:
"[|consistent t; a ∈ alphabet t|] ==>
alphabet (mergeSibling t a) = (alphabet t - {sibling t a}) ∪ {a}"

by (induct t a rule: mergeSibling_induct_consistent) auto

lemma consistent_mergeSibling [simp]:
"consistent t ==> consistent (mergeSibling t a)"
by (induct t a rule: mergeSibling_induct_consistent) auto

lemma freq_mergeSibling:
"[|consistent t; a ∈ alphabet t; sibling t a ≠ a|] ==>
freq (mergeSibling t a) =
(λc. if c = a then freq t a + freq t (sibling t a)
else if c = sibling t a then 0
else freq t c)"

by (induct t a rule: mergeSibling_induct_consistent)
(auto simp: fun_eq_iff)

lemma weight_mergeSibling [simp]:
"weight (mergeSibling t a) = weight t"
by (induct t a rule: mergeSibling.induct) simp+

text {*
If $a$ has a sibling, merging $a$ and its sibling reduces $t$'s cost by
@{term "freq t a + freq t (sibling t a)"}.
*}


lemma cost_mergeSibling:
"[|consistent t; sibling t a ≠ a|] ==>
cost (mergeSibling t a) + freq t a + freq t (sibling t a) = cost t"

by (induct t a rule: mergeSibling_induct_consistent) auto

subsection {* Leaf Split *}

text {*
The @{text splitLeaf} function undoes the merging performed by
@{const mergeSibling}: Given two symbols $a$, $b$ and two frequencies
$@{term wa}$, $@{term wb}$, it transforms
\setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
$$\smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
\qquad \hbox{into} \qquad
\vcenter{\box\myboxii}$$
In the resulting tree, $a$ has frequency @{term wa} and $b$ has frequency
@{term wb}. We normally invoke it with @{term wa}~and @{term wb} such that
@{prop "freq t a = wa + wb"}.
*}


primrec splitLeaf :: "'a tree => nat => 'a => nat => 'a => 'a tree" where
"splitLeaf (Leaf wc c) wa a wb b =
(if c = a then InnerNode wc (Leaf wa a) (Leaf wb b) else Leaf wc c)"
|
"splitLeaf (InnerNode w t1 t2) wa a wb b =
InnerNode w (splitLeaf t1 wa a wb b) (splitLeaf t2 wa a wb b)"


primrec splitLeafF :: "'a forest => nat => 'a => nat => 'a => 'a forest" where
"splitLeafF [] wa a wb b = []" |
"splitLeafF (t # ts) wa a wb b =
splitLeaf t wa a wb b # splitLeafF ts wa a wb b"


text {*
Splitting leaf nodes affects the alphabet, consistency, symbol frequencies,
weight, and cost in unsurprising ways.
*}


lemma notin_alphabet_imp_splitLeaf_id [simp]:
"a ∉ alphabet t ==> splitLeaf t wa a wb b = t"
by (induct t) simp+

lemma notin_alphabetF_imp_splitLeafF_id [simp]:
"a ∉ alphabetF ts ==> splitLeafF ts wa a wb b = ts"
by (induct ts) simp+

lemma alphabet_splitLeaf [simp]:
"alphabet (splitLeaf t wa a wb b) =
(if a ∈ alphabet t then alphabet t ∪ {b} else alphabet t)"

by (induct t) simp+

lemma consistent_splitLeaf [simp]:
"[|consistent t; b ∉ alphabet t|] ==> consistent (splitLeaf t wa a wb b)"
by (induct t) auto

lemma freq_splitLeaf [simp]:
"[|consistent t; b ∉ alphabet t|] ==>
freq (splitLeaf t wa a wb b) =
(if a ∈ alphabet t then
(λc. if c = a then wa else if c = b then wb else freq t c)
else
freq t)"

by (induct t b rule: tree_induct_consistent) (rule ext, auto)+

lemma weight_splitLeaf [simp]:
"[|consistent t; a ∈ alphabet t; freq t a = wa + wb|] ==>
weight (splitLeaf t wa a wb b) = weight t"

by (induct t a rule: tree_induct_consistent) simp+

lemma cost_splitLeaf [simp]:
"[|consistent t; a ∈ alphabet t; freq t a = wa + wb|] ==>
cost (splitLeaf t wa a wb b) = cost t + wa + wb"

by (induct t a rule: tree_induct_consistent) simp+

subsection {* Weight Sort Order *}

text {*
An invariant of Huffman's algorithm is that the forest is sorted by weight.
This is expressed by the @{text sortedByWeight} function.
*}


fun sortedByWeight :: "'a forest => bool" where
"sortedByWeight [] = True" |
"sortedByWeight [t] = True" |
"sortedByWeight (t1 # t2 # ts) =
(weight t1 ≤ weight t2 ∧ sortedByWeight (t2 # ts))"


text {*
The function obeys the following fairly obvious laws.
*}


lemma sortedByWeight_Cons_imp_sortedByWeight:
"sortedByWeight (t # ts) ==> sortedByWeight ts"
by (case_tac ts) simp+

lemma sortedByWeight_Cons_imp_forall_weight_ge:
"sortedByWeight (t # ts) ==> ∀u ∈ set ts. weight u ≥ weight t"
proof (induct ts arbitrary: t)
case Nil thus ?case by simp
next
case (Cons u us) thus ?case by simp (metis le_trans)
qed

lemma sortedByWeight_insortTree:
"[|sortedByWeight ts; height t = 0; heightF ts = 0|] ==>
sortedByWeight (insortTree t ts)"

by (induct ts rule: sortedByWeight.induct) auto

subsection {* Pair of Minimal Symbols *}

text {*
The @{text minima} predicate expresses that two symbols
$a$, $b \in @{term "alphabet t"}$ have the lowest frequencies in the tree $t$
and that @{prop "freq t a ≤ freq t b"}. Minimal symbols need not be uniquely
defined.
*}


definition minima :: "'a tree => 'a => 'a => bool" where
"minima t a b ≡
a ∈ alphabet t ∧ b ∈ alphabet t ∧ a ≠ b ∧ freq t a ≤ freq t b
∧ (∀c ∈ alphabet t. c ≠ a --> c ≠ b -->
freq t c ≥ freq t a ∧ freq t c ≥ freq t b)"


section {* Formalization of the Textbook Proof
\label{formalization} *}


subsection {* Four-Way Symbol Interchange Cost Lemma *}

text {*
If $a$ and $b$ are minima, and $c$ and $d$ are at the very bottom of the tree,
then exchanging $a$ and $b$ with $c$ and $d$ doesn't increase the cost.
Graphically, we have\strut
%
$${\it cost\/}
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima-abcd.pdf}}}
\,\mathop{\le}\;\;\;
{\it cost\/}
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima.pdf}}}$$

\noindent
This cost property is part of Knuth's proof:

\begin{quote}
Let $V$ be an internal node of maximum distance from the root. If $w_1$ and
$w_2$ are not the weights already attached to the children of $V$, we can
interchange them with the values that are already there; such an interchange
does not increase the weighted path length.
\end{quote}

\noindent
Lemma~16.2 in Cormen et al.~\cite[p.~389]{cormen-et-al-2001} expresses a
similar property, which turns out to be a corollary of our cost property:

\begin{quote}
Let $C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
Let $x$ and $y$ be two characters in $C$ having the lowest frequencies. Then
there exists an optimal prefix code for $C$ in which the codewords for $x$ and
$y$ have the same length and differ only in the last bit.
\end{quote}

\vskip-.75\myskipamount
*}


lemma cost_swapFourSyms_le:
assumes "consistent t" "minima t a b" "c ∈ alphabet t" "d ∈ alphabet t"
"depth t c = height t" "depth t d = height t" "c ≠ d"
shows "cost (swapFourSyms t a b c d) ≤ cost t"
proof -
note lems = swapFourSyms_def minima_def cost_swapSyms_le depth_le_height
show ?thesis
proof (cases "a ≠ d ∧ b ≠ c")
case True show ?thesis
proof cases
assume "a = c" show ?thesis
proof cases
assume "b = d" thus ?thesis using `a = c` True assms
by (simp add: lems)
next
assume "b ≠ d" thus ?thesis using `a = c` True assms
by (simp add: lems)
qed
next
assume "a ≠ c" show ?thesis
proof cases
assume "b = d" thus ?thesis using `a ≠ c` True assms
by (simp add: lems)
next
assume "b ≠ d"
have "cost (swapFourSyms t a b c d) ≤ cost (swapSyms t a c)"
using `b ≠ d` `a ≠ c` True assms by (clarsimp simp: lems)
also have "… ≤ cost t" using `b ≠ d` `a ≠ c` True assms
by (clarsimp simp: lems)
finally show ?thesis .
qed
qed
next
case False thus ?thesis using assms by (clarsimp simp: lems)
qed
qed

subsection {* Leaf Split Optimality Lemma
\label{leaf-split-optimality} *}


text {*
The tree @{term "splitLeaf t wa a wb b"} is optimum if $t$ is optimum, under a
few assumptions, notably that $a$ and $b$ are minima of the new tree and
that @{prop "freq t a = wa + wb"}.
Graphically:\strut
%
\setbox\myboxi=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-ab.pdf}}%
$${\it optimum\/} \smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
\,\mathop{\Longrightarrow}\;\;\;
{\it optimum\/} \vcenter{\box\myboxii}$$
%
This corresponds to the following fragment of Knuth's proof:

\begin{quote}
Now it is easy to prove that the weighted path length of such a tree is
minimized if and only if the tree with
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
\qquad \hbox{replaced by} \qquad
\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
has minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
\end{quote}

\noindent
We only need the ``if'' direction of Knuth's equivalence. Lemma~16.3 in
Cormen et al.~\cite[p.~391]{cormen-et-al-2001} expresses essentially the same
property:

\begin{quote}
Let $C$ be a given alphabet with frequency $f[c]$ defined for each character
$c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
$C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
$z$ added, so that $C' = C - \{x, y\} \cup {\{z\}}$; define $f$ for $C'$ as for
$C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
optimal prefix code for the alphabet $C'$. Then the tree $T$, obtained from
$T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
$y$ as children, represents an optimal prefix code for the alphabet $C$.
\end{quote}

\noindent
The proof is as follows: We assume that $t$ has a cost less than or equal to
that of any other comparable tree~$v$ and show that
@{term "splitLeaf t wa a wb b"} has a cost less than or equal to that of any
other comparable tree $u$. By @{thm [source] exists_at_height} and
@{thm [source] depth_height_imp_sibling_ne}, we know that some symbols $c$ and
$d$ appear in sibling nodes at the very bottom of~$u$:
$$\includegraphics[scale=1.25]{tree-splitLeaf-cd.pdf}$$
(The question mark is there to remind us that we know nothing specific about
$u$'s structure.) From $u$ we construct a new tree
@{term "swapFourSyms u a b c d"} in which the minima $a$ and $b$ are siblings:
$$\includegraphics[scale=1.25]{tree-splitLeaf-abcd.pdf}$$
Merging $a$ and $b$ gives a tree comparable with $t$, which we can use to
instantiate $v$ in the assumption:
$$\includegraphics[scale=1.25]{tree-splitLeaf-abcd-aba.pdf}$$
With this instantiation, the proof is easy:
$$\begin{tabularx}{\textwidth}{@%
{\hskip\leftmargin}cX@%
{}}
& @{term "cost (splitLeaf t a wa b wb)"} \\
\eq & \justif{@{thm [source] cost_splitLeaf}} \\
& @{term "cost t + wa + wb"} \\
\kern-1em$\leq$\kern-1em & \justif{assumption} \\[-2ex]
& $@{text "cost ("}
\overbrace{\strut\!@{term "mergeSibling (swapFourSyms u a b c d) a"}\!}
^{\smash{\hbox{$v$}}}@{text ") + wa + wb"}$ \\[\extrah]
\eq & \justif{@{thm [source] cost_mergeSibling}} \\
& @{term "cost (swapFourSyms u a b c d)"} \\
\kern-1em$\leq$\kern-1em & \justif{@{thm [source] cost_swapFourSyms_le}} \\
& @{term "cost u"}. \\
\end{tabularx}$$

\noindent
In contrast, the proof in Cormen et al.\ is by contradiction: Essentially, they
assume that there exists a tree $u$ with a lower cost than
@{term "splitLeaf t a wa b wb"} and show that there exists a tree~$v$
with a lower cost than~$t$, contradicting the hypothesis that $t$ is optimum. In
place of @{thm [source] cost_swapFourSyms_le}, they invoke their lemma~16.2,
which is questionable since $u$ is not necessarily optimum.%
\footnote{Thomas Cormen commented that this step will be clarified in the
next edition of the book.}

Our proof relies on the following lemma, which asserts that $a$ and $b$ are
minima of $u$.
*}


lemma twice_freq_le_imp_minima:
"[|∀c ∈ alphabet t. wa ≤ freq t c ∧ wb ≤ freq t c;
alphabet u = alphabet t ∪ {b}; a ∈ alphabet u; a ≠ b;
freq u = (λc. if c = a then wa else if c = b then wb else freq t c);
wa ≤ wb|] ==>
minima u a b"

by (simp add: minima_def)

text {*
Now comes the key lemma.
*}


lemma optimum_splitLeaf:
assumes "consistent t" "optimum t" "a ∈ alphabet t" "b ∉ alphabet t"
"freq t a = wa + wb" "∀c ∈ alphabet t. freq t c ≥ wa ∧ freq t c ≥ wb"
"wa ≤ wb"
shows "optimum (splitLeaf t wa a wb b)"
proof (unfold optimum_def, clarify)
fix u
let ?t' = "splitLeaf t wa a wb b"
assume cu: "consistent u"
and au: "alphabet ?t' = alphabet u"
and fu: "freq ?t' = freq u"
show "cost ?t' ≤ cost u"
proof (cases "height ?t' = 0")
case True thus ?thesis by simp
next
case False
hence hu: "height u > 0" using au assms
by (auto intro: height_gt_0_alphabet_eq_imp_height_gt_0)
have aa: "a ∈ alphabet u" using au assms by fastforce
have ab: "b ∈ alphabet u" using au assms by fastforce
have ab: "a ≠ b" using assms by blast
from exists_at_height [OF cu]
obtain c where ac: "c ∈ alphabet u" and dc: "depth u c = height u" ..
let ?d = "sibling u c"
have dc: "?d ≠ c" using dc cu hu ac by (metis depth_height_imp_sibling_ne)
have ad: "?d ∈ alphabet u" using dc
by (rule sibling_ne_imp_sibling_in_alphabet)
have dd: "depth u ?d = height u" using dc cu by simp

let ?u' = "swapFourSyms u a b c ?d"
have cu': "consistent ?u'" using cu by simp
have au': "alphabet ?u' = alphabet u" using aa ab ac ad au by simp
have fu': "freq ?u' = freq u" using aa ab ac ad cu fu by simp
have sa: "sibling ?u' a = b" using cu aa ab ac ab dc
by (rule sibling_swapFourSyms_when_4th_is_sibling)

let ?v = "mergeSibling ?u' a"
have cv: "consistent ?v" using cu' by simp
have av: "alphabet ?v = alphabet t" using sa cu' au' aa au assms by auto
have fv: "freq ?v = freq t"
using sa cu' au' fu' fu [THEN sym] ab au [THEN sym] assms
by (simp add: freq_mergeSibling ext)

have "cost ?t' = cost t + wa + wb" using assms by simp
also have "… ≤ cost ?v + wa + wb" using cv av fv `optimum t`
by (simp add: optimum_def)
also have "… = cost ?u'"
proof -
have "cost ?v + freq ?u' a + freq ?u' (sibling ?u' a) = cost ?u'"
using cu' sa assms by (subst cost_mergeSibling) auto
moreover have "wa = freq ?u' a" "wb = freq ?u' b"
using fu' fu [THEN sym] assms by clarsimp+
ultimately show ?thesis using sa by simp
qed
also have "… ≤ cost u"
proof -
have "minima u a b" using au fu assms
by (subst twice_freq_le_imp_minima) auto
with cu show ?thesis using ac ad dc dd dc [THEN not_sym]
by (rule cost_swapFourSyms_le)
qed
finally show ?thesis .
qed
qed

subsection {* Leaf Split Commutativity Lemma
\label{leaf-split-commutativity} *}


text {*
A key property of Huffman's algorithm is that once it has combined two
lowest-weight trees using @{const uniteTrees}, it doesn't visit these trees
ever again. This suggests that splitting a leaf node before applying the
algorithm should give the same result as applying the algorithm first and
splitting the leaf node afterward. The diagram below illustrates the
situation:\strut

\def\myscale{1.05}%
\setbox\myboxi=\hbox{(9)\strut}%
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{forest-a.pdf}}%
$$(1)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}$$

\smallskip

\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-splitLeaf-a.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
{forest-splitLeaf-ab.pdf}}%
\mydimeni=\wd\myboxii

\noindent
(2a)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}%
\qquad\qquad\quad
(2b)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\quad{}

\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
{tree-splitLeaf-ab.pdf}}%
\setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]%
{tree-huffman-splitLeaf-ab.pdf}}%
\mydimenii=\wd\myboxiii
\vskip1.5\smallskipamount
\noindent
(3a)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}%
\qquad\qquad\quad
(3b)\,\hfill\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv}%
\quad\hfill{}

\noindent
From the original forest (1), we can either run the algorithm (2a) and then
split $a$ (3a) or split $a$ (2b) and then run the algorithm (3b). Our goal is
to show that trees (3a) and (3b) are identical. Formally, we prove that
$$@{term "splitLeaf (huffman ts) wa a wb b"} =
@{term "huffman (splitLeafF ts wa a wb b)"}$$
when @{term ts} is consistent, @{term "a ∈ alphabetF ts"}, @{term
"b ∉ alphabetF ts"}, and $@{term "freqF ts a"} = @{term wa}
\mathbin{@{text "+"}} @{term "wb"}$. But before we can prove this
commutativity lemma, we need to introduce a few simple lemmas.
*}


lemma cachedWeight_splitLeaf [simp]:
"cachedWeight (splitLeaf t wa a wb b) = cachedWeight t"
by (case_tac t) simp+

lemma splitLeafF_insortTree_when_in_alphabet_left [simp]:
"[|a ∈ alphabet t; consistent t; a ∉ alphabetF ts; freq t a = wa + wb|] ==>
splitLeafF (insortTree t ts) wa a wb b = insortTree (splitLeaf t wa a wb b) ts"

by (induct ts) simp+

lemma splitLeafF_insortTree_when_in_alphabetF_tail [simp]:
"[|a ∈ alphabetF ts; consistentF ts; a ∉ alphabet t; freqF ts a = wa + wb|] ==>
splitLeafF (insortTree t ts) wa a wb b =
insortTree t (splitLeafF ts wa a wb b)"

proof (induct ts)
case Nil thus ?case by simp
next
case (Cons u us) show ?case
proof (cases "a ∈ alphabet u")
case True
moreover hence "a ∉ alphabetF us" using Cons by auto
ultimately show ?thesis using Cons by auto
next
case False thus ?thesis using Cons by simp
qed
qed

text {*
We are now ready to prove the commutativity lemma.
*}


lemma splitLeaf_huffman_commute:
"[|consistentF ts; ts ≠ []; a ∈ alphabetF ts; freqF ts a = wa + wb|] ==>
splitLeaf (huffman ts) wa a wb b = huffman (splitLeafF ts wa a wb b)"

proof (induct ts rule: huffman.induct)
-- {* {\sc Base case 1:}\enskip $@{term ts} = []$ *}
case 3 thus ?case by simp
next
-- {* {\sc Base case 2:}\enskip $@{term ts} = @{term "[t]"}$ *}
case (1 t) thus ?case by simp
next
-- {* {\sc Induction step:}\enskip $@{term ts} = @{term "t1 # t2 # ts"}$ *}
case (2 t1 t2 ts)
note hyps = 2
show ?case
proof (cases "a ∈ alphabet t1")
case True
moreover hence "a ∉ alphabet t2" "a ∉ alphabetF ts" using hyps by auto
ultimately show ?thesis using hyps by (simp add: uniteTrees_def)
next
case False
note a1 = False
show ?thesis
proof (cases "a ∈ alphabet t2")
case True
moreover hence "a ∉ alphabetF ts" using hyps by auto
ultimately show ?thesis using a1 hyps by (simp add: uniteTrees_def)
next
case False
thus ?thesis using a1 hyps by simp
qed
qed
qed

text {*
An important consequence of the commutativity lemma is that applying Huffman's
algorithm on a forest of the form
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees.pdf}}}$$
gives the same result as applying the algorithm on the ``flat'' forest
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
followed by splitting the leaf node $a$ into two nodes $a$, $b$ with
frequencies $@{term wa}$, $@{term wb}$. The lemma effectively
provides a way to flatten the forest at each step of the algorithm. Its
invocation is implicit in the textbook proof.
*}


subsection {* Optimality Theorem *}

text {*
We are one lemma away from our main result.
*}


lemma max_0_imp_0 [simp]:
"(max x y = (0::nat)) = (x = 0 ∧ y = 0)"
by auto

theorem optimum_huffman:
"[|consistentF ts; heightF ts = 0; sortedByWeight ts; ts ≠ []|] ==>
optimum (huffman ts)"


txt {*
The input @{term ts} is assumed to be a nonempty consistent list of leaf nodes
sorted by weight. The proof is by induction on the length of the forest
@{term ts}. Let @{term ts} be
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-flat.pdf}}}$$
with $w_a \le w_b \le w_c \le w_d \le \cdots \le w_z$. If @{term ts} consists
of a single leaf node, the node has cost 0 and is therefore optimum. If
@{term ts} has length 2 or more, the first step of the algorithm leaves us with
the term
$${\it huffman\/}\enskip\; \vcenter{\hbox{\includegraphics[scale=1.25]%
{forest-uniteTrees.pdf}}}$$
In the diagram, we put the newly created tree at position 2 in the forest; in
general, it could be anywhere. By @{thm [source] splitLeaf_huffman_commute},
the above tree equals\strut
$${\it splitLeaf\/}\;\left({\it huffman\/}\enskip\;
\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}
\;\right)\,@{text "wa a wb b"}.$$
To prove that this tree is optimum, it suffices by
@{thm [source] optimum_splitLeaf} to show that\strut
$${\it huffman\/}\enskip\;
\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
is optimum, which follows from the induction hypothesis.\strut
*}


proof (induct ts rule: length_induct)
-- {* \sc Complete induction step *}
case (1 ts)
note hyps = 1
show ?case
proof (cases ts)
case Nil thus ?thesis using `ts ≠ []` by fast
next
case (Cons ta ts')
note ts = Cons
show ?thesis
proof (cases ts')
case Nil thus ?thesis using ts hyps by (simp add: optimum_def)
next
case (Cons tb ts'')
note ts' = Cons
show ?thesis
proof (cases ta)
case (Leaf wa a)
note la = Leaf
show ?thesis
proof (cases tb)
case (Leaf wb b)
note lb = Leaf
show ?thesis
proof -
let ?us = "insortTree (uniteTrees ta tb) ts''"
let ?us' = "insortTree (Leaf (wa + wb) a) ts''"
let ?ts = "splitLeaf (huffman ?us') wa a wb b"
have e1: "huffman ts = huffman ?us" using ts' ts by simp
have e2: "huffman ?us = ?ts" using la lb ts' ts hyps
by (auto simp: splitLeaf_huffman_commute uniteTrees_def)

have "optimum (huffman ?us')" using la ts' ts hyps
by (drule_tac x = ?us' in spec)
(auto dest: sortedByWeight_Cons_imp_sortedByWeight
simp: sortedByWeight_insortTree)
hence "optimum ?ts" using la lb ts' ts hyps
apply simp
apply (rule optimum_splitLeaf)
by (auto dest!: heightF_0_imp_Leaf_freqF_in_set
sortedByWeight_Cons_imp_forall_weight_ge)
thus "optimum (huffman ts)" using e1 e2 by simp
qed
next
case InnerNode thus ?thesis using ts' ts hyps by simp
qed
next
case InnerNode thus ?thesis using ts' ts hyps by simp
qed
qed
qed
qed

text {*
\isakeyword{end}

\myskip

\noindent
So what have we achieved? Assuming that our definitions really mean what we
intend them to mean, we established that our functional implementation of
Huffman's algorithm, when invoked properly, constructs a binary tree that
represents an optimal prefix code for the specified alphabet and frequencies.
Using Isabelle's code generator \cite{haftmann-nipkow-2007}, we can convert the
Isabelle code into Standard ML, OCaml, or Haskell and use it in a real
application.

As a side note, the @{thm [source] optimum_huffman} theorem assumes that the
forest @{term ts} passed to @{const huffman} consists exclusively of leaf nodes.
It is tempting to relax this restriction, by requiring instead that the forest
@{term ts} has the lowest cost among forests of the same size. We would define
optimality of a forest as follows:
$$\begin{aligned}[t]
@{prop "optimumF ts"}\,\;@{text "≡"}\;\,
(@{text "∀us."}\
& @{text "length ts = length us --> consistentF us -->"} \\[-2.5pt]
& @{text "alphabetF ts = alphabetF us --> freqF ts = freqF us -->"}
\\[-2.5pt]
& @{prop "costF ts ≤ costF us"})\end{aligned}$$
with $@{text "costF [] = 0"}$ and
$@{prop "costF (t # ts) = cost t + costF ts"}$. However, the modified
proposition does not hold. A counterexample is the optimum forest
$$\includegraphics{forest-optimal.pdf}$$
for which the algorithm constructs the tree
$$\vcenter{\hbox{\includegraphics{tree-suboptimal.pdf}}}
\qquad \hbox{of greater cost than} \qquad
\vcenter{\hbox{\includegraphics{tree-optimal.pdf}}}$$
*}


section {* Related Work
\label{related-work} *}


text {*
Laurent Th\'ery's Coq formalization of Huffman's algorithm \cite{thery-2003,%
thery-2004} is an obvious yardstick for our work. It has a somewhat wider
scope, proving among others the isomorphism between prefix codes and full binary
trees. With 291 theorems, it is also much larger.

Th\'ery identified the following difficulties in formalizing the textbook
proof:

\begin{enumerate}
\item The leaf interchange process that brings the two minimal symbols together
is tedious to formalize.

\item The sibling merging process requires introducing a new symbol for the
merged node, which complicates the formalization.

\item The algorithm constructs the tree in a bottom-up fashion. While top-down
procedures can usually be proved by structural induction, bottom-up
procedures often require more sophisticated induction principles and
larger invariants.

\item The informal proof relies on the notion of depth of a node. Defining this
notion formally is problematic, because the depth can only be seen as a
function if the tree is composed of distinct nodes.
\end{enumerate}

To circumvent these difficulties, Th\'ery introduced the ingenious concept of
cover. A forest @{term ts} is a {\em cover\/} of a tree~$t$ if $t$ can be built
from @{term ts} by adding inner nodes on top of the trees in @{term ts}. The
term ``cover'' is easier to understand if the binary trees are drawn with the
root at the bottom of the page, like natural trees. Huffman's algorithm is
a refinement of the cover concept. The main proof consists in showing that
the cost of @{term "huffman ts"} is less than or equal to that of any other
tree for which @{term ts} is a cover. It relies on a few auxiliary definitions,
notably an ``ordered cover'' concept that facilitates structural induction
and a four-argument depth predicate (confusingly called @{term height}).
Permutations also play a central role.

Incidentally, our experience suggests that the potential problems identified
by Th\'ery can be overcome more directly without too much work, leading to a
simpler proof:

\begin{enumerate}
\item Formalizing the leaf interchange did not prove overly tedious. Among our
95~lemmas and theorems, 24 concern @{const swapLeaves},
@{const swapSyms}, and @{const swapFourSyms}.

\item The generation of a new symbol for the resulting node when merging two
sibling nodes in @{const mergeSibling} was trivially solved by reusing
one of the two merged symbols.

\item The bottom-up nature of the tree construction process was addressed by
using the length of the forest as the induction measure and by merging
the two minimal symbols, as in Knuth's proof.

\item By restricting our attention to consistent trees, we were able to define
the @{const depth} function simply and meaningfully.
\end{enumerate}
*}


section {* Conclusion
\label{conclusion} *}


text {*
The goal of most formal proofs is to increase our confidence in a result. In
the case of Huffman's algorithm, however, the chances that a bug would have
gone unnoticed for the 56 years since its publication, under the scrutiny of
leading computer scientists, seem extremely low; and the existence of a Coq
proof should be sufficient to remove any remaining doubts.

The main contribution of this report has been to demonstrate that the textbook
proof of Huffman's algorithm can be elegantly formalized using
a state-of-the-art theorem prover such as Isabelle/HOL. In the process, we
uncovered a few minor snags in the proof given in Cormen et
al.~\cite{cormen-et-al-2001}.

We also found that custom induction rules, in combination with suitable
simplification rules, greatly help the automatic proof tactics, sometimes
reducing 30-line proof scripts to one-liners. We successfully applied this
approach for handling both the ubiquitous ``datatype + well\-formed\-ness
predicate'' combination (@{typ "'a tree"} + @{const consistent}) and functions
defined by sequential pattern matching (@{const sibling} and
@{const mergeSibling}). Our experience suggests that such rules, which are
uncommon in formalizations, are highly valuable and versatile. Moreover,
Isabelle's \textit{induction\_schema} and \textit{lexicographic\_order} tactics
make these easy to prove.

Formalizing the proof of Huffman's algorithm also led to a deeper
understanding of this classic algorithm. Many of the lemmas, notably the leaf
split commutativity lemma of Section~\ref{leaf-split-commutativity}, have not
been found in the literature and express fundamental properties of the
algorithm. Other discoveries didn't find their way into the final proof. In
particular, each step of the algorithm appears to preserve the invariant that
the nodes in a forest are ordered by weight from left to right, bottom to top,
as in the example below:\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-zigzag.pdf}}}$$
It is not hard to prove formally that a tree exhibiting this property is
optimum. On the other hand, proving that the algorithm preserves this invariant
seems difficult---more difficult than formalizing the textbook proof---and
remains a suggestion for future work.

A few other directions for future work suggest themselves. First, we could
formalize some of our hypotheses, notably our restriction to full and
consistent binary trees. The current formalization says nothing about the
algorithm's application for data compression, so the next step could be to
extend the proof's scope to cover @{term encode}/@{term decode} functions
and show that full binary trees are isomorphic to prefix codes, as done in the
Coq development. Independently, we could generalize the development to $n$-ary
trees.
*}


(*<*)
end
(*>*)