# Theory Huffman

(*  Title:       An Isabelle/HOL Formalization of the Textbook Proof
                 of Huffman's Algorithm
    Author:      Jasmin Christian Blanchette <blanchette at in.tum.de>, 2008
    Maintainer:  Jasmin Christian Blanchette <blanchette at in.tum.de>
*)

(*<*)
theory Huffman
imports Main
begin
(*>*)

section {* Introduction *}

subsection {* Binary Codes
              \label{binary-codes} *}

text {*
Suppose we want to encode strings over a finite source alphabet to sequences
of bits. The approach used by ASCII and most other charsets is to map each
source symbol to a distinct $k$-bit code word, where $k$ is fixed and is
typically 8 or 16. To encode a string of symbols, we simply encode each symbol
in turn. Decoding involves mapping each $k$-bit block back to the symbol it
represents.

Fixed-length codes are simple and fast, but they generally waste space. If
we know the frequency $w_a$ of each source symbol $a$, we can save space
by using shorter code words for the most frequent symbols. We
say that a (variable-length) code is {\sl optimum\/} if it minimizes the sum
$\sum_a w_a \vthinspace\delta_a$, where $\delta_a$ is the length of the binary
code word for $a$. Information theory tells us that a code is optimum if
for each source symbol $c$ the code word representing $c$ has length
$$\textstyle \delta_c = \log_2 {1 \over p_c}, \qquad \hbox{where}\enskip
p_c = {w_c \over \sum_a w_a}.$$
This number is generally not an integer, so we cannot use it directly.
Nonetheless, the above criterion is a useful yardstick and paves the way for
arithmetic coding \cite{rissanen-1976}, a generalization of the method
presented here.

\def\xabacabad{\xa\xb\xa\xc\xa\xb\xa\xd}%
As an example, consider the source string `$\xabacabad$'.
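The arithmetic behind this example is easy to check mechanically. The
following Python sketch (an illustration added here, not part of the formal
development) computes the probabilities and optimum code-word lengths for
the example string:

```python
from collections import Counter
from math import log2

s = 'abacabad'
w = Counter(s)                          # symbol frequencies w_a
n = sum(w.values())
delta = {c: log2(n / w[c]) for c in w}  # delta_c = log2(1 / p_c)
print(delta)  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```

For this particular string the optimum lengths all happen to be integers,
which is what makes it a convenient running example.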
We have
$$p_{\xa} = \tfrac{1}{2},\,\; p_{\xb} = \tfrac{1}{4},\,\;
  p_{\xc} = \tfrac{1}{8},\,\; p_{\xd} = \tfrac{1}{8}.$$
The optimum lengths for the binary code words are all integers, namely
$$\delta_{\xa} = 1,\,\; \delta_{\xb} = 2,\,\; \delta_{\xc} = 3,\,\;
  \delta_{\xd} = 3,$$
and they are realized by the code
$$C_1 = \{ \xa \mapsto 0,\, \xb \mapsto 10,\, \xc \mapsto 110,\,
           \xd \mapsto 111 \}.$$
Encoding `$\xabacabad$' produces the 14-bit code word 01001100100111. The code
$C_1$ is optimum: No code that unambiguously encodes source symbols one at a
time could do better than $C_1$ on the input `$\xa\xb\xa\xc\xa\xb\xa\xd$'. In
particular, with a fixed-length code such as
$$C_2 = \{ \xa \mapsto 00,\, \xb \mapsto 01,\, \xc \mapsto 10,\,
           \xd \mapsto 11 \}$$
we need at least 16~bits to encode `$\xabacabad$'.
*}

subsection {* Binary Trees *}

text {*
Inside a program, binary codes can be represented by binary trees. For
example, the trees\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-unbalanced.pdf}}}
  \qquad \hbox{and} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-balanced.pdf}}}$$
correspond to $C_1$ and $C_2$. The code word for a given symbol can be
obtained as follows: Start at the root and descend toward the leaf node
associated with the symbol one node at a time; generate a 0 whenever the left
child of the current node is chosen and a 1 whenever the right child is
chosen. The generated sequence of 0s and 1s is the code word.

To avoid ambiguities, we require that only leaf nodes are labeled with
symbols. This ensures that no code word is a prefix of another, thereby
eliminating the source of all ambiguities.%
\footnote{Strictly speaking, there is another potential source of ambiguity.
If the alphabet consists of a single symbol $a$, that symbol could be mapped
to the empty code word, and then any string $aa\ldots a$ would map to the
empty bit sequence, giving the decoder no way to recover the original
string's length. This scenario can be ruled out by requiring that the
alphabet has cardinality 2 or more.}
Codes that have this property are called {\sl prefix codes}. As an example of
a code that doesn't have this property, consider the code associated with the
tree\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abcd-non-prefix.pdf}}}$$
and observe that `$\xb\xb\xb$', `$\xb\xd$', and `$\xd\xb$' all map to the
code word 111.

Each node in a code tree is assigned a {\sl weight}. For a leaf node, the
weight is the frequency of its symbol; for an inner node, it is the sum of
the weights of its subtrees. Code trees can be annotated with their weights:
\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]%
    {tree-abcd-unbalanced-weighted.pdf}}}
  \qquad\qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]%
    {tree-abcd-balanced-weighted.pdf}}}$$
For our purposes, it is sufficient to consider only full binary trees (trees
whose inner nodes all have two children). This is because any inner node with
only one child can advantageously be eliminated; for example,\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-non-full.pdf}}}
  \qquad \hbox{becomes} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-abc-full.pdf}}}$$
*}

subsection {* Huffman's Algorithm *}

text {*
David Huffman \cite{huffman-1952} discovered a simple algorithm for
constructing an optimum code tree for specified symbol frequencies:
Create a forest consisting of only leaf nodes, one for each symbol in the
alphabet, taking the given symbol frequencies as initial weights for the
nodes. Then pick the two trees
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
  \qquad \hbox{and} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$
\noindent\strut
with the lowest weights and replace them with the tree
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2.pdf}}}$$
Repeat this process until only one tree is left.

As an illustration, executing the algorithm for the frequencies
$$f_{\xd} = 3,\,\; f_{\xe} = 11,\,\; f_{\xf} = 5,\,\; f_{\xs} = 7,\,\;
  f_{\xz} = 2$$
gives rise to the following sequence of states:\strut
\def\myscale{1}%
\setbox\myboxi=\hbox{(9)\strut}%
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step1.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step2.pdf}}%
$$(1)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}
  \qquad\qquad
  (2)\enskip\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}$$
\vskip.5\smallskipamount

\noindent
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step3.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]{tree-prime-step4.pdf}}%
\setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]{tree-prime-step5.pdf}}%
(3)\quad\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}\hfill\quad
(4)\quad\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\hfill
(5)\enskip\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv\,}\smallskip

\noindent
Tree~(5) is an optimum tree for the given frequencies.
*}

subsection {* The Textbook Proof *}

text {*
Why does the algorithm work? In his article, Huffman gave some motivation but
no real proof. For a proof sketch, we turn to Donald Knuth
\cite[p.~403--404]{knuth-1997}:

\begin{quote}
It is not hard to prove that this method does in fact minimize the weighted
path length [i.e., $\sum_a w_a \vthinspace\delta_a$], by induction on $m$.
Suppose we have $w_1 \le w_2 \le w_3 \le \cdots \le w_m$, where $m \ge 2$, and
suppose that we are given a tree that minimizes the weighted path length.
(Such a tree certainly exists, since only finitely many binary trees with $m$
terminal nodes are possible.) Let $V$ be an internal node of maximum distance
from the root. If $w_1$ and $w_2$ are not the weights already attached to the
children of $V$, we can interchange them with the values that are already
there; such an interchange does not increase the weighted path length. Thus
there is a tree that minimizes the weighted path length and contains the
subtree\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
Now it is easy to prove that the weighted path length of such a tree is
minimized if and only if the tree with
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
  \qquad \hbox{replaced by} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
has minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$,
$w_m$.
\end{quote}

\noindent
There is, however, a small oddity in this proof: It is not clear why we must
assert the existence of an optimum tree that contains the subtree
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}$$
Indeed, the formalization works without it.

Cormen et al.\ \cite[p.~385--391]{cormen-et-al-2001} provide a very similar
proof, articulated around the following propositions:

\begin{quote}
\textsl{\textbf{Lemma 16.2}} \\
Let $C$ be an alphabet in which each character $c \in C$ has frequency
$f[c]$. Let $x$ and $y$ be two characters in $C$ having the lowest
frequencies. Then there exists an optimal prefix code for $C$ in which the
codewords for $x$ and $y$ have the same length and differ only in the last
bit.

\medskip

\textsl{\textbf{Lemma 16.3}} \\
Let $C$ be a given alphabet with frequency $f[c]$ defined for each character
$c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency.
Let $C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new)
character $z$ added, so that $C' = C - \{x, y\} \cup \{z\}$; define $f$ for
$C'$ as for $C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree
representing an optimal prefix code for the alphabet $C'$. Then the tree
$T$, obtained from $T'$ by replacing the leaf node for $z$ with an internal
node having $x$ and $y$ as children, represents an optimal prefix code for
the alphabet $C$.

\medskip

\textsl{\textbf{Theorem 16.4}} \\
Procedure \textsc{Huffman} produces an optimal prefix code.
\end{quote}
*}

subsection {* Overview of the Formalization *}

text {*
This report presents a formalization of the proof of Huffman's algorithm
written using Isabelle/HOL \cite{nipkow-et-al-2008}. Our proof is based on
the informal proofs given by Knuth and Cormen et al. The development was done
independently of Laurent Th\'ery's Coq proof \cite{thery-2003,thery-2004},
which through its ``cover'' concept represents a considerable departure from
the textbook proof.

The development consists of 90 lemmas and 5 theorems. Most of them have very
short proofs thanks to the extensive use of simplification rules and custom
induction rules. The remaining proofs are written using the structured proof
format Isar \cite{wenzel-2008} and are accompanied by informal arguments and
diagrams.

The report is organized as follows. Section~\ref{trees-and-forests} defines
the datatypes for binary code trees and forests and develops a small library
of related functions. (Incidentally, there is nothing special about binary
codes and binary trees. Huffman's algorithm and its proof can be generalized
to $n$-ary trees \cite[p.~405 and 595]{knuth-1997}.)
Section~\ref{implementation} presents a functional implementation of the
algorithm. Section~\ref{auxiliary} defines several tree manipulation
functions needed for the proof. Section~\ref{formalization} presents three
key lemmas and concludes with the optimality theorem.
Section~\ref{related-work} compares our work with Th\'ery's Coq proof.
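The informal claims above can also be exercised computationally on the
running example. The following Python sketch (illustrative only, and distinct
from the verified functional implementation developed in this report) runs
Huffman's construction on the example frequencies, confirms by brute force
over all full binary code trees that the result is optimum, and checks the
equation behind Lemma 16.3:

```python
import heapq
from itertools import combinations

def huffman(freqs):
    """Huffman's algorithm on a symbol -> frequency dict (nested-pair tree)."""
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # the two lowest-weight trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    return heap[0][2]

def cost(tree, w, d=0):
    """Weighted path length: sum of frequency * depth over the leaves."""
    if isinstance(tree, tuple):
        return cost(tree[0], w, d + 1) + cost(tree[1], w, d + 1)
    return w[tree] * d

def all_trees(syms):
    """All full binary trees (nested pairs) whose leaves are syms."""
    if len(syms) == 1:
        yield syms[0]
        return
    first, rest = syms[0], syms[1:]
    for k in range(len(rest)):            # first symbol always goes left
        for picked in combinations(rest, k):
            right = tuple(s for s in rest if s not in picked)
            for l in all_trees((first,) + picked):
                for r in all_trees(right):
                    yield (l, r)

w = {'d': 3, 'e': 11, 'f': 5, 's': 7, 'z': 2}
best = min(cost(t, w) for t in all_trees(tuple(w)))
print(cost(huffman(w), w), best)   # both 60: Huffman is optimum here

# Lemma 16.3 on this example: merge the two rarest symbols z and d into one.
w2 = {'e': 11, 'f': 5, 's': 7, 'zd': w['z'] + w['d']}
best2 = min(cost(t, w2) for t in all_trees(tuple(w2)))
print(best == best2 + w['z'] + w['d'])  # True
```

This is, of course, only a finite test, not a substitute for the formal
proof that follows.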
Finally, Section~\ref{conclusion} concludes the report.
*}

subsection {* Overview of Isabelle's HOL Logic *}

text {*
This section presents a brief overview of the Isabelle/HOL logic, so that
readers not familiar with the system can at least understand the lemmas and
theorems, if not the proofs. Readers who already know Isabelle are encouraged
to skip this section.

Isabelle is a generic theorem prover whose built-in metalogic is an
intuitionistic fragment of higher-order logic
\cite{gordon-melham-1994,nipkow-et-al-2008}. The metalogical operators are
material implication, written
\vthinspace@{text "[|φ⇣1; …; φ⇣n|] ==> ψ"}\vthinspace{} (``if @{term φ⇣1} and
$\ldots$ and @{term φ⇣n}, then @{term ψ}''), universal quantification,
written \vthinspace@{text "!!x⇣1 … x⇣n. ψ"}\vthinspace{} (``for all
$@{term x⇣1}, \ldots, @{term x⇣n}$ we have @{term ψ}''), and equality,
written \vthinspace@{text "t ≡ u"}.

The incarnation of Isabelle that we use in this development, Isabelle/HOL,
provides a more elaborate version of higher-order logic, complete with the
familiar connectives and quantifiers (@{text "¬"}, @{text "∧"}, @{text "∨"},
@{text "-->"}, @{text "∀"}, and @{text "∃"}) on terms of type @{typ bool}. In
addition, $=$ expresses equivalence. The formulas
\vthinspace@{text "!!x⇣1 … x⇣m. [|φ⇣1; …; φ⇣n|] ==> ψ"}\vthinspace{} and
\vthinspace@{text "∀x⇣1. … ∀x⇣m. φ⇣1 ∧"}$\,\cdots\,$@{text "∧ φ⇣n --> ψ"}%
\vthinspace{} are logically equivalent, but they interact differently with
Isabelle's proof tactics.

The term language consists of simply typed $\lambda$-terms written in an
ML-like syntax \cite{milner-et-al-1997}. Function application expects no
parentheses around the argument list and no commas between the arguments, as
in @{term "f x y"}. Syntactic sugar provides an infix syntax for common
operators, such as $x = y$ and $x + y$. Types are inferred automatically in
most cases, but they can always be supplied using an annotation
@{text "t::τ"}, where @{term t} is a term and @{text τ} is its type. The type
of total functions from @{typ 'a} to @{typ 'b} is written @{typ "'a => 'b"}.
Variables may range over functions.

The type of natural numbers is called @{typ nat}. The type of lists over type
@{text 'a}, written @{typ "'a list"}, features the empty list @{term "[]"},
the infix constructor @{term "x # xs"} (where @{term x} is an element of type
@{text 'a} and @{term xs} is a list over @{text 'a}), and the conversion
function @{term set} from lists to sets. The type of sets over @{text 'a} is
written @{typ "'a set"}. Operations on sets are written using traditional
mathematical notation.
*}

subsection {* Head of the Theory File *}

text {*
The Isabelle theory starts in the standard way.
\myskip
\noindent
\isacommand{theory} @{text "Huffman"} \\
\isacommand{imports} @{text "Main"} \\
\isacommand{begin}
\myskip
\noindent
We attach the @{text "simp"} attribute to some predefined lemmas to add them
to the default set of simplification rules.
*}

declare Int_Un_distrib [simp]
        Int_Un_distrib2 [simp]
        min_max.sup_absorb1 [simp]
        min_max.sup_absorb2 [simp]

section {* Definition of Prefix Code Trees and Forests
           \label{trees-and-forests} *}

subsection {* Tree Datatype *}

text {*
A {\sl prefix code tree\/} is a full binary tree in which leaf nodes are of
the form @{term "Leaf w a"}, where @{term a} is a symbol and @{term w} is the
frequency associated with @{term a}, and inner nodes are of the form
@{term "InnerNode w t⇣1 t⇣2"}, where @{term t⇣1} and @{term t⇣2} are the left
and right subtrees and @{term w} caches the sum of the weights of @{term t⇣1}
and @{term t⇣2}. Prefix code trees are polymorphic on the symbol
datatype~@{typ 'a}.
*}

datatype 'a tree =
  Leaf nat 'a |
  InnerNode nat "('a tree)" "('a tree)"

subsection {* Forest Datatype *}

text {*
The intermediate steps of Huffman's algorithm involve a list of prefix code
trees, or {\sl prefix code forest}.
*}

type_synonym 'a forest = "'a tree list"

subsection {* Alphabet *}

text {*
The {\sl alphabet\/} of a code tree is the set of symbols appearing in the
tree's leaf nodes.
*}

primrec alphabet :: "'a tree => 'a set" where
"alphabet (Leaf w a) = {a}" |
"alphabet (InnerNode w t⇣1 t⇣2) = alphabet t⇣1 ∪ alphabet t⇣2"

text {*
For sets and predicates, Isabelle gives us the choice between inductive
definitions (\isakeyword{inductive\_set} and \isakeyword{inductive}) and
recursive functions (\isakeyword{primrec}, \isakeyword{fun}, and
\isakeyword{function}). In this development, we consistently favor recursion
over induction, for two reasons:

\begin{myitemize}
\item Recursion gives rise to simplification rules that greatly help
automatic proof tactics. In contrast, reasoning about inductively defined
sets and predicates involves introduction and elimination rules, which are
more clumsy than simplification rules.

\item Isabelle's counterexample generator \isakeyword{quickcheck}
\cite{berghofer-nipkow-2004}, which we used extensively during the top-down
development of the proof (together with \isakeyword{sorry}), has better
support for recursive definitions.
\end{myitemize}

The alphabet of a forest is defined as the union of the alphabets of the
trees that compose it.
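To experiment with these definitions outside Isabelle, the datatype and the
@{const alphabet} function can be approximated in Python, with the
constructors encoded as tagged tuples (an informal sketch; the encoding is
ours, not part of the theory):

```python
# Leaf w a          ~ ('leaf', w, a)
# InnerNode w t1 t2 ~ ('node', w, t1, t2)

def Leaf(w, a):
    return ('leaf', w, a)

def InnerNode(w, t1, t2):
    return ('node', w, t1, t2)

def alphabet(t):
    """Mirror of the primrec alphabet: the symbols at the leaves."""
    if t[0] == 'leaf':
        return {t[2]}
    return alphabet(t[2]) | alphabet(t[3])

# The weighted tree for C_1 (frequencies of 'abacabad').
t = InnerNode(8, Leaf(4, 'a'),
              InnerNode(4, Leaf(2, 'b'),
                        InnerNode(2, Leaf(1, 'c'), Leaf(1, 'd'))))
print(sorted(alphabet(t)))  # ['a', 'b', 'c', 'd']
```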
Although Isabelle supports overloading for non-overlapping types, we avoid
many type inference problems by attaching an
`\raise.3ex\hbox{@{text "⇣F"}}' subscript to the forest generalizations of
functions defined on trees.
*}

primrec alphabet⇣F :: "'a forest => 'a set" where
"alphabet⇣F [] = {}" |
"alphabet⇣F (t # ts) = alphabet t ∪ alphabet⇣F ts"

text {*
Alphabets are central to our proofs, and we need the following basic facts
about them.
*}

lemma finite_alphabet [simp]:
"finite (alphabet t)"
by (induct t) auto

lemma exists_in_alphabet:
"∃a. a ∈ alphabet t"
by (induct t) auto

subsection {* Consistency *}

text {*
A tree is {\sl consistent\/} if for each inner node the alphabets of the two
subtrees are disjoint. Intuitively, this means that every symbol in the
alphabet occurs in exactly one leaf node. Consistency is a sufficient
condition for $\delta_a$ (the length of the {\sl unique\/} code word for
$a$) to be defined. Although this well\-formed\-ness property isn't mentioned
in algorithms textbooks \cite{aho-et-al-1983,cormen-et-al-2001,knuth-1997},
it is essential and appears as an assumption in many of our lemmas.
*}

primrec consistent :: "'a tree => bool" where
"consistent (Leaf w a) = True" |
"consistent (InnerNode w t⇣1 t⇣2) =
     (consistent t⇣1 ∧ consistent t⇣2 ∧ alphabet t⇣1 ∩ alphabet t⇣2 = {})"

primrec consistent⇣F :: "'a forest => bool" where
"consistent⇣F [] = True" |
"consistent⇣F (t # ts) =
     (consistent t ∧ consistent⇣F ts ∧ alphabet t ∩ alphabet⇣F ts = {})"

text {*
Several of our proofs are by structural induction on consistent trees $t$
and involve one symbol $a$.
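The consistency predicate is easy to exercise on concrete trees; here is the
same kind of tagged-tuple sketch in Python (illustrative only):

```python
# Trees as ('leaf', w, sym) or ('node', w, t1, t2).
def alphabet(t):
    return {t[2]} if t[0] == 'leaf' else alphabet(t[2]) | alphabet(t[3])

def consistent(t):
    """Mirror of the primrec: subtree alphabets must be disjoint."""
    if t[0] == 'leaf':
        return True
    return (consistent(t[2]) and consistent(t[3])
            and not (alphabet(t[2]) & alphabet(t[3])))

good = ('node', 3, ('leaf', 1, 'a'), ('leaf', 2, 'b'))
bad  = ('node', 2, ('leaf', 1, 'a'), ('leaf', 1, 'a'))  # 'a' occurs twice
print(consistent(good), consistent(bad))  # True False
```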
These proofs typically distinguish the following cases.

\begin{myitemize}
\item[] {\sc Base case:}\enspace $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step:}\enspace $t = @{term "InnerNode w t⇣1 t⇣2"}$.
\item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
        @{term t⇣1} but not to @{term t⇣2}.
\item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
        @{term t⇣2} but not to @{term t⇣1}.
\item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
        neither @{term t⇣1} nor @{term t⇣2}.
\end{myitemize}

\noindent
Thanks to the consistency assumption, we can rule out the subcase where $a$
belongs to both subtrees.

Instead of performing the above case distinction manually, we encode it in a
custom induction rule. This saves us from writing repetitive proof scripts
and helps Isabelle's automatic proof tactics.
*}

lemma tree_induct_consistent [consumes 1, case_names base step⇣1 step⇣2 step⇣3]:
"[|consistent t;
  !!w⇣b b a. P (Leaf w⇣b b) a;
  !!w t⇣1 t⇣2 a.
     [|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      a ∈ alphabet t⇣1; a ∉ alphabet t⇣2; P t⇣1 a; P t⇣2 a|] ==>
     P (InnerNode w t⇣1 t⇣2) a;
  !!w t⇣1 t⇣2 a.
     [|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      a ∉ alphabet t⇣1; a ∈ alphabet t⇣2; P t⇣1 a; P t⇣2 a|] ==>
     P (InnerNode w t⇣1 t⇣2) a;
  !!w t⇣1 t⇣2 a.
     [|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      a ∉ alphabet t⇣1; a ∉ alphabet t⇣2; P t⇣1 a; P t⇣2 a|] ==>
     P (InnerNode w t⇣1 t⇣2) a|] ==> P t a"

txt {*
The proof relies on the \textit{induction\_schema} and
\textit{lexicographic\_order} tactics, which automate the most tedious
aspects of deriving induction rules. The alternative would have been to
perform a standard structural induction on @{term t} and proceed by cases,
which is straightforward but long-winded.
*}

apply rotate_tac
apply induction_schema
       apply atomize_elim
       apply (case_tac t)
        apply fastforce
       apply fastforce
by lexicographic_order

text {*
The \textit{induction\_schema} tactic reduces the putative induction rule to
simpler proof obligations. Internally, it reuses the machinery that
constructs the default induction rules. The resulting proof obligations
concern (a)~case completeness, (b)~invariant preservation (in our case, tree
consistency), and (c)~wellfoundedness. For
@{thm [source] tree_induct_consistent}, the obligations (a)~and (b) can be
discharged using Isabelle's simplifier and classical reasoner, whereas (c)
requires a single invocation of \textit{lexicographic\_order}, a tactic that
was originally designed to prove termination of recursive functions
\cite{bulwahn-et-al-2007,krauss-2007,krauss-2009}.
*}

subsection {* Symbol Depths *}

text {*
The {\sl depth\/} of a symbol (which we denoted by $\delta_a$ in
Section~\ref{binary-codes}) is the length of the path from the root to the
leaf node labeled with that symbol, or equivalently the length of the code
word for the symbol. Symbols that don't occur in the tree or that occur at
the root of a one-node tree have depth 0. If a symbol occurs in several leaf
nodes (which may happen with inconsistent trees), the depth is arbitrarily
defined in terms of the leftmost node labeled with that symbol.
*}

primrec depth :: "'a tree => 'a => nat" where
"depth (Leaf w b) a = 0" |
"depth (InnerNode w t⇣1 t⇣2) a =
     (if a ∈ alphabet t⇣1 then depth t⇣1 a + 1
      else if a ∈ alphabet t⇣2 then depth t⇣2 a + 1
      else 0)"

text {*
The definition may seem very inefficient from a functional programming
point of view, but it does not matter, because unlike Huffman's algorithm,
the @{const depth} function is merely a reasoning tool and is never actually
executed.
*}

subsection {* Height *}

text {*
The {\sl height\/} of a tree is the length of the longest path from the root
to a leaf node, or equivalently the length of the longest code word. This is
readily generalized to forests by taking the maximum of the trees' heights.
Note that a tree has height 0 if and only if it is a leaf node, and that a
forest has height 0 if and only if all its trees are leaf nodes.
*}

primrec height :: "'a tree => nat" where
"height (Leaf w a) = 0" |
"height (InnerNode w t⇣1 t⇣2) = max (height t⇣1) (height t⇣2) + 1"

primrec height⇣F :: "'a forest => nat" where
"height⇣F [] = 0" |
"height⇣F (t # ts) = max (height t) (height⇣F ts)"

text {*
The depth of any symbol in the tree is bounded by the tree's height, and
there exists a symbol with a depth equal to the height.
*}

lemma depth_le_height:
"depth t a ≤ height t"
by (induct t) auto

lemma exists_at_height:
"consistent t ==> ∃a ∈ alphabet t. depth t a = height t"
proof (induct t)
  case Leaf thus ?case by simp
next
  case (InnerNode w t⇣1 t⇣2)
  note hyps = InnerNode
  let ?t = "InnerNode w t⇣1 t⇣2"
  from hyps obtain b where b: "b ∈ alphabet t⇣1" "depth t⇣1 b = height t⇣1"
    by auto
  from hyps obtain c where c: "c ∈ alphabet t⇣2" "depth t⇣2 c = height t⇣2"
    by auto
  let ?a = "if height t⇣1 ≥ height t⇣2 then b else c"
  from b c have "?a ∈ alphabet ?t" "depth ?t ?a = height ?t"
    using `consistent ?t` by auto
  thus "∃a ∈ alphabet ?t. depth ?t a = height ?t" ..
qed

text {*
The following elimination rules help Isabelle's classical prover, notably
the \textit{auto} tactic. They are easy consequences of the inequation
@{thm depth_le_height [no_vars]}.
*}

lemma depth_max_heightE_left [elim!]:
"[|depth t⇣1 a = max (height t⇣1) (height t⇣2);
  [|depth t⇣1 a = height t⇣1; height t⇣1 ≥ height t⇣2|] ==> P|] ==> P"
by (cut_tac t = t⇣1 and a = a in depth_le_height) simp

lemma depth_max_heightE_right [elim!]:
"[|depth t⇣2 a = max (height t⇣1) (height t⇣2);
  [|depth t⇣2 a = height t⇣2; height t⇣2 ≥ height t⇣1|] ==> P|] ==> P"
by (cut_tac t = t⇣2 and a = a in depth_le_height) simp

text {*
We also need the following lemma.
*}

lemma height_gt_0_alphabet_eq_imp_height_gt_0:
assumes "height t > 0" "consistent t" "alphabet t = alphabet u"
shows "height u > 0"
proof (cases t)
  case Leaf thus ?thesis using assms by simp
next
  case (InnerNode w t⇣1 t⇣2)
  note t = InnerNode
  from exists_in_alphabet obtain b where b: "b ∈ alphabet t⇣1" ..
  from exists_in_alphabet obtain c where c: "c ∈ alphabet t⇣2" ..
  from b c have bc: "b ≠ c" using t `consistent t` by fastforce
  show ?thesis
  proof (cases u)
    case Leaf thus ?thesis using b c bc t assms by auto
  next
    case InnerNode thus ?thesis by simp
  qed
qed

subsection {* Symbol Frequencies *}

text {*
The {\sl frequency\/} of a symbol (which we denoted by $w_a$ in
Section~\ref{binary-codes}) is the sum of the weights attached to the
leaf nodes labeled with that symbol. If the tree is consistent, the sum
comprises at most one nonzero term. The frequency is then the weight of the
leaf node labeled with the symbol, or 0 if there is no such node. The
generalization to forests is straightforward.
*}

primrec freq :: "'a tree => 'a => nat" where
"freq (Leaf w a) = (λb. if b = a then w else 0)" |
"freq (InnerNode w t⇣1 t⇣2) = (λb. freq t⇣1 b + freq t⇣2 b)"

primrec freq⇣F :: "'a forest => 'a => nat" where
"freq⇣F [] = (λb. 0)" |
"freq⇣F (t # ts) = (λb. freq t b + freq⇣F ts b)"

text {*
Alphabet and symbol frequencies are intimately related. Simplification rules
ensure that sums of the form @{term "freq t⇣1 a + freq t⇣2 a"} collapse to a
single term when we know which tree @{term a} belongs to.
*}

lemma notin_alphabet_imp_freq_0 [simp]:
"a ∉ alphabet t ==> freq t a = 0"
by (induct t) simp+

lemma notin_alphabet⇣F_imp_freq⇣F_0 [simp]:
"a ∉ alphabet⇣F ts ==> freq⇣F ts a = 0"
by (induct ts) simp+

lemma freq_0_right [simp]:
"[|alphabet t⇣1 ∩ alphabet t⇣2 = {}; a ∈ alphabet t⇣1|] ==> freq t⇣2 a = 0"
by (auto intro: notin_alphabet_imp_freq_0 simp: disjoint_iff_not_equal)

lemma freq_0_left [simp]:
"[|alphabet t⇣1 ∩ alphabet t⇣2 = {}; a ∈ alphabet t⇣2|] ==> freq t⇣1 a = 0"
by (auto simp: disjoint_iff_not_equal)

text {*
Two trees are {\em comparable} if they have the same alphabet and symbol
frequencies. This is an important concept, because it allows us to state not
only that the tree constructed by Huffman's algorithm is optimal but also
that it has the expected alphabet and frequencies.

We close this section with a more technical lemma.
*}

lemma height⇣F_0_imp_Leaf_freq⇣F_in_set:
"[|consistent⇣F ts; height⇣F ts = 0; a ∈ alphabet⇣F ts|] ==>
 Leaf (freq⇣F ts a) a ∈ set ts"
proof (induct ts)
  case Nil thus ?case by simp
next
  case (Cons t ts) show ?case using Cons
  proof (cases t)
    case Leaf thus ?thesis using Cons by clarsimp
  next
    case InnerNode thus ?thesis using Cons by clarsimp
  qed
qed

subsection {* Weight *}

text {*
The @{term weight} function returns the weight of a tree. In the
@{const InnerNode} case, we ignore the weight cached in the node and instead
compute the tree's weight recursively. This makes reasoning simpler because
we can then avoid specifying cache correctness as an assumption in our
lemmas.
*}

primrec weight :: "'a tree => nat" where
"weight (Leaf w a) = w" |
"weight (InnerNode w t⇣1 t⇣2) = weight t⇣1 + weight t⇣2"

text {*
The weight of a tree is the sum of the frequencies of its symbols.

\myskip
\noindent
\isacommand{lemma} @{text "weight_eq_Sum_freq"}: \\
{\isachardoublequoteopen}$\displaystyle @{text "consistent t ==> weight t"} =
\!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\! @{term "freq t a"}$%
{\isachardoublequoteclose}
\vskip-\myskipamount
*}

(*<*)
lemma weight_eq_Sum_freq:
"consistent t ==> weight t = (∑a ∈ alphabet t. freq t a)"
(*>*)
by (induct t) (auto simp: setsum_Un_disjoint)

text {*
The assumption @{term "consistent t"} is not necessary, but it simplifies
the proof by letting us invoke the lemma @{thm [source] setsum_Un_disjoint}:
$$@{text "[|finite A; finite B; A ∩ B = {}|] ==>"}~\!\sum_{x\in A}
@{term "g x"} \vthinspace \mathrel{+} \sum_{x\in B} @{term "g x"}
\vthinspace = \!\!\sum_{x\in A \cup B}\! @{term "g x"}.$$
*}

subsection {* Cost *}

text {*
The {\sl cost\/} of a consistent tree, sometimes called the {\sl weighted
path length}, is given by the sum $\sum_{a \in @{term "alphabet t"}\,}
@{term "freq t a"} \mathbin{@{text "*"}} @{term "depth t a"}$
(which we denoted by $\sum_a w_a \vthinspace\delta_a$ in
Section~\ref{binary-codes}). It obeys a simple recursive law.
*}

primrec cost :: "'a tree => nat" where
"cost (Leaf w a) = 0" |
"cost (InnerNode w t⇣1 t⇣2) = weight t⇣1 + cost t⇣1 + weight t⇣2 + cost t⇣2"

text {*
One interpretation of this recursive law is that the cost of a tree is the
sum of the weights of its inner nodes \cite[p.~405]{knuth-1997}. (Recall that
$@{term "weight (InnerNode w t⇣1 t⇣2)"} = @{term "weight t⇣1 + weight t⇣2"}$.)
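The lemma @{text weight_eq_Sum_freq} and the cost law can both be
sanity-checked numerically; here is a small Python sketch over tagged-tuple
trees (a test on one example tree, of course, not a proof):

```python
# Trees as ('leaf', w, sym) or ('node', w, t1, t2); w is the cached weight.
def alphabet(t):
    return {t[2]} if t[0] == 'leaf' else alphabet(t[2]) | alphabet(t[3])

def freq(t, b):
    if t[0] == 'leaf':
        return t[1] if t[2] == b else 0
    return freq(t[2], b) + freq(t[3], b)

def depth(t, a):
    if t[0] == 'leaf':
        return 0
    if a in alphabet(t[2]):
        return depth(t[2], a) + 1
    if a in alphabet(t[3]):
        return depth(t[3], a) + 1
    return 0

def weight(t):
    return t[1] if t[0] == 'leaf' else weight(t[2]) + weight(t[3])

def cost(t):
    """The recursive law: sum the computed weights of the inner nodes."""
    if t[0] == 'leaf':
        return 0
    return weight(t[2]) + cost(t[2]) + weight(t[3]) + cost(t[3])

# The weighted code tree for C_1 (frequencies of 'abacabad').
t = ('node', 8, ('leaf', 4, 'a'),
     ('node', 4, ('leaf', 2, 'b'),
      ('node', 2, ('leaf', 1, 'c'), ('leaf', 1, 'd'))))

print(weight(t) == sum(freq(t, a) for a in alphabet(t)))              # True
print(cost(t) == sum(freq(t, a) * depth(t, a) for a in alphabet(t)))  # True
```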
Sincethe cost of a tree is such a fundamental concept, it seems necessary to provethat the above function definition is correct.\myskip\noindent\isacommand{theorem} @{text "cost_eq_Sum_freq_mult_depth"}: \\{\isachardoublequoteopen}$\displaystyle @{text "consistent t ==> cost t"} =\!\!\sum_{a\in @{term "alphabet t"}}^{\phantom{.}}\!\!@{term "freq t a * depth t a"}$%{\isachardoublequoteclose}\myskip\noindentThe proof is by structural induction on $t$. If $t = @{term "Leaf w b"}$, bothsides of the equation simplify to 0. This leaves the case $@{term t} =@{term "InnerNode w t⇣1 t⇣2"}$. Let $A$, $A_1$, and $A_2$ stand for@{term "alphabet t"}, @{term "alphabet t⇣1"}, and @{term "alphabet t⇣2"},respectively. We have%$$\begin{tabularx}{\textwidth}{@%{\hskip\leftmargin}cX@%{}} & @{term "cost t"} \\\eq & \justif{definition of @{const cost}} \\ & @{term "weight t⇣1 + cost t⇣1 + weight t⇣2 + cost t⇣2"} \\\eq & \justif{induction hypothesis} \\ & @{term "weight t⇣1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t⇣1 a * depth t⇣1 a"} \mathrel{+} {} \\ & @{term "weight t⇣2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t⇣2 a * depth t⇣2 a"} \\\eq & \justif{definition of @{const depth}, consistency} \\[\extrah] & @{term "weight t⇣1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t⇣1 a * (depth t a - 1)"} \mathrel{+} {} \\ & @{term "weight t⇣2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t⇣2 a * (depth t a - 1)"} \\[\extrah]\eq & \justif{distributivity of @{text "*"} and \sum over -} \\[\extrah] & @{term "weight t⇣1"} \mathrel{+} \sum_{a\in A_1\,} @{term "freq t⇣1 a * depth t a"} \mathrel{-} \sum_{a\in A_1\,} @{term "freq t⇣1 a"} \mathrel{+} {} \\ & @{term "weight t⇣2"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t⇣2 a * depth t a"} \mathrel{-} \sum_{a\in A_2\,} @{term "freq t⇣2 a"} \\[\extrah]\eq & \justif{@{thm [source] weight_eq_Sum_freq}} \\[\extrah] & \sum_{a\in A_1\,} @{term "freq t⇣1 a * depth t a"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t⇣2 a * depth t a"} \\[\extrah]\eq & 
\justif{definition of @{const freq}, consistency} \\[\extrah] & \sum_{a\in A_1\,} @{term "freq t a * depth t a"} \mathrel{+} \sum_{a\in A_2\,} @{term "freq t a * depth t a"} \\[\extrah]\eq & \justif{@{thm [source] setsum_Un_disjoint}, consistency} \\ & \sum_{a\in A_1\cup A_2\,} @{term "freq t a * depth t a"} \\\eq & \justif{definition of @{const alphabet}} \\ & \sum_{a\in A\,} @{term "freq t a * depth t a"}.\end{tabularx}$$\noindentThe structured proof closely follows this argument.*}(*<*)theorem cost_eq_Sum_freq_mult_depth:"consistent t ==> cost t = (∑a ∈ alphabet t. freq t a * depth t a)"(*>*)proof (induct t)  case Leaf thus ?case by simpnext  case (InnerNode w t⇣1 t⇣2)  let ?t = "InnerNode w t⇣1 t⇣2"  let ?A = "alphabet ?t" and ?A⇣1 = "alphabet t⇣1" and ?A⇣2 = "alphabet t⇣2"  note c = consistent ?t  note hyps = InnerNode  have d⇣2: "!!a. [|?A⇣1 ∩ ?A⇣2 = {}; a ∈ ?A⇣2|] ==> depth ?t a = depth t⇣2 a + 1"    by auto  have "cost ?t = weight t⇣1 + cost t⇣1 + weight t⇣2 + cost t⇣2" by simp  also have "… = weight t⇣1 + (∑a ∈ ?A⇣1. freq t⇣1 a * depth t⇣1 a) +                  weight t⇣2 + (∑a ∈ ?A⇣2. freq t⇣2 a * depth t⇣2 a)"    using hyps by simp  also have "… = weight t⇣1 + (∑a ∈ ?A⇣1. freq t⇣1 a * (depth ?t a - 1)) +                  weight t⇣2 + (∑a ∈ ?A⇣2. freq t⇣2 a * (depth ?t a - 1))"    using c d⇣2 by simp  also have "… = weight t⇣1 + (∑a ∈ ?A⇣1. freq t⇣1 a * depth ?t a)                            - (∑a ∈ ?A⇣1. freq t⇣1 a) +                  weight t⇣2 + (∑a ∈ ?A⇣2. freq t⇣2 a * depth ?t a)                            - (∑a ∈ ?A⇣2. freq t⇣2 a)"    using c d⇣2 by (simp add: setsum_addf)  also have "… = (∑a ∈ ?A⇣1. freq t⇣1 a * depth ?t a) +                  (∑a ∈ ?A⇣2. freq t⇣2 a * depth ?t a)"    using c by (simp add: weight_eq_Sum_freq)  also have "… = (∑a ∈ ?A⇣1. freq ?t a * depth ?t a) +                  (∑a ∈ ?A⇣2. freq ?t a * depth ?t a)"    using c by auto  also have "… = (∑a ∈ ?A⇣1 ∪ ?A⇣2. 
freq ?t a * depth ?t a)"
    using c by (simp add: setsum_Un_disjoint)
  also have "… = (∑a ∈ ?A. freq ?t a * depth ?t a)" by simp
  finally show ?case .
qed

text {*
Finally, it should come as no surprise that trees with height 0 have cost 0.
*}

lemma height_0_imp_cost_0 [simp]:
"height t = 0 ==> cost t = 0"
by (case_tac t) simp+

subsection {* Optimality *}

text {*
A tree is optimum if and only if its cost is not greater than that of any
comparable tree. We can ignore inconsistent trees without loss of generality.
*}

definition optimum :: "'a tree => bool" where
"optimum t ≡
     ∀u. consistent u --> alphabet t = alphabet u --> freq t = freq u -->
         cost t ≤ cost u"

section {* Functional Implementation of Huffman's Algorithm
           \label{implementation} *}

subsection {* Cached Weight *}

text {*
The {\sl cached weight\/} of a node is the weight stored directly in the node.
Our arguments rely on the computed weight (embodied by the @{const weight}
function) rather than the cached weight, but the implementation of Huffman's
algorithm uses the cached weight for performance reasons.
*}

primrec cachedWeight :: "'a tree => nat" where
"cachedWeight (Leaf w a) = w" |
"cachedWeight (InnerNode w t⇣1 t⇣2) = w"

text {*
The cached weight of a leaf node is identical to its computed weight.
*}

lemma height_0_imp_cachedWeight_eq_weight [simp]:
"height t = 0 ==> cachedWeight t = weight t"
by (case_tac t) simp+

subsection {* Tree Union *}

text {*
The implementation of Huffman's algorithm builds on two additional auxiliary
functions.
The first one, @{text uniteTrees}, takes two trees
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1.pdf}}}
  \qquad \hbox{and} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w2.pdf}}}$$

\noindent
and returns the tree\strut
$$\includegraphics[scale=1.25]{tree-w1-w2.pdf}$$
\vskip-.5\myskipamount
*}

definition uniteTrees :: "'a tree => 'a tree => 'a tree" where
"uniteTrees t⇣1 t⇣2 ≡ InnerNode (cachedWeight t⇣1 + cachedWeight t⇣2) t⇣1 t⇣2"

text {*
The alphabet, consistency, and symbol frequencies of a united tree are easy to
connect to the homologous properties of the subtrees.
*}

lemma alphabet_uniteTrees [simp]:
"alphabet (uniteTrees t⇣1 t⇣2) = alphabet t⇣1 ∪ alphabet t⇣2"
by (simp add: uniteTrees_def)

lemma consistent_uniteTrees [simp]:
"[|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {}|] ==>
 consistent (uniteTrees t⇣1 t⇣2)"
by (simp add: uniteTrees_def)

lemma freq_uniteTrees [simp]:
"freq (uniteTrees t⇣1 t⇣2) = (λa. freq t⇣1 a + freq t⇣2 a)"
by (simp add: uniteTrees_def)

subsection {* Ordered Tree Insertion *}

text {*
The auxiliary function @{text insortTree} inserts a tree into a forest sorted
by cached weight, preserving the sort order.
*}

primrec insortTree :: "'a tree => 'a forest => 'a forest" where
"insortTree u [] = [u]" |
"insortTree u (t # ts) =
     (if cachedWeight u ≤ cachedWeight t then u # t # ts
                                         else t # insortTree u ts)"

text {*
The resulting forest contains one more tree than the original forest.
Clearly, it cannot be empty.
*}

lemma length_insortTree [simp]:
"length (insortTree t ts) = length ts + 1"
by (induct ts) simp+

lemma insortTree_ne_Nil [simp]:
"insortTree t ts ≠ []"
by (case_tac ts) simp+

text {*
The alphabet, consistency, symbol frequencies, and height of a forest after
insertion are easy to relate to the homologous properties of the original
forest and the inserted tree.
*}

lemma alphabet⇣F_insortTree [simp]:
"alphabet⇣F (insortTree t ts) = alphabet t ∪ alphabet⇣F ts"
by (induct ts) auto

lemma consistent⇣F_insortTree [simp]:
"consistent⇣F (insortTree t ts) = consistent⇣F (t # ts)"
by (induct ts) auto

lemma freq⇣F_insortTree [simp]:
"freq⇣F (insortTree t ts) = (λa. freq t a + freq⇣F ts a)"
by (induct ts) (simp add: ext)+

lemma height⇣F_insortTree [simp]:
"height⇣F (insortTree t ts) = max (height t) (height⇣F ts)"
by (induct ts) auto

subsection {* The Main Algorithm *}

text {*
Huffman's algorithm repeatedly unites the first two trees of the forest it
receives as argument until a single tree is left. It should initially be
invoked with a list of leaf nodes sorted by weight. Note that it is not defined
for the empty list.
*}

fun huffman :: "'a forest => 'a tree" where
"huffman [t] = t" |
"huffman (t⇣1 # t⇣2 # ts) = huffman (insortTree (uniteTrees t⇣1 t⇣2) ts)"

text {*
The time complexity of the algorithm is quadratic in the size of the forest.
If we eliminated the inner node's cached weight component, and instead
recomputed the weight each time it is needed, the complexity would remain
quadratic, but with a larger constant. Using a binary search in
@{const insortTree}, the corresponding imperative algorithm is $O(n \log n)$
if we keep the weight cache and $O(n^2)$ if we drop it.
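Outside the formal text, the functional algorithm is easy to prototype. The following Python sketch (not part of the formalization; the tuple encoding of trees and the function names are our own, merely mirroring @{const uniteTrees}, @{const insortTree}, @{const huffman}, and @{const cost}) is an executable approximation:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right); w is the cached weight.

def cached_weight(t):
    return t[1]

def unite_trees(t1, t2):
    # uniteTrees: the new inner node caches the sum of the two cached weights.
    return ("node", cached_weight(t1) + cached_weight(t2), t1, t2)

def insort_tree(u, ts):
    # insortTree: insert u into the forest ts, which is sorted by cached weight.
    if not ts or cached_weight(u) <= cached_weight(ts[0]):
        return [u] + ts
    return [ts[0]] + insort_tree(u, ts[1:])

def huffman(ts):
    # huffman: repeatedly unite the two lightest trees; undefined for [].
    while len(ts) > 1:
        ts = insort_tree(unite_trees(ts[0], ts[1]), ts[2:])
    return ts[0]

def cost(t):
    # cost, computed from cached weights (accurate for trees built by huffman).
    if t[0] == "leaf":
        return 0
    return cached_weight(t[2]) + cost(t[2]) + cached_weight(t[3]) + cost(t[3])
```

On the frequencies of the introduction's example (4, 2, 1, 1 for the four symbols), the resulting tree has cached weight 8 and cost 14, matching the 14-bit encoding computed there.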
An $O(n)$ imperative implementation is possible by maintaining two queues, one
containing the unprocessed leaf nodes and the other containing the combined
trees \cite[p.~404]{knuth-1997}.

The tree returned by the algorithm preserves the alphabet, consistency, and
symbol frequencies of the original forest.
*}

theorem alphabet_huffman [simp]:
"ts ≠ [] ==> alphabet (huffman ts) = alphabet⇣F ts"
by (induct ts rule: huffman.induct) auto

theorem consistent_huffman [simp]:
"[|consistent⇣F ts; ts ≠ []|] ==> consistent (huffman ts)"
by (induct ts rule: huffman.induct) simp+

theorem freq_huffman [simp]:
"ts ≠ [] ==> freq (huffman ts) = freq⇣F ts"
by (induct ts rule: huffman.induct) (auto simp: ext)

section {* Definition of Auxiliary Functions Used in the Proof
           \label{auxiliary} *}

subsection {* Sibling of a Symbol *}

text {*
The {\sl sibling\/} of a symbol $a$ in a tree $t$ is the label of the node that
is the (left or right) sibling of the node labeled with $a$ in $t$. If the
symbol $a$ is not in $t$'s alphabet or it occurs in a node with no sibling
leaf, we define the sibling as being $a$ itself; this gives us the nice
property that if $t$ is consistent, then $@{term "sibling t a"} \not= a$ if
and only if $a$ has a sibling.
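The intended behavior can be approximated executably. The Python sketch below is our own tuple-based rendering of the @{const sibling} function defined in this subsection, including its fallback to $a$ itself; it is an illustration, not the formal definition:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right).

def alphabet(t):
    return {t[2]} if t[0] == "leaf" else alphabet(t[2]) | alphabet(t[3])

def sibling(t, a):
    # A sibling other than a itself exists only when a sits in a node
    # whose two children are both leaves, mirroring the HOL definition.
    if t[0] == "leaf":
        return a
    _, _, t1, t2 = t
    if t1[0] == "leaf" and t2[0] == "leaf":
        b, c = t1[2], t2[2]
        return c if a == b else (b if a == c else a)
    if a in alphabet(t1):
        return sibling(t1, a)
    if a in alphabet(t2):
        return sibling(t2, a)
    return a
```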
As an illustration, we have
$@{term "sibling t a"} = b$,\vthinspace{} $@{term "sibling t b"} = a$,
and $@{term "sibling t c"} = c$ for the tree\strut
$$t \,= \vcenter{\hbox{\includegraphics[scale=1.25]{tree-sibling.pdf}}}$$
*}

fun sibling :: "'a tree => 'a => 'a" where
"sibling (Leaf w⇣b b) a = a" |
"sibling (InnerNode w (Leaf w⇣b b) (Leaf w⇣c c)) a =
     (if a = b then c else if a = c then b else a)" |
"sibling (InnerNode w t⇣1 t⇣2) a =
     (if a ∈ alphabet t⇣1 then sibling t⇣1 a
      else if a ∈ alphabet t⇣2 then sibling t⇣2 a
      else a)"

text {*
Because @{const sibling} is defined using sequential pattern matching
\cite{krauss-2007,krauss-2009}, reasoning about it can become tedious.
Simplification rules therefore play an important role.
*}

lemma notin_alphabet_imp_sibling_id [simp]:
"a ∉ alphabet t ==> sibling t a = a"
by (cases rule: sibling.cases [where x = "(t, a)"]) simp+

lemma height_0_imp_sibling_id [simp]:
"height t = 0 ==> sibling t a = a"
by (case_tac t) simp+

lemma height_gt_0_in_alphabet_imp_sibling_left [simp]:
"[|height t⇣1 > 0; a ∈ alphabet t⇣1|] ==>
 sibling (InnerNode w t⇣1 t⇣2) a = sibling t⇣1 a"
by (case_tac t⇣1) simp+

lemma height_gt_0_in_alphabet_imp_sibling_right [simp]:
"[|height t⇣2 > 0; a ∈ alphabet t⇣1|] ==>
 sibling (InnerNode w t⇣1 t⇣2) a = sibling t⇣1 a"
by (case_tac t⇣2) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_left [simp]:
"[|height t⇣1 > 0; a ∉ alphabet t⇣1|] ==>
 sibling (InnerNode w t⇣1 t⇣2) a = sibling t⇣2 a"
by (case_tac t⇣1) simp+

lemma height_gt_0_notin_alphabet_imp_sibling_right [simp]:
"[|height t⇣2 > 0; a ∉ alphabet t⇣1|] ==>
 sibling (InnerNode w t⇣1 t⇣2) a = sibling t⇣2 a"
by (case_tac t⇣2) simp+

lemma either_height_gt_0_imp_sibling [simp]:
"height t⇣1 > 0 ∨ height t⇣2 > 0 ==>
 sibling (InnerNode w t⇣1 t⇣2) a =
     (if a ∈ alphabet t⇣1 then sibling t⇣1 a else sibling t⇣2 a)"
by auto

text {*
The following rules are also useful for reasoning about siblings and alphabets.
*}

lemma in_alphabet_imp_sibling_in_alphabet:
"a ∈ alphabet t ==> sibling t a ∈
alphabet t"
by (induct t a rule: sibling.induct) auto

lemma sibling_ne_imp_sibling_in_alphabet:
"sibling t a ≠ a ==> sibling t a ∈ alphabet t"
by (metis notin_alphabet_imp_sibling_id in_alphabet_imp_sibling_in_alphabet)

text {*
The default induction rule for @{const sibling} distinguishes four cases.

\begin{myitemize}
\item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step 1:}\enskip
        $t = @{term "InnerNode w (Leaf w⇣b b) (Leaf w⇣c c)"}$.
\item[] {\sc Induction step 2:}\enskip
        $t = @{term "InnerNode w (InnerNode w⇣1 t⇣1⇣1 t⇣1⇣2) t⇣2"}$.
\item[] {\sc Induction step 3:}\enskip
        $t = @{term "InnerNode w t⇣1 (InnerNode w⇣2 t⇣2⇣1 t⇣2⇣2)"}$.
\end{myitemize}

\noindent
This rule leaves much to be desired. First, the last two cases overlap and
can normally be handled the same way, so they should be combined. Second, the
nested @{text InnerNode} constructors in the last two cases reduce readability.
Third, under the assumption that $t$ is consistent, we would like to perform
the same case distinction on $a$ as we did for
@{thm [source] tree_induct_consistent}, with the same benefits for automation.
These observations lead us to develop a custom induction rule that
distinguishes the following cases.

\begin{myitemize}
\item[] {\sc Base case:}\enskip $t = @{term "Leaf w b"}$.
\item[] {\sc Induction step 1:}\enskip
        $t = @{term "InnerNode w (Leaf w⇣b b) (Leaf w⇣c c)"}$ with
        @{prop "b ≠ c"}.
\item[] \begin{flushleft}
        {\sc Induction step 2:}\enskip
        $t = @{term "InnerNode w t⇣1 t⇣2"}$ and either @{term t⇣1} or @{term t⇣2}
        has nonzero height.
\end{flushleft}
\item[] \noindent\kern\leftmargin {\sc Subcase 1:}\enspace $a$ belongs to
        @{term t⇣1} but not to @{term t⇣2}.
\item[] \noindent\kern\leftmargin {\sc Subcase 2:}\enspace $a$ belongs to
        @{term t⇣2} but not to @{term t⇣1}.
\item[] \noindent\kern\leftmargin {\sc Subcase 3:}\enspace $a$ belongs to
        neither @{term t⇣1} nor @{term t⇣2}.
\end{myitemize}

The statement of the rule and its proof are similar to what we did for
consistent trees, the main difference being that we now have two induction
steps instead of one.
*}

lemma sibling_induct_consistent
          [consumes 1, case_names base step⇣1 step⇣2⇣1 step⇣2⇣2 step⇣2⇣3]:
"[|consistent t;
  !!w b a. P (Leaf w b) a;
  !!w w⇣b b w⇣c c a. b ≠ c ==> P (InnerNode w (Leaf w⇣b b) (Leaf w⇣c c)) a;
  !!w t⇣1 t⇣2 a.
     [|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      height t⇣1 > 0 ∨ height t⇣2 > 0; a ∈ alphabet t⇣1;
      sibling t⇣1 a ∈ alphabet t⇣1; a ∉ alphabet t⇣2;
      sibling t⇣1 a ∉ alphabet t⇣2; P t⇣1 a|] ==>
     P (InnerNode w t⇣1 t⇣2) a;
  !!w t⇣1 t⇣2 a.
     [|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      height t⇣1 > 0 ∨ height t⇣2 > 0; a ∉ alphabet t⇣1;
      sibling t⇣2 a ∉ alphabet t⇣1; a ∈ alphabet t⇣2;
      sibling t⇣2 a ∈ alphabet t⇣2; P t⇣2 a|] ==>
     P (InnerNode w t⇣1 t⇣2) a;
  !!w t⇣1 t⇣2 a.
[|consistent t⇣1; consistent t⇣2; alphabet t⇣1 ∩ alphabet t⇣2 = {};
      height t⇣1 > 0 ∨ height t⇣2 > 0; a ∉ alphabet t⇣1; a ∉ alphabet t⇣2|] ==>
     P (InnerNode w t⇣1 t⇣2) a|] ==>
 P t a"
apply rotate_tac
apply induction_schema
   apply atomize_elim
   apply (case_tac t, simp)
   apply clarsimp
   apply (rename_tac a t⇣1 t⇣2)
   apply (case_tac "height t⇣1 = 0 ∧ height t⇣2 = 0")
    apply simp
    apply (case_tac t⇣1)
     apply (case_tac t⇣2)
      apply fastforce
     apply simp+
   apply (auto intro: in_alphabet_imp_sibling_in_alphabet)[1]
by lexicographic_order

text {*
The custom induction rule allows us to prove new properties of @{const sibling}
with little effort.
*}

lemma sibling_sibling_id [simp]:
"consistent t ==> sibling t (sibling t a) = a"
by (induct t a rule: sibling_induct_consistent) simp+

lemma sibling_reciprocal:
"[|consistent t; sibling t a = b|] ==> sibling t b = a"
by auto

lemma depth_height_imp_sibling_ne:
"[|consistent t; depth t a = height t; height t > 0; a ∈ alphabet t|] ==>
 sibling t a ≠ a"
by (induct t a rule: sibling_induct_consistent) auto

lemma depth_sibling [simp]:
"consistent t ==> depth t (sibling t a) = depth t a"
by (induct t a rule: sibling_induct_consistent) simp+

subsection {* Leaf Interchange *}

text {*
The @{text swapLeaves} function takes a tree $t$ together with two symbols
$a$, $b$ and their frequencies $@{term w⇣a}$, $@{term w⇣b}$, and returns the
tree $t$ in which the leaf nodes labeled with $a$ and $b$ are exchanged. When
invoking @{text swapLeaves}, we normally pass @{term "freq t a"} and
@{term "freq t b"} for @{term w⇣a} and @{term w⇣b}.

Note that we do not bother updating the cached weight of the ancestor nodes
when performing the interchange.
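A Python sketch (again with our own tuple encoding, not the formal text) makes this point visible: the inner nodes' cached weights are copied unchanged, so they may go stale after a swap:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right).

def swap_leaves(t, wa, a, wb, b):
    # Exchange the leaves labeled a and b, installing the given frequencies;
    # cached weights of inner nodes are deliberately left untouched.
    if t[0] == "leaf":
        c = t[2]
        if c == a:
            return ("leaf", wb, b)
        if c == b:
            return ("leaf", wa, a)
        return t
    _, w, t1, t2 = t
    return ("node", w, swap_leaves(t1, wa, a, wb, b),
                       swap_leaves(t2, wa, a, wb, b))
```

Swapping a symbol with itself using its own frequency leaves the tree unchanged, matching the @{thm [source] swapLeaves_id} lemma below.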
The cached weight is used only in the
implementation of Huffman's algorithm, which doesn't invoke @{text swapLeaves}.
*}

primrec swapLeaves :: "'a tree => nat => 'a => nat => 'a => 'a tree" where
"swapLeaves (Leaf w⇣c c) w⇣a a w⇣b b =
     (if c = a then Leaf w⇣b b else if c = b then Leaf w⇣a a else Leaf w⇣c c)" |
"swapLeaves (InnerNode w t⇣1 t⇣2) w⇣a a w⇣b b =
     InnerNode w (swapLeaves t⇣1 w⇣a a w⇣b b) (swapLeaves t⇣2 w⇣a a w⇣b b)"

text {*
Swapping a symbol~$a$ with itself leaves the tree $t$ unchanged if $a$ does not
belong to it or if the specified frequencies @{term w⇣a} and @{term w⇣b} equal
@{term "freq t a"}.
*}

lemma swapLeaves_id_when_notin_alphabet [simp]:
"a ∉ alphabet t ==> swapLeaves t w a w' a = t"
by (induct t) simp+

lemma swapLeaves_id [simp]:
"consistent t ==> swapLeaves t (freq t a) a (freq t a) a = t"
by (induct t a rule: tree_induct_consistent) simp+

text {*
The alphabet, consistency, symbol depths, height, and symbol frequencies of the
tree @{term "swapLeaves t w⇣a a w⇣b b"} can be related to the homologous
properties of $t$.
*}

lemma alphabet_swapLeaves:
"alphabet (swapLeaves t w⇣a a w⇣b b) =
     (if a ∈ alphabet t then
        if b ∈ alphabet t then alphabet t else (alphabet t - {a}) ∪ {b}
      else
        if b ∈ alphabet t then (alphabet t - {b}) ∪ {a} else alphabet t)"
by (induct t) auto

lemma consistent_swapLeaves [simp]:
"consistent t ==> consistent (swapLeaves t w⇣a a w⇣b b)"
by (induct t) (auto simp: alphabet_swapLeaves)

lemma depth_swapLeaves_neither [simp]:
"[|consistent t; c ≠ a; c ≠ b|] ==>
 depth (swapLeaves t w⇣a a w⇣b b) c = depth t c"
by (induct t a rule: tree_induct_consistent) (auto simp: alphabet_swapLeaves)

lemma height_swapLeaves [simp]:
"height (swapLeaves t w⇣a a w⇣b b) = height t"
by (induct t) simp+

lemma freq_swapLeaves [simp]:
"[|consistent t; a ≠ b|] ==>
 freq (swapLeaves t w⇣a a w⇣b b) =
     (λc.
if c = a then if b ∈ alphabet t then w⇣a else 0
          else if c = b then if a ∈ alphabet t then w⇣b else 0
          else freq t c)"
apply (rule ext)
apply (induct t)
by auto

text {*
For the lemmas concerned with the resulting tree's weight and cost, we avoid
subtraction on natural numbers by rearranging terms. For example, we write
$$@{prop "weight (swapLeaves t w⇣a a w⇣b b) + freq t a = weight t + w⇣b"}$$

\noindent
rather than the more conventional
$$@{prop "weight (swapLeaves t w⇣a a w⇣b b) = weight t + w⇣b - freq t a"}.$$

In Isabelle/HOL, these two equations are not equivalent, because by definition
$m - n = 0$ if $n > m$. We could use the second equation and additionally
assert that @{prop "weight t ≥ freq t a"} (an easy consequence of
@{thm [source] weight_eq_Sum_freq}), and then apply the \textit{arith}
tactic, but it is much simpler to use the first equation and stay with
\textit{simp} and \textit{auto}. Another option would be to use
integers instead of natural numbers.
*}

lemma weight_swapLeaves:
"[|consistent t; a ≠ b|] ==>
 if a ∈ alphabet t then
   if b ∈ alphabet t then
     weight (swapLeaves t w⇣a a w⇣b b) + freq t a + freq t b =
         weight t + w⇣a + w⇣b
   else
     weight (swapLeaves t w⇣a a w⇣b b) + freq t a = weight t + w⇣b
 else
   if b ∈ alphabet t then
     weight (swapLeaves t w⇣a a w⇣b b) + freq t b = weight t + w⇣a
   else
     weight (swapLeaves t w⇣a a w⇣b b) = weight t"
proof (induct t a rule: tree_induct_consistent)
  -- {* {\sc Base case:}\enspace $t = @{term "Leaf w b"}$ *}
  case base thus ?case by clarsimp
next
  -- {* {\sc Induction step:}\enspace $t = @{term "InnerNode w t⇣1 t⇣2"}$ *}
  -- {* {\sc Subcase 1:}\enspace $a$ belongs to @{term t⇣1} but not to
        @{term t⇣2} *}
  case (step⇣1 w t⇣1 t⇣2 a) show ?case
  proof cases
    assume "b ∈ alphabet t⇣1"
    moreover hence "b ∉ alphabet t⇣2" using step⇣1 by auto
    ultimately show ?case using step⇣1 by simp
  next
    assume "b ∉ alphabet t⇣1" thus ?case using step⇣1 by auto
  qed
next
  -- {* {\sc Subcase
2:}\enspace $a$ belongs to @{term t⇣2} but not to
        @{term t⇣1} *}
  case (step⇣2 w t⇣1 t⇣2 a) show ?case
  proof cases
    assume "b ∈ alphabet t⇣1"
    moreover hence "b ∉ alphabet t⇣2" using step⇣2 by auto
    ultimately show ?case using step⇣2 by simp
  next
    assume "b ∉ alphabet t⇣1" thus ?case using step⇣2 by auto
  qed
next
  -- {* {\sc Subcase 3:}\enspace $a$ belongs to neither @{term t⇣1} nor
        @{term t⇣2} *}
  case (step⇣3 w t⇣1 t⇣2 a) show ?case
  proof cases
    assume "b ∈ alphabet t⇣1"
    moreover hence "b ∉ alphabet t⇣2" using step⇣3 by auto
    ultimately show ?case using step⇣3 by simp
  next
    assume "b ∉ alphabet t⇣1" thus ?case using step⇣3 by auto
  qed
qed

lemma cost_swapLeaves:
"[|consistent t; a ≠ b|] ==>
 if a ∈ alphabet t then
   if b ∈ alphabet t then
     cost (swapLeaves t w⇣a a w⇣b b) + freq t a * depth t a
     + freq t b * depth t b =
         cost t + w⇣a * depth t b + w⇣b * depth t a
   else
     cost (swapLeaves t w⇣a a w⇣b b) + freq t a * depth t a =
         cost t + w⇣b * depth t a
 else
   if b ∈ alphabet t then
     cost (swapLeaves t w⇣a a w⇣b b) + freq t b * depth t b =
         cost t + w⇣a * depth t b
   else
     cost (swapLeaves t w⇣a a w⇣b b) = cost t"
proof (induct t)
  case Leaf show ?case by simp
next
  case (InnerNode w t⇣1 t⇣2)
  note c = consistent (InnerNode w t⇣1 t⇣2)
  note hyps = InnerNode
  have w⇣1: "if a ∈ alphabet t⇣1 then
              if b ∈ alphabet t⇣1 then
                weight (swapLeaves t⇣1 w⇣a a w⇣b b) + freq t⇣1 a + freq t⇣1 b =
                    weight t⇣1 + w⇣a + w⇣b
              else
                weight (swapLeaves t⇣1 w⇣a a w⇣b b) + freq t⇣1 a = weight t⇣1 + w⇣b
            else
              if b ∈ alphabet t⇣1 then
                weight (swapLeaves t⇣1 w⇣a a w⇣b b) + freq t⇣1 b = weight t⇣1 + w⇣a
              else
                weight (swapLeaves t⇣1 w⇣a a w⇣b b) = weight t⇣1" using hyps
    by (simp add: weight_swapLeaves)
  have w⇣2: "if a ∈ alphabet t⇣2 then
              if b ∈ alphabet t⇣2 then
 weight (swapLeaves t⇣2 w⇣a a w⇣b b) + freq t⇣2 a + freq t⇣2 b =
                    weight t⇣2 + w⇣a + w⇣b
              else
                weight (swapLeaves t⇣2 w⇣a a w⇣b b) + freq t⇣2 a = weight t⇣2 + w⇣b
            else
              if b ∈ alphabet t⇣2 then
                weight (swapLeaves t⇣2 w⇣a a w⇣b b) + freq t⇣2 b = weight t⇣2 + w⇣a
              else
                weight (swapLeaves t⇣2 w⇣a a w⇣b b) = weight t⇣2" using hyps
    by (simp add: weight_swapLeaves)
  show ?case
  proof cases
    assume a⇣1: "a ∈ alphabet t⇣1"
    hence a⇣2: "a ∉ alphabet t⇣2" using c by auto
    show ?case
    proof cases
      assume b⇣1: "b ∈ alphabet t⇣1"
      hence "b ∉ alphabet t⇣2" using c by auto
      thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
    next
      assume b⇣1: "b ∉ alphabet t⇣1" show ?case
      proof cases
        assume "b ∈ alphabet t⇣2" thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
      next
        assume "b ∉ alphabet t⇣2" thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
      qed
    qed
  next
    assume a⇣1: "a ∉ alphabet t⇣1" show ?case
    proof cases
      assume a⇣2: "a ∈ alphabet t⇣2" show ?case
      proof cases
        assume b⇣1: "b ∈ alphabet t⇣1"
        hence "b ∉ alphabet t⇣2" using c by auto
        thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
      next
        assume b⇣1: "b ∉ alphabet t⇣1" show ?case
        proof cases
          assume "b ∈ alphabet t⇣2" thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
        next
          assume "b ∉ alphabet t⇣2" thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
        qed
      qed
    next
      assume a⇣2: "a ∉ alphabet t⇣2" show ?case
      proof cases
        assume b⇣1: "b ∈ alphabet t⇣1"
        hence "b ∉ alphabet t⇣2" using c by auto
        thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
      next
        assume b⇣1: "b ∉ alphabet t⇣1" show ?case
        proof cases
          assume "b ∈ alphabet t⇣2" thus ?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
        next
          assume "b ∉ alphabet t⇣2" thus
?case using a⇣1 a⇣2 b⇣1 w⇣1 w⇣2 hyps by simp
        qed
      qed
    qed
  qed
qed

text {*
Common sense tells us that the following statement is valid: ``If Astrid
exchanges her house with Bernard's neighbor, Bernard becomes Astrid's new
neighbor.'' A similar property holds for binary trees.
*}

lemma sibling_swapLeaves_sibling [simp]:
"[|consistent t; sibling t b ≠ b; a ≠ b|] ==>
 sibling (swapLeaves t w⇣a a w⇣s (sibling t b)) a = b"
proof (induct t)
  case Leaf thus ?case by simp
next
  case (InnerNode w t⇣1 t⇣2)
  note hyps = InnerNode
  show ?case
  proof (cases "height t⇣1 = 0")
    case True
    note h⇣1 = True
    show ?thesis
    proof (cases t⇣1)
      case (Leaf w⇣c c)
      note l⇣1 = Leaf
      show ?thesis
      proof (cases "height t⇣2 = 0")
        case True
        note h⇣2 = True
        show ?thesis
        proof (cases t⇣2)
          case Leaf thus ?thesis using l⇣1 hyps by auto metis+
        next
          case InnerNode thus ?thesis using h⇣2 by simp
        qed
      next
        case False
        note h⇣2 = False
        show ?thesis
        proof cases
          assume "c = b" thus ?thesis using l⇣1 h⇣2 hyps by simp
        next
          assume "c ≠ b"
          have "sibling t⇣2 b ∈ alphabet t⇣2" using c ≠ b l⇣1 h⇣2 hyps
            by (simp add: sibling_ne_imp_sibling_in_alphabet)
          thus ?thesis using c ≠ b l⇣1 h⇣2 hyps by auto
        qed
      qed
    next
      case InnerNode thus ?thesis using h⇣1 by simp
    qed
  next
    case False
    note h⇣1 = False
    show ?thesis
    proof (cases "height t⇣2 = 0")
      case True
      note h⇣2 = True
      show ?thesis
      proof (cases t⇣2)
        case (Leaf w⇣d d)
        note l⇣2 = Leaf
        show ?thesis
        proof cases
          assume "d = b" thus ?thesis using h⇣1 l⇣2 hyps by simp
        next
          assume "d ≠ b" show ?thesis
          proof (cases "b ∈ alphabet t⇣1")
            case True
            hence "sibling t⇣1 b ∈ alphabet t⇣1" using d ≠ b h⇣1 l⇣2 hyps
              by (simp add: sibling_ne_imp_sibling_in_alphabet)
            thus ?thesis using True d ≠ b h⇣1 l⇣2 hyps
              by (simp add: alphabet_swapLeaves)
          next
            case False thus ?thesis using d ≠ b l⇣2 hyps by simp
          qed
        qed
      next
        case InnerNode thus ?thesis using h⇣2 by simp
      qed
    next
      case False
      note h⇣2 = False
      show ?thesis
      proof (cases "b ∈ alphabet t⇣1")
        case True thus ?thesis using h⇣1 h⇣2 hyps by auto
      next
        case False
        note b⇣1 = False
        show ?thesis
        proof (cases "b ∈ alphabet t⇣2")
          case True thus ?thesis using b⇣1 h⇣1 h⇣2 hyps
            by (auto simp: in_alphabet_imp_sibling_in_alphabet
                           alphabet_swapLeaves)
        next
          case False thus ?thesis using b⇣1 h⇣1 h⇣2 hyps by simp
        qed
      qed
    qed
  qed
qed

subsection {* Symbol Interchange *}

text {*
The @{text swapSyms} function provides a simpler interface to
@{const swapLeaves}, with @{term "freq t a"} and @{term "freq t b"} in place
of @{term "w⇣a"} and @{term "w⇣b"}.
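In the same hypothetical Python rendering used earlier (our own tuple encoding, not the formal definition), @{text swapSyms} is a thin wrapper that reads off the two frequencies before swapping, which is what makes the tree's frequency function invariant under the exchange:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right).

def freq(t, c):
    if t[0] == "leaf":
        return t[1] if t[2] == c else 0
    return freq(t[2], c) + freq(t[3], c)

def swap_leaves(t, wa, a, wb, b):
    # Exchange the leaves labeled a and b, installing the given frequencies.
    if t[0] == "leaf":
        if t[2] == a:
            return ("leaf", wb, b)
        if t[2] == b:
            return ("leaf", wa, a)
        return t
    return ("node", t[1], swap_leaves(t[2], wa, a, wb, b),
                          swap_leaves(t[3], wa, a, wb, b))

def swap_syms(t, a, b):
    # swapLeaves specialized to the symbols' own frequencies.
    return swap_leaves(t, freq(t, a), a, freq(t, b), b)
```

The positions of the two symbols change, but each symbol keeps its own frequency, in the spirit of the @{thm [source] freq_swapSyms} lemma below.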
Most lemmas about @{text swapSyms} are directly
adapted from the homologous results about @{const swapLeaves}.
*}

definition swapSyms :: "'a tree => 'a => 'a => 'a tree" where
"swapSyms t a b ≡ swapLeaves t (freq t a) a (freq t b) b"

lemma swapSyms_id [simp]:
"consistent t ==> swapSyms t a a = t"
by (simp add: swapSyms_def)

lemma alphabet_swapSyms [simp]:
"[|a ∈ alphabet t; b ∈ alphabet t|] ==> alphabet (swapSyms t a b) = alphabet t"
by (simp add: swapSyms_def alphabet_swapLeaves)

lemma consistent_swapSyms [simp]:
"consistent t ==> consistent (swapSyms t a b)"
by (simp add: swapSyms_def)

lemma depth_swapSyms_neither [simp]:
"[|consistent t; c ≠ a; c ≠ b|] ==>
 depth (swapSyms t a b) c = depth t c"
by (simp add: swapSyms_def)

lemma freq_swapSyms [simp]:
"[|consistent t; a ∈ alphabet t; b ∈ alphabet t|] ==>
 freq (swapSyms t a b) = freq t"
by (case_tac "a = b") (simp add: swapSyms_def ext)+

lemma cost_swapSyms:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t"
shows "cost (swapSyms t a b) + freq t a * depth t a + freq t b * depth t b =
           cost t + freq t a * depth t b + freq t b * depth t a"
proof cases
  assume "a = b" thus ?thesis using assms by simp
next
  assume "a ≠ b"
  moreover hence "cost (swapLeaves t (freq t a) a (freq t b) b)
                      + freq t a * depth t a + freq t b * depth t b =
                  cost t + freq t a * depth t b + freq t b * depth t a"
    using assms by (simp add: cost_swapLeaves)
  ultimately show ?thesis using assms by (simp add: swapSyms_def)
qed

text {*
If $a$'s frequency is lower than or equal to $b$'s, and $a$ is higher up in the
tree than $b$ or at the same level, then interchanging $a$ and $b$ does not
increase the tree's cost.
*}

lemma le_le_imp_sum_mult_le_sum_mult:
"[|i ≤ j; m ≤ (n::nat)|] ==> i * n + j * m ≤ i * m + j * n"
apply (subgoal_tac
        "i * m + i * (n - m) + j * m ≤ i * m + j * m + j * (n - m)")
 apply (simp add: diff_mult_distrib2)
by simp

lemma cost_swapSyms_le:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "freq t a ≤ freq t
b"
        "depth t a ≤ depth t b"
shows "cost (swapSyms t a b) ≤ cost t"
proof -
  let ?aabb = "freq t a * depth t a + freq t b * depth t b"
  let ?abba = "freq t a * depth t b + freq t b * depth t a"
  have "?abba ≤ ?aabb" using assms(4-5)
    by (rule le_le_imp_sum_mult_le_sum_mult)
  have "cost (swapSyms t a b) + ?aabb = cost t + ?abba" using assms(1-3)
    by (simp add: cost_swapSyms nat_add_assoc [THEN sym])
  also have "… ≤ cost t + ?aabb" using ?abba ≤ ?aabb by simp
  finally show ?thesis using assms(4-5) by simp
qed

text {*
As stated earlier, ``If Astrid exchanges her house with Bernard's neighbor,
Bernard becomes Astrid's new neighbor.''
*}

lemma sibling_swapSyms_sibling [simp]:
"[|consistent t; sibling t b ≠ b; a ≠ b|] ==>
 sibling (swapSyms t a (sibling t b)) a = b"
by (simp add: swapSyms_def)

text {*
``If Astrid exchanges her house with Bernard, Astrid becomes Bernard's old
neighbor's new neighbor.''
*}

lemma sibling_swapSyms_other_sibling [simp]:
"[|consistent t; sibling t b ≠ a; sibling t b ≠ b; a ≠ b|] ==>
 sibling (swapSyms t a b) (sibling t b) = a"
by (metis consistent_swapSyms sibling_swapSyms_sibling sibling_reciprocal)

subsection {* Four-Way Symbol Interchange
              \label{four-way-symbol-interchange} *}

text {*
The @{const swapSyms} function exchanges two symbols $a$ and $b$. We use it
to define the four-way symbol interchange function @{text swapFourSyms}, which
takes four symbols $a$, $b$, $c$, $d$ with $a \ne b$ and $c \ne d$, and
exchanges them so that $a$ and $b$ occupy $c$~and~$d$'s positions.

A naive definition of this function would be
$$@{prop "swapFourSyms t a b c d ≡ swapSyms (swapSyms t a c) b d"}.$$

\noindent
This definition fails in the face of aliasing: If $a = d$, but
$b \ne c$, then @{text "swapFourSyms a b c d"} would leave $a$ in $b$'s
position.%
\footnote{Cormen et al.\ \cite[p.~390]{cormen-et-al-2001} forgot to consider
this case in their proof.
Thomas Cormen indicated in a personal communication
that this will be corrected in the next edition of the book.}
*}

definition swapFourSyms :: "'a tree => 'a => 'a => 'a => 'a => 'a tree" where
"swapFourSyms t a b c d ≡
     if a = d then swapSyms t b c
     else if b = c then swapSyms t a d
     else swapSyms (swapSyms t a c) b d"

text {*
Lemmas about @{const swapFourSyms} are easy to prove by expanding its
definition.
*}

lemma alphabet_swapFourSyms [simp]:
"[|a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t; d ∈ alphabet t|] ==>
 alphabet (swapFourSyms t a b c d) = alphabet t"
by (simp add: swapFourSyms_def)

lemma consistent_swapFourSyms [simp]:
"consistent t ==> consistent (swapFourSyms t a b c d)"
by (simp add: swapFourSyms_def)

lemma freq_swapFourSyms [simp]:
"[|consistent t; a ∈ alphabet t; b ∈ alphabet t; c ∈ alphabet t;
  d ∈ alphabet t|] ==>
 freq (swapFourSyms t a b c d) = freq t"
by (auto simp: swapFourSyms_def)

text {*
More Astrid and Bernard insanity: ``If Astrid and Bernard exchange their houses
with Carmen and her neighbor, Astrid and Bernard will now be neighbors.''
*}

lemma sibling_swapFourSyms_when_4th_is_sibling:
assumes "consistent t" "a ∈ alphabet t" "b ∈ alphabet t" "c ∈ alphabet t"
        "a ≠ b" "sibling t c ≠ c"
shows "sibling (swapFourSyms t a b c (sibling t c)) a = b"
proof (cases "a ≠ sibling t c ∧ b ≠ c")
  case True show ?thesis
  proof -
    let ?d = "sibling t c"
    let ?t⇣s = "swapFourSyms t a b c ?d"
    have abba: "(sibling ?t⇣s a = b) = (sibling ?t⇣s b = a)" using consistent t
      by (metis consistent_swapFourSyms sibling_reciprocal)
    have s: "sibling t c = sibling (swapSyms t a c) a" using True assms
      by (metis sibling_reciprocal sibling_swapSyms_sibling)
    have "sibling ?t⇣s b = sibling (swapSyms t a c) ?d" using s True assms
      by (auto simp: swapFourSyms_def)
    also have "… = a" using True assms
      by (metis sibling_reciprocal sibling_swapSyms_other_sibling
          swapLeaves_id swapSyms_def)
    finally have "sibling ?t⇣s b = a" .
    with abba show ?thesis ..
  qed
next
  case False thus ?thesis using assms
    by (auto intro: sibling_reciprocal simp: swapFourSyms_def)
qed

subsection {* Sibling Merge *}

text {*
Given a symbol $a$, the @{text mergeSibling} function transforms the tree%
\setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
\mydimeni=\ht\myboxii
$$\vcenter{\box\myboxii}
  \qquad \hbox{into} \qquad
  \smash{\lower\ht\myboxi\hbox{\raise.5\mydimeni\box\myboxi}}$$

The frequency of $a$ in the result is the sum of the original frequencies of
$a$ and $b$, so as not to alter the tree's weight.
*}

fun mergeSibling :: "'a tree => 'a => 'a tree" where
"mergeSibling (Leaf w⇣b b) a = Leaf w⇣b b" |
"mergeSibling (InnerNode w (Leaf w⇣b b) (Leaf w⇣c c)) a =
     (if a = b ∨ a = c then Leaf (w⇣b + w⇣c) a
      else InnerNode w (Leaf w⇣b b) (Leaf w⇣c c))" |
"mergeSibling (InnerNode w t⇣1 t⇣2) a =
     InnerNode w (mergeSibling t⇣1 a) (mergeSibling t⇣2 a)"

text {*
The definition of @{const mergeSibling} has essentially the same structure as
that of @{const sibling}. As a result, the custom induction rule that we
derived for @{const sibling} works equally well for reasoning about
@{const mergeSibling}.
*}

lemmas mergeSibling_induct_consistent = sibling_induct_consistent

text {*
The properties of @{const mergeSibling} echo those of @{const sibling}.
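A Python sketch of the merge (our own tuple encoding, mirroring the three equations of @{const mergeSibling} above, not the formal text) may help; note how the summed frequency keeps the tree's weight unchanged:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right).

def merge_sibling(t, a):
    # Collapse the two-leaf node containing a into a single leaf labeled a
    # whose frequency is the sum of the two original frequencies.
    if t[0] == "leaf":
        return t
    _, w, t1, t2 = t
    if t1[0] == "leaf" and t2[0] == "leaf":
        if a == t1[2] or a == t2[2]:
            return ("leaf", t1[1] + t2[1], a)
        return t
    return ("node", w, merge_sibling(t1, a), merge_sibling(t2, a))
```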
Like with @{const sibling}, simplification rules are crucial.
*}

lemma notin_alphabet_imp_mergeSibling_id [simp]:
"a ∉ alphabet t ==> mergeSibling t a = t"
by (induct t a rule: mergeSibling.induct) simp+

lemma height_gt_0_imp_mergeSibling_left [simp]:
"height t⇣1 > 0 ==>
 mergeSibling (InnerNode w t⇣1 t⇣2) a =
     InnerNode w (mergeSibling t⇣1 a) (mergeSibling t⇣2 a)"
by (case_tac t⇣1) simp+

lemma height_gt_0_imp_mergeSibling_right [simp]:
"height t⇣2 > 0 ==>
 mergeSibling (InnerNode w t⇣1 t⇣2) a =
     InnerNode w (mergeSibling t⇣1 a) (mergeSibling t⇣2 a)"
by (case_tac t⇣2) simp+

lemma either_height_gt_0_imp_mergeSibling [simp]:
"height t⇣1 > 0 ∨ height t⇣2 > 0 ==>
 mergeSibling (InnerNode w t⇣1 t⇣2) a =
     InnerNode w (mergeSibling t⇣1 a) (mergeSibling t⇣2 a)"
by auto

lemma alphabet_mergeSibling [simp]:
"[|consistent t; a ∈ alphabet t|] ==>
 alphabet (mergeSibling t a) = (alphabet t - {sibling t a}) ∪ {a}"
by (induct t a rule: mergeSibling_induct_consistent) auto

lemma consistent_mergeSibling [simp]:
"consistent t ==> consistent (mergeSibling t a)"
by (induct t a rule: mergeSibling_induct_consistent) auto

lemma freq_mergeSibling:
"[|consistent t; a ∈ alphabet t; sibling t a ≠ a|] ==>
 freq (mergeSibling t a) =
     (λc.
if c = a then freq t a + freq t (sibling t a)
          else if c = sibling t a then 0
          else freq t c)"
by (induct t a rule: mergeSibling_induct_consistent)
   (auto simp: fun_eq_iff)

lemma weight_mergeSibling [simp]:
"weight (mergeSibling t a) = weight t"
by (induct t a rule: mergeSibling.induct) simp+

text {*
If $a$ has a sibling, merging $a$ and its sibling reduces $t$'s cost by
@{term "freq t a + freq t (sibling t a)"}.
*}

lemma cost_mergeSibling:
"[|consistent t; sibling t a ≠ a|] ==>
 cost (mergeSibling t a) + freq t a + freq t (sibling t a) = cost t"
by (induct t a rule: mergeSibling_induct_consistent) auto

subsection {* Leaf Split *}

text {*
The @{text splitLeaf} function undoes the merging performed by
@{const mergeSibling}: Given two symbols $a$, $b$ and two frequencies
$@{term w⇣a}$, $@{term w⇣b}$, it transforms
\setbox\myboxi=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.25]{tree-splitLeaf-ab.pdf}}%
$$\smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
  \qquad \hbox{into} \qquad
  \vcenter{\box\myboxii}$$

In the resulting tree, $a$ has frequency @{term w⇣a} and $b$ has frequency
@{term w⇣b}.
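In the same hypothetical Python rendering (our own tuple encoding, not the formal text), @{text splitLeaf} replaces the $a$-leaf by an inner node whose cached weight is the old leaf's frequency:

```python
# Trees are ("leaf", w, sym) or ("node", w, left, right).

def split_leaf(t, wa, a, wb, b):
    # Replace the leaf labeled a by an inner node with children a and b;
    # the new inner node caches the old leaf's weight.
    if t[0] == "leaf":
        w, c = t[1], t[2]
        if c == a:
            return ("node", w, ("leaf", wa, a), ("leaf", wb, b))
        return t
    _, w, t1, t2 = t
    return ("node", w, split_leaf(t1, wa, a, wb, b),
                       split_leaf(t2, wa, a, wb, b))
```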
We normally invoke it with @{term w⇣a}~and @{term w⇣b} such that
@{prop "freq t a = w⇣a + w⇣b"}.
*}

primrec splitLeaf :: "'a tree => nat => 'a => nat => 'a => 'a tree" where
"splitLeaf (Leaf w⇣c c) w⇣a a w⇣b b =
     (if c = a then InnerNode w⇣c (Leaf w⇣a a) (Leaf w⇣b b) else Leaf w⇣c c)" |
"splitLeaf (InnerNode w t⇣1 t⇣2) w⇣a a w⇣b b =
     InnerNode w (splitLeaf t⇣1 w⇣a a w⇣b b) (splitLeaf t⇣2 w⇣a a w⇣b b)"

primrec splitLeaf⇣F :: "'a forest => nat => 'a => nat => 'a => 'a forest" where
"splitLeaf⇣F [] w⇣a a w⇣b b = []" |
"splitLeaf⇣F (t # ts) w⇣a a w⇣b b =
     splitLeaf t w⇣a a w⇣b b # splitLeaf⇣F ts w⇣a a w⇣b b"

text {*
Splitting leaf nodes affects the alphabet, consistency, symbol frequencies,
weight, and cost in unsurprising ways.
*}

lemma notin_alphabet_imp_splitLeaf_id [simp]:
"a ∉ alphabet t ==> splitLeaf t w⇣a a w⇣b b = t"
by (induct t) simp+

lemma notin_alphabet⇣F_imp_splitLeaf⇣F_id [simp]:
"a ∉ alphabet⇣F ts ==> splitLeaf⇣F ts w⇣a a w⇣b b = ts"
by (induct ts) simp+

lemma alphabet_splitLeaf [simp]:
"alphabet (splitLeaf t w⇣a a w⇣b b) =
     (if a ∈ alphabet t then alphabet t ∪ {b} else alphabet t)"
by (induct t) simp+

lemma consistent_splitLeaf [simp]:
"[|consistent t; b ∉ alphabet t|] ==> consistent (splitLeaf t w⇣a a w⇣b b)"
by (induct t) auto

lemma freq_splitLeaf [simp]:
"[|consistent t; b ∉ alphabet t|] ==>
 freq (splitLeaf t w⇣a a w⇣b b) =
     (if a ∈ alphabet t then
        (λc. if c = a then w⇣a else if c = b then w⇣b else freq t c)
      else
        freq t)"
by (induct t b rule: tree_induct_consistent) (rule ext, auto)+

lemma weight_splitLeaf [simp]:
"[|consistent t; a ∈ alphabet t; freq t a = w⇣a + w⇣b|] ==>
 weight (splitLeaf t w⇣a a w⇣b b) = weight t"
by (induct t a rule: tree_induct_consistent) simp+

lemma cost_splitLeaf [simp]:
"[|consistent t; a ∈ alphabet t; freq t a = w⇣a + w⇣b|] ==>
 cost (splitLeaf t w⇣a a w⇣b b) = cost t + w⇣a + w⇣b"
by (induct t a rule: tree_induct_consistent) simp+

subsection {* Weight Sort Order *}

text {*
An invariant of Huffman's algorithm is that the forest is sorted by weight.
This is expressed by the @{text sortedByWeight} function.
*}

fun sortedByWeight :: "'a forest => bool" where
"sortedByWeight [] = True" |
"sortedByWeight [t] = True" |
"sortedByWeight (t⇣1 # t⇣2 # ts) =
     (weight t⇣1 ≤ weight t⇣2 ∧ sortedByWeight (t⇣2 # ts))"

text {*
The function obeys the following fairly obvious laws.
*}

lemma sortedByWeight_Cons_imp_sortedByWeight:
"sortedByWeight (t # ts) ==> sortedByWeight ts"
by (case_tac ts) simp+

lemma sortedByWeight_Cons_imp_forall_weight_ge:
"sortedByWeight (t # ts) ==> ∀u ∈ set ts. weight u ≥ weight t"
proof (induct ts arbitrary: t)
  case Nil thus ?case by simp
next
  case (Cons u us) thus ?case by simp (metis le_trans)
qed

lemma sortedByWeight_insortTree:
"[|sortedByWeight ts; height t = 0; height⇣F ts = 0|] ==>
 sortedByWeight (insortTree t ts)"
by (induct ts rule: sortedByWeight.induct) auto

subsection {* Pair of Minimal Symbols *}

text {*
The @{text minima} predicate expresses that two symbols
$a$, $b ∈ @{term "alphabet t"}$ have the lowest frequencies in the tree $t$
and that @{prop "freq t a ≤ freq t b"}. Minimal symbols need not be uniquely
defined.
*}

definition minima :: "'a tree => 'a => 'a => bool" where
"minima t a b ≡
     a ∈ alphabet t ∧ b ∈ alphabet t ∧ a ≠ b ∧ freq t a ≤ freq t b
     ∧ (∀c ∈ alphabet t.
c ≠ a --> c ≠ b -->
                         freq t c ≥ freq t a ∧ freq t c ≥ freq t b)"

section {* Formalization of the Textbook Proof
           \label{formalization} *}

subsection {* Four-Way Symbol Interchange Cost Lemma *}

text {*
If $a$ and $b$ are minima, and $c$ and $d$ are at the very bottom of the tree,
then exchanging $a$ and $b$ with $c$ and $d$ doesn't increase the cost.
Graphically, we have\strut
%
$${\it cost\/} \vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima-abcd.pdf}}}
  \,\mathop{\le}\;\;\;
  {\it cost\/} \vcenter{\hbox{\includegraphics[scale=1.25]{tree-minima.pdf}}}$$

\noindent
This cost property is part of Knuth's proof:

\begin{quote}
Let $V$ be an internal node of maximum distance from the root. If $w_1$ and
$w_2$ are not the weights already attached to the children of $V$, we can
interchange them with the values that are already there; such an interchange
does not increase the weighted path length.
\end{quote}

\noindent
Lemma~16.2 in Cormen et al.~\cite[p.~389]{cormen-et-al-2001} expresses a
similar property, which turns out to be a corollary of our cost property:

\begin{quote}
Let $C$ be an alphabet in which each character $c \in C$ has frequency $f[c]$.
Let $x$ and $y$ be two characters in $C$ having the lowest frequencies.
Then
there exists an optimal prefix code for $C$ in which the codewords for $x$ and
$y$ have the same length and differ only in the last bit.
\end{quote}
\vskip-.75\myskipamount
*}

lemma cost_swapFourSyms_le:
assumes "consistent t" "minima t a b" "c ∈ alphabet t" "d ∈ alphabet t"
        "depth t c = height t" "depth t d = height t" "c ≠ d"
shows "cost (swapFourSyms t a b c d) ≤ cost t"
proof -
  note lems = swapFourSyms_def minima_def cost_swapSyms_le depth_le_height
  show ?thesis
  proof (cases "a ≠ d ∧ b ≠ c")
    case True show ?thesis
    proof cases
      assume "a = c" show ?thesis
      proof cases
        assume "b = d" thus ?thesis using `a = c` True assms
          by (simp add: lems)
      next
        assume "b ≠ d" thus ?thesis using `a = c` True assms
          by (simp add: lems)
      qed
    next
      assume "a ≠ c" show ?thesis
      proof cases
        assume "b = d" thus ?thesis using `a ≠ c` True assms
          by (simp add: lems)
      next
        assume "b ≠ d"
        have "cost (swapFourSyms t a b c d) ≤ cost (swapSyms t a c)"
          using `b ≠ d` `a ≠ c` True assms by (clarsimp simp: lems)
        also have "… ≤ cost t" using `b ≠ d` `a ≠ c` True assms
          by (clarsimp simp: lems)
        finally show ?thesis .
      qed
    qed
  next
    case False thus ?thesis using assms by (clarsimp simp: lems)
  qed
qed

subsection {* Leaf Split Optimality Lemma
              \label{leaf-split-optimality} *}

text {*
The tree @{term "splitLeaf t w⇣a a w⇣b b"} is optimum if $t$ is optimum, under a
few assumptions, notably that $a$ and $b$ are minima of the new tree and
that @{prop "freq t a = w⇣a + w⇣b"}.
Graphically:\strut
%
\setbox\myboxi=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-a.pdf}}%
\setbox\myboxii=\hbox{\includegraphics[scale=1.2]{tree-splitLeaf-ab.pdf}}%
$${\it optimum\/} \smash{\lower\ht\myboxi\hbox{\raise.5\ht\myboxii\box\myboxi}}
  \,\mathop{\Longrightarrow}\;\;\;
  {\it optimum\/} \vcenter{\box\myboxii}$$
%
This corresponds to the following fragment of Knuth's proof:

\begin{quote}
Now it is easy to prove that the weighted path length of such a tree is
minimized if and only if the tree with
$$\vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-w2-leaves.pdf}}}
  \qquad \hbox{replaced by} \qquad
  \vcenter{\hbox{\includegraphics[scale=1.25]{tree-w1-plus-w2.pdf}}}$$
has minimum path length for the weights $w_1 + w_2$, $w_3$, $\ldots\,$, $w_m$.
\end{quote}

\noindent
We only need the ``if'' direction of Knuth's equivalence. Lemma~16.3 in
Cormen et al.~\cite[p.~391]{cormen-et-al-2001} expresses essentially the same
property:

\begin{quote}
Let $C$ be a given alphabet with frequency $f[c]$ defined for each character
$c \in C$. Let $x$ and $y$ be two characters in $C$ with minimum frequency. Let
$C'$ be the alphabet $C$ with characters $x$, $y$ removed and (new) character
$z$ added, so that $C' = C - \{x, y\} \cup {\{z\}}$; define $f$ for $C'$ as for
$C$, except that $f[z] = f[x] + f[y]$. Let $T'$ be any tree representing an
optimal prefix code for the alphabet $C'$.
Then the tree $T$, obtained from
$T'$ by replacing the leaf node for $z$ with an internal node having $x$ and
$y$ as children, represents an optimal prefix code for the alphabet $C$.
\end{quote}

\noindent
The proof is as follows: We assume that $t$ has a cost less than or equal to
that of any other comparable tree~$v$ and show that
@{term "splitLeaf t w⇣a a w⇣b b"} has a cost less than or equal to that of any
other comparable tree $u$. By @{thm [source] exists_at_height} and
@{thm [source] depth_height_imp_sibling_ne}, we know that some symbols $c$ and
$d$ appear in sibling nodes at the very bottom of~$u$:
$$\includegraphics[scale=1.25]{tree-splitLeaf-cd.pdf}$$
(The question mark is there to remind us that we know nothing specific about
$u$'s structure.) From $u$ we construct a new tree
@{term "swapFourSyms u a b c d"} in which the minima $a$ and $b$ are siblings:
$$\includegraphics[scale=1.25]{tree-splitLeaf-abcd.pdf}$$
Merging $a$ and $b$ gives a tree comparable with $t$, which we can use to
instantiate $v$ in the assumption:
$$\includegraphics[scale=1.25]{tree-splitLeaf-abcd-aba.pdf}$$
With this instantiation, the proof is easy:
$$\begin{tabularx}{\textwidth}{@%
{\hskip\leftmargin}cX@%
{}}
 & @{term "cost (splitLeaf t w⇣a a w⇣b b)"} \\
\eq & \justif{@{thm [source] cost_splitLeaf}} \\
 & @{term "cost t + w⇣a + w⇣b"} \\
\kern-1em\leq\kern-1em & \justif{assumption} \\[-2ex]
 & @{text "cost ("} \overbrace{\strut\!@{term "mergeSibling (swapFourSyms u a b c d) a"}\!}
   ^{\smash{\hbox{$v$}}}@{text ") + w⇣a + w⇣b"} \\[\extrah]
\eq & \justif{@{thm [source] cost_mergeSibling}} \\
 & @{term "cost (swapFourSyms u a b c d)"} \\
\kern-1em\leq\kern-1em & \justif{@{thm [source] cost_swapFourSyms_le}} \\
 & @{term "cost u"}.
\\
\end{tabularx}$$

\noindent
In contrast, the proof in Cormen et al.\ is by contradiction: Essentially, they
assume that there exists a tree $u$ with a lower cost than
@{term "splitLeaf t w⇣a a w⇣b b"} and show that there exists a tree~$v$
with a lower cost than~$t$, contradicting the hypothesis that $t$ is optimum. In
place of @{thm [source] cost_swapFourSyms_le}, they invoke their lemma~16.2,
which is questionable since $u$ is not necessarily optimum.%
\footnote{Thomas Cormen commented that this step will be clarified in the
next edition of the book.}

Our proof relies on the following lemma, which asserts that $a$ and $b$ are
minima of $u$.
*}

lemma twice_freq_le_imp_minima:
"[|∀c ∈ alphabet t. w⇣a ≤ freq t c ∧ w⇣b ≤ freq t c;
  alphabet u = alphabet t ∪ {b}; a ∈ alphabet u; a ≠ b;
  freq u = (λc. if c = a then w⇣a else if c = b then w⇣b else freq t c);
  w⇣a ≤ w⇣b|] ==>
 minima u a b"
by (simp add: minima_def)

text {*
Now comes the key lemma.
*}

lemma optimum_splitLeaf:
assumes "consistent t" "optimum t" "a ∈ alphabet t" "b ∉ alphabet t"
        "freq t a = w⇣a + w⇣b" "∀c ∈ alphabet t. freq t c ≥ w⇣a ∧ freq t c ≥ w⇣b"
        "w⇣a ≤ w⇣b"
shows "optimum (splitLeaf t w⇣a a w⇣b b)"
proof (unfold optimum_def, clarify)
  fix u
  let ?t' = "splitLeaf t w⇣a a w⇣b b"
  assume c⇣u: "consistent u"
         and a⇣u: "alphabet ?t' = alphabet u"
         and f⇣u: "freq ?t' = freq u"
  show "cost ?t' ≤ cost u"
  proof (cases "height ?t' = 0")
    case True thus ?thesis by simp
  next
    case False
    hence h⇣u: "height u > 0" using a⇣u assms
      by (auto intro: height_gt_0_alphabet_eq_imp_height_gt_0)
    have a⇣a: "a ∈ alphabet u" using a⇣u assms by fastforce
    have a⇣b: "b ∈ alphabet u" using a⇣u assms by fastforce
    have ab: "a ≠ b" using assms by blast
    from exists_at_height [OF c⇣u]
    obtain c where a⇣c: "c ∈ alphabet u" and d⇣c: "depth u c = height u" ..
    let ?d = "sibling u c"
    have dc: "?d ≠ c" using d⇣c c⇣u h⇣u a⇣c by (metis depth_height_imp_sibling_ne)
    have a⇣d: "?d ∈ alphabet u" using dc
      by (rule sibling_ne_imp_sibling_in_alphabet)
    have d⇣d: "depth u ?d = height u" using d⇣c c⇣u by simp
    let ?u' = "swapFourSyms u a b c ?d"
    have c⇣u⇣': "consistent ?u'" using c⇣u by simp
    have a⇣u⇣': "alphabet ?u' = alphabet u" using a⇣a a⇣b a⇣c a⇣d a⇣u by simp
    have f⇣u⇣': "freq ?u' = freq u" using a⇣a a⇣b a⇣c a⇣d c⇣u f⇣u by simp
    have s⇣a: "sibling ?u' a = b" using c⇣u a⇣a a⇣b a⇣c ab dc
      by (rule sibling_swapFourSyms_when_4th_is_sibling)
    let ?v = "mergeSibling ?u' a"
    have c⇣v: "consistent ?v" using c⇣u⇣' by simp
    have a⇣v: "alphabet ?v = alphabet t" using s⇣a c⇣u⇣' a⇣u⇣' a⇣a a⇣u assms by auto
    have f⇣v: "freq ?v = freq t"
      using s⇣a c⇣u⇣' a⇣u⇣' f⇣u⇣' f⇣u [THEN sym] ab a⇣u [THEN sym] assms
      by (simp add: freq_mergeSibling ext)
    have "cost ?t' = cost t + w⇣a + w⇣b" using assms by simp
    also have "… ≤ cost ?v + w⇣a + w⇣b" using c⇣v a⇣v f⇣v `optimum t`
      by (simp add: optimum_def)
    also have "… = cost ?u'"
      proof -
        have "cost ?v + freq ?u' a + freq ?u' (sibling ?u' a) = cost ?u'"
          using c⇣u⇣' s⇣a assms by (subst cost_mergeSibling) auto
        moreover have "w⇣a = freq ?u' a" "w⇣b = freq ?u' b"
          using f⇣u⇣' f⇣u [THEN sym] assms by clarsimp+
        ultimately show ?thesis using s⇣a by simp
      qed
    also have "… ≤ cost u"
      proof -
        have "minima u a b" using a⇣u f⇣u assms
          by (subst twice_freq_le_imp_minima) auto
        with c⇣u show ?thesis using a⇣c a⇣d d⇣c d⇣d dc [THEN not_sym]
          by (rule cost_swapFourSyms_le)
      qed
    finally show ?thesis .
  qed
qed

subsection {* Leaf Split Commutativity Lemma
              \label{leaf-split-commutativity} *}

text {*
A key property of Huffman's algorithm is that once it has combined two
lowest-weight trees using @{const uniteTrees}, it doesn't visit these trees
ever again.
This suggests that splitting a leaf node before applying the
algorithm should give the same result as applying the algorithm first and
splitting the leaf node afterward. The diagram below illustrates the
situation:\strut
\def\myscale{1.05}%
\setbox\myboxi=\hbox{(9)\strut}%
\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{forest-a.pdf}}%
$$(1)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}$$
\smallskip

\setbox\myboxii=\hbox{\includegraphics[scale=\myscale]{tree-splitLeaf-a.pdf}}%
\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
  {forest-splitLeaf-ab.pdf}}%
\mydimeni=\wd\myboxii
\noindent(2a)\,\lower\ht\myboxii\hbox{\raise\ht\myboxi\box\myboxii}%
  \qquad\qquad\quad
  (2b)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}\quad{}

\setbox\myboxiii=\hbox{\includegraphics[scale=\myscale]%
  {tree-splitLeaf-ab.pdf}}%
\setbox\myboxiv=\hbox{\includegraphics[scale=\myscale]%
  {tree-huffman-splitLeaf-ab.pdf}}%
\mydimenii=\wd\myboxiii
\vskip1.5\smallskipamount
\noindent(3a)\,\lower\ht\myboxiii\hbox{\raise\ht\myboxi\box\myboxiii}%
  \qquad\qquad\quad
  (3b)\,\hfill\lower\ht\myboxiv\hbox{\raise\ht\myboxi\box\myboxiv}%
  \quad\hfill{}

\noindent
From the original forest (1), we can either run the algorithm (2a) and then
split $a$ (3a) or split $a$ (2b) and then run the algorithm (3b). Our goal is
to show that trees (3a) and (3b) are identical. Formally, we prove that
$$@{term "splitLeaf (huffman ts) w⇣a a w⇣b b"} =
  @{term "huffman (splitLeaf⇣F ts w⇣a a w⇣b b)"}$$
when @{term ts} is consistent, @{term "a ∈ alphabet⇣F ts"},
@{term "b ∉ alphabet⇣F ts"}, and $@{term "freq⇣F ts a"} = @{term w⇣a}
\mathbin{@{text "+"}} @{term "w⇣b"}$.
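The commutativity property can be sanity-checked on a small executable model. The following Python sketch (our own naming and a plausible reading of @{const insortTree}; it is not generated from the formal development) mirrors @{const huffman}, @{const splitLeaf}, and @{const splitLeaf⇣F} and confirms the equation on a concrete forest:

```python
# Executable model of the commutativity lemma: splitting a leaf before
# running Huffman's algorithm agrees with running the algorithm first
# and splitting the leaf afterward. All names are ours (hypothetical),
# chosen to mirror the Isabelle functions.

class Leaf:
    def __init__(self, w, sym): self.w, self.sym = w, sym
    def __eq__(self, o):
        return isinstance(o, Leaf) and (self.w, self.sym) == (o.w, o.sym)

class Inner:
    def __init__(self, w, l, r): self.w, self.l, self.r = w, l, r
    def __eq__(self, o):
        return isinstance(o, Inner) and (self.w, self.l, self.r) == (o.w, o.l, o.r)

def weight(t): return t.w

def unite(t1, t2):
    # uniteTrees: cache the combined weight in the new inner node
    return Inner(weight(t1) + weight(t2), t1, t2)

def insort(t, ts):
    # insert t into the weight-sorted forest ts
    for i, u in enumerate(ts):
        if weight(t) <= weight(u):
            return ts[:i] + [t] + ts[i:]
    return ts + [t]

def huffman(ts):
    # repeatedly unite the two lightest trees, reinserting the result
    while len(ts) > 1:
        t1, t2, *rest = ts
        ts = insort(unite(t1, t2), rest)
    return ts[0]

def split_leaf(t, wa, a, wb, b):
    # replace the leaf labeled a by an inner node with leaves a, b
    if isinstance(t, Leaf):
        return Inner(t.w, Leaf(wa, a), Leaf(wb, b)) if t.sym == a else t
    return Inner(t.w, split_leaf(t.l, wa, a, wb, b), split_leaf(t.r, wa, a, wb, b))

def split_leaf_forest(ts, wa, a, wb, b):
    return [split_leaf(t, wa, a, wb, b) for t in ts]

# freq ts 'a' = 3 = w_a + w_b with w_a = 1, w_b = 2, as the lemma requires
ts = [Leaf(3, 'a'), Leaf(4, 'c'), Leaf(5, 'd')]
lhs = split_leaf(huffman(ts), 1, 'a', 2, 'b')
rhs = huffman(split_leaf_forest(ts, 1, 'a', 2, 'b'))
assert lhs == rhs
```

Both orders produce the same tree of weight 12, matching the statement proved formally below.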
But before we can prove this
commutativity lemma, we need to introduce a few simple lemmas.
*}

lemma cachedWeight_splitLeaf [simp]:
"cachedWeight (splitLeaf t w⇣a a w⇣b b) = cachedWeight t"
by (case_tac t) simp+

lemma splitLeaf⇣F_insortTree_when_in_alphabet_left [simp]:
"[|a ∈ alphabet t; consistent t; a ∉ alphabet⇣F ts; freq t a = w⇣a + w⇣b|] ==>
 splitLeaf⇣F (insortTree t ts) w⇣a a w⇣b b = insortTree (splitLeaf t w⇣a a w⇣b b) ts"
by (induct ts) simp+

lemma splitLeaf⇣F_insortTree_when_in_alphabet⇣F_tail [simp]:
"[|a ∈ alphabet⇣F ts; consistent⇣F ts; a ∉ alphabet t; freq⇣F ts a = w⇣a + w⇣b|] ==>
 splitLeaf⇣F (insortTree t ts) w⇣a a w⇣b b =
     insortTree t (splitLeaf⇣F ts w⇣a a w⇣b b)"
proof (induct ts)
  case Nil thus ?case by simp
next
  case (Cons u us) show ?case
  proof (cases "a ∈ alphabet u")
    case True
    moreover hence "a ∉ alphabet⇣F us" using Cons by auto
    ultimately show ?thesis using Cons by auto
  next
    case False thus ?thesis using Cons by simp
  qed
qed

text {*
We are now ready to prove the commutativity lemma.
*}

lemma splitLeaf_huffman_commute:
"[|consistent⇣F ts; ts ≠ []; a ∈ alphabet⇣F ts; freq⇣F ts a = w⇣a + w⇣b|] ==>
 splitLeaf (huffman ts) w⇣a a w⇣b b = huffman (splitLeaf⇣F ts w⇣a a w⇣b b)"
proof (induct ts rule: huffman.induct)
  -- {* {\sc Base case 1:}\enskip $@{term ts} = []$ *}
  case 3 thus ?case by simp
next
  -- {* {\sc Base case 2:}\enskip $@{term ts} = @{term "[t]"}$ *}
  case (1 t) thus ?case by simp
next
  -- {* {\sc Induction step:}\enskip $@{term ts} = @{term "t⇣1 # t⇣2 # ts"}$ *}
  case (2 t⇣1 t⇣2 ts)
  note hyps = 2
  show ?case
  proof (cases "a ∈ alphabet t⇣1")
    case True
    moreover hence "a ∉ alphabet t⇣2" "a ∉ alphabet⇣F ts" using hyps by auto
    ultimately show ?thesis using hyps by (simp add: uniteTrees_def)
  next
    case False
    note a⇣1 = False
    show ?thesis
    proof (cases "a ∈ alphabet t⇣2")
      case True
      moreover hence "a ∉ alphabet⇣F ts" using hyps by auto
      ultimately show ?thesis using a⇣1 hyps by (simp add: uniteTrees_def)
    next
      case False
      thus ?thesis using a⇣1 hyps by simp
    qed
  qed
qed

text {*
An important consequence of the commutativity lemma is that applying Huffman's
algorithm on a forest of the form
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees.pdf}}}$$
gives the same result as applying the algorithm on the ``flat'' forest
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
followed by splitting the leaf node $a$ into two nodes $a$, $b$ with
frequencies $@{term w⇣a}$, $@{term w⇣b}$. The lemma effectively
provides a way to flatten the forest at each step of the algorithm. Its
invocation is implicit in the textbook proof.
*}

subsection {* Optimality Theorem *}

text {*
We are one lemma away from our main result.
*}

lemma max_0_imp_0 [simp]:
"(max x y = (0::nat)) = (x = 0 ∧ y = 0)"
by auto

theorem optimum_huffman:
"[|consistent⇣F ts; height⇣F ts = 0; sortedByWeight ts; ts ≠ []|] ==>
 optimum (huffman ts)"

txt {*
The input @{term ts} is assumed to be a nonempty consistent list of leaf nodes
sorted by weight. The proof is by induction on the length of the forest
@{term ts}. Let @{term ts} be
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-flat.pdf}}}$$
with $w_a \le w_b \le w_c \le w_d \le \cdots \le w_z$. If @{term ts} consists
of a single leaf node, the node has cost 0 and is therefore optimum. If
@{term ts} has length 2 or more, the first step of the algorithm leaves us with
the term
$${\it huffman\/}\enskip\; \vcenter{\hbox{\includegraphics[scale=1.25]%
  {forest-uniteTrees.pdf}}}$$
In the diagram, we put the newly created tree at position 2 in the forest; in
general, it could be anywhere.
By @{thm [source] splitLeaf_huffman_commute},
the above tree equals
\strut
$${\it splitLeaf\/}\;\left({\it huffman\/}\enskip\;
  \vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}
  \;\right)\,@{text "w⇣a a w⇣b b"}.$$
To prove that this tree is optimum, it suffices by
@{thm [source] optimum_splitLeaf} to show that\strut
$${\it huffman\/}\enskip\;
  \vcenter{\hbox{\includegraphics[scale=1.25]{forest-uniteTrees-flat.pdf}}}$$
is optimum, which follows from the induction hypothesis.\strut
*}

proof (induct ts rule: length_induct)
  -- {* {\sc Complete induction step} *}
  case (1 ts)
  note hyps = 1
  show ?case
  proof (cases ts)
    case Nil thus ?thesis using `ts ≠ []` by fast
  next
    case (Cons t⇣a ts')
    note ts = Cons
    show ?thesis
    proof (cases ts')
      case Nil thus ?thesis using ts hyps by (simp add: optimum_def)
    next
      case (Cons t⇣b ts'')
      note ts' = Cons
      show ?thesis
      proof (cases t⇣a)
        case (Leaf w⇣a a)
        note l⇣a = Leaf
        show ?thesis
        proof (cases t⇣b)
          case (Leaf w⇣b b)
          note l⇣b = Leaf
          show ?thesis
          proof -
            let ?us = "insortTree (uniteTrees t⇣a t⇣b) ts''"
            let ?us' = "insortTree (Leaf (w⇣a + w⇣b) a) ts''"
            let ?t⇣s = "splitLeaf (huffman ?us') w⇣a a w⇣b b"
            have e⇣1: "huffman ts = huffman ?us" using ts' ts by simp
            have e⇣2: "huffman ?us = ?t⇣s" using l⇣a l⇣b ts' ts hyps
              by (auto simp: splitLeaf_huffman_commute uniteTrees_def)
            have "optimum (huffman ?us')" using l⇣a ts' ts hyps
              by (drule_tac x = ?us' in spec)
                 (auto dest: sortedByWeight_Cons_imp_sortedByWeight
                       simp: sortedByWeight_insortTree)
            hence "optimum ?t⇣s" using l⇣a l⇣b ts' ts hyps
              apply simp
              apply (rule optimum_splitLeaf)
              by (auto dest!: height⇣F_0_imp_Leaf_freq⇣F_in_set
                              sortedByWeight_Cons_imp_forall_weight_ge)
            thus "optimum (huffman ts)" using e⇣1 e⇣2 by simp
          qed
        next
          case InnerNode thus ?thesis using ts' ts hyps by simp
        qed
      next
        case InnerNode thus ?thesis using ts' ts hyps by simp
      qed
    qed
  qed
qed

text {*
\isakeyword{end}
\myskip

\noindent
So what have we achieved? Assuming that our definitions really mean what we
intend them to mean, we established that our functional implementation of
Huffman's algorithm, when invoked properly, constructs a binary tree that
represents an optimal prefix code for the specified alphabet and frequencies.
Using Isabelle's code generator \cite{haftmann-nipkow-2007}, we can convert the
Isabelle code into Standard ML, OCaml, or Haskell and use it in a real
application.

As a side note, the @{thm [source] optimum_huffman} theorem assumes that the
forest @{term ts} passed to @{const huffman} consists exclusively of leaf nodes.
It is tempting to relax this restriction, by requiring instead that the forest
@{term ts} has the lowest cost among forests of the same size. We would define
optimality of a forest as follows:
$$\begin{aligned}[t]
  @{prop "optimum⇣F ts"}\,\;@{text "≡"}\;\,
  (@{text "∀us."}\ & @{text "length ts = length us --> consistent⇣F us -->"} \\[-2.5pt]
  & @{text "alphabet⇣F ts = alphabet⇣F us --> freq⇣F ts = freq⇣F us -->"} \\[-2.5pt]
  & @{prop "cost⇣F ts ≤ cost⇣F us"})
\end{aligned}$$
with $@{text "cost⇣F [] = 0"}$ and
$@{prop "cost⇣F (t # ts) = cost t + cost⇣F ts"}$. However, the modified
proposition does not hold. A counterexample is the optimum forest
$$\includegraphics{forest-optimal.pdf}$$
for which the algorithm constructs the tree
$$\vcenter{\hbox{\includegraphics{tree-suboptimal.pdf}}}
  \qquad \hbox{of greater cost than} \qquad
  \vcenter{\hbox{\includegraphics{tree-optimal.pdf}}}$$
*}

section {* Related Work
           \label{related-work} *}

text {*
Laurent Th\'ery's Coq formalization of Huffman's algorithm \cite{thery-2003,%
thery-2004} is an obvious yardstick for our work.
It has a somewhat wider
scope, proving among others the isomorphism between prefix codes and full binary
trees. With 291 theorems, it is also much larger.

Th\'ery identified the following difficulties in formalizing the textbook
proof:

\begin{enumerate}
\item The leaf interchange process that brings the two minimal symbols together
      is tedious to formalize.
\item The sibling merging process requires introducing a new symbol for the
      merged node, which complicates the formalization.
\item The algorithm constructs the tree in a bottom-up fashion. While top-down
      procedures can usually be proved by structural induction, bottom-up
      procedures often require more sophisticated induction principles and
      larger invariants.
\item The informal proof relies on the notion of depth of a node. Defining this
      notion formally is problematic, because the depth can only be seen as a
      function if the tree is composed of distinct nodes.
\end{enumerate}

To circumvent these difficulties, Th\'ery introduced the ingenious concept of
cover. A forest @{term ts} is a {\em cover\/} of a tree~$t$ if $t$ can be built
from @{term ts} by adding inner nodes on top of the trees in @{term ts}. The
term ``cover'' is easier to understand if the binary trees are drawn with the
root at the bottom of the page, like natural trees. Huffman's algorithm is
a refinement of the cover concept. The main proof consists in showing that
the cost of @{term "huffman ts"} is less than or equal to that of any other
tree for which @{term ts} is a cover. It relies on a few auxiliary definitions,
notably an ``ordered cover'' concept that facilitates structural induction
and a four-argument depth predicate (confusingly called @{term height}).
Permutations also play a central role.

Incidentally, our experience suggests that the potential problems identified
by Th\'ery can be overcome more directly without too much work, leading to a
simpler proof:

\begin{enumerate}
\item Formalizing the leaf interchange did not prove overly tedious. Among our
      95~lemmas and theorems, 24 concern @{const swapLeaves},
      @{const swapSyms}, and @{const swapFourSyms}.
\item The generation of a new symbol for the resulting node when merging two
      sibling nodes in @{const mergeSibling} was trivially solved by reusing
      one of the two merged symbols.
\item The bottom-up nature of the tree construction process was addressed by
      using the length of the forest as the induction measure and by merging
      the two minimal symbols, as in Knuth's proof.
\item By restricting our attention to consistent trees, we were able to define
      the @{const depth} function simply and meaningfully.
\end{enumerate}
*}

section {* Conclusion
           \label{conclusion} *}

text {*
The goal of most formal proofs is to increase our confidence in a result. In
the case of Huffman's algorithm, however, the chances that a bug would have
gone unnoticed for the 56 years since its publication, under the scrutiny of
leading computer scientists, seem extremely low; and the existence of a Coq
proof should be sufficient to remove any remaining doubts.

The main contribution of this report has been to demonstrate that the textbook
proof of Huffman's algorithm can be elegantly formalized using
a state-of-the-art theorem prover such as Isabelle/HOL.
In the process, we
uncovered a few minor snags in the proof given in Cormen et
al.~\cite{cormen-et-al-2001}.

We also found that custom induction rules, in combination with suitable
simplification rules, greatly help the automatic proof tactics, sometimes
reducing 30-line proof scripts to one-liners. We successfully applied this
approach for handling both the ubiquitous ``datatype + well\-formed\-ness
predicate'' combination (@{typ "'a tree"} + @{const consistent}) and functions
defined by sequential pattern matching (@{const sibling} and
@{const mergeSibling}). Our experience suggests that such rules, which are
uncommon in formalizations, are highly valuable and versatile. Moreover,
Isabelle's \textit{induction\_schema} and \textit{lexicographic\_order} tactics
make these easy to prove.

Formalizing the proof of Huffman's algorithm also led to a deeper
understanding of this classic algorithm. Many of the lemmas, notably the leaf
split commutativity lemma of Section~\ref{leaf-split-commutativity}, have not
been found in the literature and express fundamental properties of the
algorithm. Other discoveries didn't find their way into the final proof. In
particular, each step of the algorithm appears to preserve the invariant that
the nodes in a forest are ordered by weight from left to right, bottom to top,
as in the example below:\strut
$$\vcenter{\hbox{\includegraphics[scale=1.25]{forest-zigzag.pdf}}}$$
It is not hard to prove formally that a tree exhibiting this property is
optimum. On the other hand, proving that the algorithm preserves this invariant
seems difficult---more difficult than formalizing the textbook proof---and
remains a suggestion for future work.

A few other directions for future work suggest themselves. First, we could
formalize some of our hypotheses, notably our restriction to full and
consistent binary trees. The current formalization says nothing about the
algorithm's application for data compression, so the next step could be to
extend the proof's scope to cover @{term encode}/@{term decode} functions
and show that full binary trees are isomorphic to prefix codes, as done in the
Coq development. Independently, we could generalize the development to $n$-ary
trees.
*}

(*<*)
end
(*>*)
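As an illustration of the @{term encode}/@{term decode} functions mentioned in the conclusion, here is a minimal executable sketch (Python, our own hypothetical naming, independent of both the Isabelle and the Coq developments). It uses the code tree for $C_1$ from the introduction and checks the 14-bit encoding of the example string:

```python
# Prefix-code encoding/decoding from a code tree (illustrative sketch;
# class and function names are ours). Code words are root-to-leaf paths:
# 0 = left child, 1 = right child. Because only leaves carry symbols,
# no code word is a prefix of another, so decoding is unambiguous.

class Leaf:
    def __init__(self, sym): self.sym = sym

class Inner:
    def __init__(self, l, r): self.l, self.r = l, r

def code_table(t, prefix=""):
    """Map each symbol to its code word."""
    if isinstance(t, Leaf):
        return {t.sym: prefix}
    return {**code_table(t.l, prefix + "0"), **code_table(t.r, prefix + "1")}

def encode(t, s):
    table = code_table(t)
    return "".join(table[c] for c in s)

def decode(t, bits):
    out, node = [], t
    for b in bits:
        node = node.l if b == "0" else node.r
        if isinstance(node, Leaf):       # reached a code word's end
            out.append(node.sym)
            node = t                     # restart at the root
    return "".join(out)

# The code C1 from the introduction: a -> 0, b -> 10, c -> 110, d -> 111.
c1 = Inner(Leaf('a'), Inner(Leaf('b'), Inner(Leaf('c'), Leaf('d'))))
assert encode(c1, "abacabad") == "01001100100111"   # the 14-bit code word
assert decode(c1, encode(c1, "abacabad")) == "abacabad"
```

The round-trip assertion is exactly the decode-inverts-encode property that a formalization covering data compression would state.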