Huffman file format First 4 bytes contain file size of uncompressed file (little endian). Bits following describes the huffman tree: 0 -> this is an internal node, next bit describes the left branch 1 -> this is a leaf node, next 8 bits describe the byte value Size of tree is at most 9*256 + 255 bits (256 leaf nodes, 255 internal nodes). Following bits are the huffman data, 0 for following left branch, 1 for following the right branch. 0 bits are padded at the end to align to byte boundary. In the case where there is only one leaf node, the huffman data will be missing, since the file can be constructed using the tree data and file size alone. Encoding the huffman tree 1. Build huffman tree using conventional methods 2. Dump tree using preorder traversal Decoding the huffman tree create root node pointer, push onto stack L1: read next bit from file if 0 read: create node, set pointer on stack to point at node, push onto stack if 1 read: create leaf node using next 8 bits set pointer on stack to point at node L2: if stack contains only this leaf, process is complete. pop stack if right pointer on stack is empty set pointer to right pointer on stack else pop stack, goto L2 goto L1