Implementation notes

Language with only one symbol.

----------------------------------------------------------------------
0. Concept

I wanted to make a language with just one command.  Actually I wanted
to make a thing that contained lots of identical lines, where you
would delete a certain subset of lines to select different functions.
But I realized that programming with long lists of to-be-deleted line
numbers is equivalent to programming with mixed blank/command lines,
just with extra steps.  So I settled on an encoder that translates
input to single-command programs.

----------------------------------------------------------------------
1. Language design

It's possible to design a single command language around unary
encoding, but that would be horribly inefficient.  But if we make use
of the whitespaces around that one command, we can get at least binary
encoding.  C is mostly not a whitespace sensitive language, but it
does provide __LINE__, so I have designed the whole language around
using __LINE__ to encode blank lines.

The "design notes" section of the packaged remarks.md described most
of the tradeoffs in detail, here I will just comment on the things
that I did not do:

- Huffman encoding: instead of the deflate-based compression,
  initially I was thinking more of a per-byte compression scheme, and
  Huffman encoding was an obvious choice.  This was quickly abandoned
  because it's not possible to get better than an 8x compression ratio
  (need at least 1 bit per byte), and considering all the overhead
  needed to implement the single-command language decoder, output size
  will likely be very large, so this encoding scheme was not
  attractive.

  It's also unattractive for the fact that Huffman encoding requires
  preprocessing the data to generate an optimal dictionary, or come
  with a fixed dictionary baked in.  So the result is either not
  optimal or not streamable.

- Variable length encoding: the reason why Huffman encoding seemed
  attractive was that it's possible to spend fewer bits for certain
  bytes.  I thought maybe I could still shave some bits off of each
  byte if I embraced variable length encoding, considering that most
  ASCII text can fit in 6 bits.

  I ended up not using variable length instructions because I thought
  having branches in the language was more interesting, and it's
  really troublesome to figure out the branch target if the
  instructions are of variable length.  By sticking to fixed-length
  instructions that always end with a 1 bit, I can guarantee that
  there would be a non-empty line just before the start of the
  instruction I wanted to branch to, and tweak some state machines to
  make sure the instructions flow correctly after the branch.

----------------------------------------------------------------------
2. Language implementation

The decoder comes in two parts: the trivial part is outputting a
single byte, which basically adds bits by inferring current bit
position from __LINE__ and flush all the bits on the 9th line.  There
is a minor optimization in XORing all bytes with 0x20 first, otherwise
outputting bytes is completely straightforward.

The more convoluted part involves implementing loops.  True loops like
for/while/do are not possible because all these constructs require a
pair of matching things, and a single-command language can only
guarantee one of the pair.  Instead, I used setjmp wrapped inside a
switch statement, and generated unique case labels using __LINE__.
This allows the code to jump to a wide range of targets via longjmp.

Both the trivial and convoluted parts are implemented in a single long
statement, with a generous use of comma and ternary operators.  The
one command is defined to call a function with that one statement.
Initially I didn't have that separate function at all, and the one
command expands to the long expression directly.  Compile time was
terrible but that was kind of OK, what was not OK was that the output
executable size grew significantly.  So now the one command expands to
a case label and function call.

----------------------------------------------------------------------
3. Language encoder

Converting a byte stream into this single-command language involves
maintaining a ring buffer of output bytes, and generate output-literal
or repeat-previous-substring instructions as appropriate.  A few
conditions must be met for outputting repeat instructions:

- The repeated substring is of at least length 3.
- The repeated substring is within addressable distance.
- Repeat instruction must not cause a stack overflow at run time.

A production LZW implementation will likely maintain some dictionary
to speed up matching, but since sliding window is relatively small, I
just implemented it with a couple of nested loops.  The implementation
can process input at a rate of about 640K/s, which is not very fast at
all, but I don't anticipate people using this tool to encode more than
a few megabytes of data.

----------------------------------------------------------------------
4. Polish and detail

Encoder has been subject to a comprehensive set of tests inputs large
and small.  Also, both the encoder and generated output has been
verified to be acceptable to GCC and Clang as both C and C++.

Once I felt that I have sufficient test coverage, I did the usual
polishing work of converting the code to ASCII art.  This time in two
flavors:

- Fern, as originally planned.  When I think about a single-command
  scenario, the first thing that came to mind was Fern casting
  zoltraak repeatedly, so I decided on "zoltraak" as the output token
  and Fern as the layout before I wrote any code.

- Can of spam.  Because repeating the same string over and over was
  essentially spam for anyone who has seen that Monty Python sketch.
  So I added an alternative version that outputs "spam" instead of
  "zoltraak".

This time around, I also improved on the tool to insert arbitrary CRC32
into the code.  It used to be something that takes a few minutes of
brute force, but now I can do it in just a few seconds.

----------------------------------------------------------------------
5. Finally...

I was disappointed that Marcille didn't make it to the list of
winners.  I don't know if the judges didn't have Sixel-enabled
terminals or if they find organically shaped dungeons boring.
Anyways, that disappointment drove me to complete this entry, which I
submitted on the first day when the contest was open.  It was about a
month and a half worth of work.

Fern ends up winning "Most magical word" award :)