Implementation notes Break that key. ---------------------------------------------------------------------- 0. Concept This all started with episode 12 of the first season of Nisekoi anime. I decided that I must write a program about breaking keys. ---------------------------------------------------------------------- 1. Vigenere cipher "Keys" in computer science usually refers to keys used in ciphers. Breaking the key means breaking the cipher, which is generally not easy. The easiest way to do it is to find a cipher that is already broken. One of the classical ciphers would be a good place to start. Vigenere cipher was probably the simplest classical cipher that can be implemented, but complicated enough to have a key. As an added bonus, implementing the Vigenere means I also get a rot13 filter for free (I know, I probably wrote rot13 like five times already, but can't hurt to have another one). Vigenere can be broken by analyzing character frequencies: if you know the key length and language of the original text, you can determine what each of the key characters one by one. For example, given a 2 character key, the ciphertext will have strides interleaved like this: ABABABAB... All characters in the 'A' stride will have one frequency distribution. The correct shift offset (i.e. key character) will minimize errors between the frequency distribution of 'A' with frequency distribution for the original language, and this shift offset can be determined independent of whatever is in 'B'. I implemented this by testing each of the 26 shift offsets and picking one that minimizes squared error. This was relatively easy. The above presumes that key length is known, but how do we determine key length? Various online literature suggests looking for repeated susbtrings in ciphertext and using common divisors of those offsets to determine key length. Instead of doing that, I just tried all key lengths from 1 to 64, which takes less than a second for average ciphertexts that are few kilobytes long, and that worked well enough. Note that all of these are not guaranteed to work. Vigenere Cipher with an extremely long key is as effective as One Time Pad, which is unbreakable, so I just don't worry about it. I got my key breaking thing, which was what I set out to achieve. ---------------------------------------------------------------------- 2. Encoder I started by writing a encryption program ("the encoder") in Perl, because such tasks are trivial to do in Perl. I needed this encoder to produce the breakable ciphertexts that the cracker (i.e. the program that breaks keys) would consume. On selecting encryption/decryption modes: I thought about making this an extra command line argument, but in the end I decided that it would work best by using case differences to specify which direction characters should shift, eliminating the need to use an extra command line argument. Lowercase keys causes characters to shift in the positive direction, which seemed more natural to do by hand, so I declared that the convention shall be to use uppercase keys for "encryption", thus lowercase keys will be used for "decryption". ---------------------------------------------------------------------- 3. Cracker Initially I was worried that the key cracker will be very slow, I was all set to do multithreaded C++ and all, but thought I should try tweaking the different crack strategies in Perl first. Turns out it runs in reasonable time, so I didn't bother with C++. I thought about the options for implementing this cracker, including having it support compute expected character frequencies from some user supplied input file. In the end I just hardcoded a frequency table for English, which I assume will be the common use case most of the time. It even works sometimes when the input text is not English (seemed to work for French). To get the sample frequencies, I had to find one text that would be most representative of proper English, and I chose "Alice in Wonderland". It seemed to work well. ---------------------------------------------------------------------- 4. Encoder + cracker + ...? Now I have two Perl programs on my hands, one of which is a text encoder. Obviously I must combine these two into one program, and so I did: - The encoder implements Vigenere cipher encryption/decryption. - Running encoder with the right key and using encoder source as input produces the cracker source. Key for the cracker is "chitoge", since Kirisaki Chitoge was the one who broke the key. The encoder is then formatted using Onodera Kosaki as the template. This pairing seemed entirely logical to me. Having gotten two programs, and considering that there were actually four key holders in Nisekoi, I thought: can I fit in another one? So I added Tachibana Marika as the third program, accessible using the key "marika". This third program strips non-alpha characters from input. Then I thought... how about a fourth program? None of the ideas immediately came to mind, so I justified to myself that anymore programs would be feature creep and I didn't need any more. Good enough. The program selection works like this: $a = "program1"; $b = "program2, encrypted with one key"; $c = "program3, encrypted with another key"; 'a' =~ /a(?{eval $a})|b(?{eval $b})|c(?{eval $c})/; Adjusting the a+b+c bits above such that the right characters match up for each of the keys. The choice of characters was done with a straightforward brute force search. ---------------------------------------------------------------------- 5. Testing Kosaki has been tested verified to work on these environments: - Perl 5.14.4 / Cygwin - Perl 5.18.2 / Linux - Perl 5.20.2 / Linux Different versions of Perl have subtly different requirements on what can be done inside "eval". For example, the following worked in 5.14.4 but not 5.18.2: '' =~ ('(?{'.('%+!,_+'^'@]@@{@').'})'); This is a real shame, because if I was able to write program without using alphanumeric characters, the problem of multiplexing programs would be slightly simpler. Another side note, all versions of Perl I tested will not execute this: $a = q{ map {} foreach 1..2; print "Unreachable\n"; }; 'a' =~ /a(?{eval $a})/; But they are all fine with this: $a = q{ foreach (1..2) {} print "OK\n"; }; 'a' =~ /a(?{eval $a})/; This appears to be due to how $_ is modified in one but not another. There is no error message when things die inside eval, so I can't be sure. This project revealed more subtleties about Perl than I expected to find, but in the end I got things working on all platforms I had access to. ---------------------------------------------------------------------- 6. Finally... Will Nisekoi have a Kosaki ending or a Chitoge ending? Who knows?