Project 3: A Magical Code

You run the spy agency for a country at war. You need to send messages behind enemy lines, and have several secret agents who are willing to take the risk to deliver those messages. Those agents could be stopped and searched at any time, making anything that looks like a code suspicious. For that reason, and to avoid being tracked, agents do not carry electronic devices.

The agency has come up with a cool idea: encode a message in a deck of 52 regular playing cards. To avoid suspicion, these decks would be carried by agents trained as magicians. Agents carry many decks, only one of which contains the message. The message is presumably encoded in the sequence of cards.

In the event that an agent is searched and asked about the cards, agents are trained to respond as follows: ``Officer, I'm a magician and these decks of cards are the tools of my trade. They cannot contain secret information. Look, I'll take a card from the top of the deck and insert it randomly back into the deck. You can even choose where to insert it. I'll do that several more times for you on each of the decks. Surely that convinces you that any hypothetical message in the cards is now scrambled, and that therefore these cards are innocuous. By the way, I think there's a rabbit in your hat...'' It is possible that an agent is stopped more than once on their journey, in which case the procedure is applied each time in the same way.

What is not entirely obvious here is that even a fair number of top-to-random insertions don't really scramble the sequence enough to prevent information transfer. An analysis of top-to-random shuffling shows that even after many card insertions, the entropy remaining in the deck is high. You can use that entropy to encode your message.

Your job is to design a scheme for encoding short messages into card sequences. A message is a sequence of English words or character strings that may not be words (e.g., names, numbers) separated by spaces and other regular punctuation. We'll discuss in class a subset of compact messages that avoids the use of stop words, capitalization and punctuation to reduce the number of bits needed to get the message across. You will need to write functions encode() and decode() to convert between messages and card sequences and back. We'll let s() be a randomized function that moves the top card in the deck to a random position. The encode() and decode() functions are decided before the agent goes into the field, so they cannot depend on the actual messages or the true number of top-to-random insertions. The true number of top-to-random insertions encountered during transit is unknown to both the encoder and decoder of the message.

The basic requirements are:

The more advanced goals are:

We'll provide a simulator for you to interact with. You supply the simulator with the encode() and decode() functions, and the simulator will test them using messages from a user-specified file. The randomness induced by the top-to-random insertions will be determined using a random number seed, so that runs are repeatable, and so that different groups can be run with the same deck permutations. Details of the simulator will be provided later. At the end of the project we'll run tournaments to test the capabilities of your schemes under various conditions.

We'll also talk in class about how to measure the success of your schemes. For example, given a list of messages, the simulator can run a large number of trials for each message with various values of n, to see how often your program succeeds for each n. The simulator can also interleave these requests with random deck permutations to see how often your decode function correctly identifies the deck as not containing a message. We might also bound n by some value, and the simulator can try longer and longer messages to see how far your scheme can go. What is a good way to score a partially decoded message? Another dimension is the domain of messages. Simple messages are just sequences of one or more English words, e.g., ``activate device''. More complex messages might include dates, times, latitudes, longitudes, names, numbers, etc. Knowing the message domain might allow you choose a suitable encoding.

Some things to think about:

Ken Ross 2022-09-12