Moseycode URL Form

The encoding described in this document is in the public domain.

This document describes a scheme for encoding binary data into the unreserved character set of the URI Generic Syntax.

It is a very simple encoding that operates similarly to Base64 encoding. The differences are:


Each of the 64 ASCII characters in the list below is put into correspondence with its index in the list (A is zero).


A stream of octets is encoded by arranging the bits of its octets in big-endian order and partitioning them into 6-bit words (henceforth simply referred to as words) such that the most significant bits of the first octet occupy the first word. If the number of bits in the octet stream is not evenly divisible by 6, the bit stream is padded with the fewest zero bits necessary to make it so. Each resulting word is then transcribed into its corresponding ASCII character.
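The encoding step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The name `encode` is hypothetical, and the alphabet (A-Z, a-z, 0-9, '-', '_') is an assumption that is consistent with the example encodings in this document and with the stated choice of '-' and '_' as the non-alphanumeric characters.

```python
# ASSUMPTION: the 64-character list is A-Z, a-z, 0-9, '-', '_' ('A' is zero).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def encode(data: bytes) -> str:
    """Encode octets as 6-bit words, most significant bits first."""
    bit_count = len(data) * 8
    # Treat the whole octet stream as one big-endian integer.
    value = int.from_bytes(data, "big")
    # Pad with the fewest zero bits that make the length a multiple of 6.
    pad = -bit_count % 6
    value <<= pad
    word_count = (bit_count + pad) // 6
    # Emit one character per word, from the first word to the last.
    return "".join(
        ALPHABET[(value >> (6 * (word_count - 1 - i))) & 0x3F]
        for i in range(word_count)
    )
```

Under the assumed alphabet, `encode(bytes([255]))` gives `_w` and `encode(bytes([19, 232]))` gives `E-g`.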

An ASCII character sequence produced by this process is decoded as follows. Each character in the sequence is transcoded into its corresponding word value. If the number of characters equals 2 modulo 4, the last four bits are trimmed; if it equals 3 modulo 4, the last two bits are trimmed. In this scheme the number of characters in an encoding can never equal 1 modulo 4. The remaining bits are then partitioned into bytes to recover the original octet stream.
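The decoding rules can be sketched the same way. Again this is only an illustration: `decode` is a hypothetical name, and the alphabet is the same assumed list (A-Z, a-z, 0-9, '-', '_'). The number of bits to trim follows from the character count, because 6 times the count, taken modulo 8, is 4 for a count of 2 modulo 4 and 2 for a count of 3 modulo 4.

```python
# ASSUMPTION: the 64-character list is A-Z, a-z, 0-9, '-', '_' ('A' is zero).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def decode(text: str) -> bytes:
    """Recover the octet stream from a sequence of encoded characters."""
    if len(text) % 4 == 1:
        # No octet stream can produce this many characters.
        raise ValueError("length cannot equal 1 modulo 4")
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = (value << 6) | ALPHABET.index(ch)
    bit_count = len(text) * 6
    # Padding bits to trim: 4 when the length is 2 mod 4, 2 when 3 mod 4.
    trim = bit_count % 8
    value >>= trim
    return value.to_bytes((bit_count - trim) // 8, "big")
```

For instance, `decode("E-glTQ")` yields the four bytes 19, 232, 37, 77 under the assumed alphabet.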


Byte Sequence    Encoding
0:               AA
255:             _w
19, 232:         E-g
19, 232, 37:     E-gl
19, 232, 37, 77: E-glTQ

Intended Use

This encoding is used to convert the data from a Moseycode barcode into a form that can be embedded in a URL. The URL can then be carried by other barcode systems whose supporting software effectively constrains their data semantics to directing a user to a URL.

An example is:


It was important to have a compact encoding because the URLs described above are ultimately destined for representation in existing barcode symbologies. The maximum length of URL that can be recorded in a QR Code with dimensions 25x25 is 32 characters*. Given that the scheme and host together require 16 characters, 16 characters remain, so the data to be stored must be accommodated in the path of the URL. The bytes to be stored can be reduced from 12 (96 bits) to 10 (80 bits) by removing the checksum and relying on error correction in the barcode representation.
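The character budget above can be checked with a few lines of arithmetic. The sketch below is illustrative only; `encoded_length`, `URL_LIMIT` and `PREFIX_LENGTH` are hypothetical names, and the figures 32 and 16 are the ones quoted in the paragraph above.

```python
def encoded_length(byte_count: int) -> int:
    """Characters needed for byte_count octets: ceil(8 * n / 6)."""
    return (byte_count * 8 + 5) // 6


URL_LIMIT = 32      # characters available in a 25x25 QR Code (from the text)
PREFIX_LENGTH = 16  # scheme and host together (from the text)
remaining = URL_LIMIT - PREFIX_LENGTH

for byte_count in (12, 10):
    needed = encoded_length(byte_count)
    print(byte_count, needed, remaining - needed)
# prints:
# 12 16 0
# 10 14 2
```

Twelve bytes would consume all 16 remaining characters, while ten bytes need 14 and leave two to spare.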

Straightforward Base64 encoding of the remaining data introduces asymmetry ('+' needs to be decoded as a space) and reusability constraints (Base64 output can include the path separator '/'). Because input data is not padded to a multiple of 3 bytes, this encoding is potentially less time-efficient than Base64, but over such tiny amounts of data the speed difference is irrelevant and space efficiency matters far more.
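The contrast with standard Base64 can be seen using Python's standard `base64` module. This comparison is an illustration rather than part of the scheme; it assumes the Moseycode alphabet ends in '-' and '_', in which case the output appears to coincide with URL-safe Base64 once the '=' padding is stripped.

```python
import base64

data = bytes([19, 232])

# Standard Base64 uses '+' and '/' and pads the output with '='.
standard = base64.b64encode(data).decode("ascii")
# URL-safe Base64 swaps in '-' and '_' but still pads with '='.
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
# Dropping the padding matches the example encoding in this document.
unpadded = url_safe.rstrip("=")

print(standard)  # E+g=
print(url_safe)  # E-g=
print(unpadded)  # E-g
```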

The choice of '-' and '_' as the non-alphanumeric characters was made on the basis that these characters commonly occur in URL paths. This choice also leaves the character '.' free for future use, for example in suffixes.

* Datamatrix accommodates a 32-character URL in a smaller matrix (22x22) than QR Code does (25x25). The next smallest Datamatrix barcodes (20x20) can only store 25-character URLs; these are too small to store the data in a Moseycode barcode. This makes optimization of the encoding for 32 characters the sensible criterion.