bindat package of Emacs. It is basically a way of writing specifications for translating binary data to alists (association lists) and back again. Once you have a specification written (and it is pretty easy once you get the hang of it), converting data back and forth is incredibly easy. I did run into a slight hitch with a couple of things, though:
- Variable-length, null-terminated strings and
- Union structures
What is Bindat?

Let's say you have some binary data that you want Emacs to deal with. Rather than custom-writing a binary parsing function (because really, who wants to do that in 2010?), you can write a bindat specification and then use the bindat-pack and bindat-unpack functions to convert alists to and from binary, respectively. Let's say the binary format consists of:
- A byte indicating the version
- A byte for the length of a string
- Up to 255 bytes of a string, without a null byte at the end
    (setq spec '((version byte)
                 (length byte)
                 (string str (length))))

Pretty self-explanatory, with the complication that the (length) gets filled with the integer value of the length field after it has been read. To convert data to binary, you would create an alist like this:

    (setq data '((version . 5)
                 (length . 11)
                 (string . "example.com")))

And stuff them both into bindat-pack:

    (bindat-pack spec data)

Which will give you a 13-byte string: "\x05", then "\x0B", then the 11 bytes of "example.com". Here "\x05" is a byte with value 5 and "\x0B" is the byte with value 11. As you can imagine, the function bindat-unpack does the reverse. Pretty cool! And pretty readable, too! If you aren't doing anything much more complex than this, there really isn't much else to learn (aside from the "u8", "u16", and "u32" types, which do what you would expect). The Elisp manual on this is pretty good, so check it out for more depth.
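To make the round trip concrete, here is a minimal sketch that packs the example alist and unpacks it again. The my-demo-* names are made up for this illustration; the spec and data are the same ones as in the text, repeated so the snippet stands alone.

```elisp
(require 'bindat)

;; Same spec and data as in the text (hypothetical my-demo-* names).
(defvar my-demo-spec
  '((version byte)
    (length  byte)
    (string  str (length))))

(defvar my-demo-data
  '((version . 5)
    (length  . 11)
    (string  . "example.com")))

;; Pack the alist into a unibyte string, then unpack that string again.
(defvar my-demo-packed (bindat-pack my-demo-spec my-demo-data))
(defvar my-demo-unpacked (bindat-unpack my-demo-spec my-demo-packed))

;; Look fields up by name rather than relying on the order of the alist.
(bindat-get-field my-demo-unpacked 'version)
(bindat-get-field my-demo-unpacked 'string)

;; Two header bytes plus the 11-byte string.
(length my-demo-packed)
```

I use bindat-get-field to read the result because I would not want to depend on the order in which bindat-unpack returns the fields.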
The hard part

So, as the title of the post suggests, I was using bindat to implement the SOCKS4 protocol in Emacs (yes I know it's already been done; I was curious!). One thing that SOCKS4 (and SOCKS4a) does is include a "user ID" in the request, so that the proxy server can perform (very very) basic access control, or something. The problem is that all of the datatypes in bindat expect an explicit length, but the only length indication SOCKS4 allows is the trailing null byte. There may well be a better solution to this (and if not, I should contribute one :), but I came up with a (very hackish) solution. It takes advantage of the fact that the "length" field can be an Elisp form, which is evaluated during packing and unpacking. So here is a specification for the SOCKS4a protocol in all its ugliness:
    ;; `search' comes from the cl package (`cl-search' in cl-lib on newer Emacs).
    (setq socks4a-spec
          '((version byte)
            (cmd-stat byte)
            (port u16)
            (addr ip)
            (id strz (eval (cond
                            ;; Packing: the field already has a value, so use
                            ;; its length plus one for the null byte.
                            ((bindat-get-field struct 'id)
                             (1+ (length (bindat-get-field struct 'id))))
                            ;; Unpacking: measure up to the next null byte.
                            (t
                             (1+ (- (search "\0" bindat-raw :start2 bindat-idx)
                                    bindat-idx))))))
            (domain strz (eval (cond
                                ((bindat-get-field struct 'domain)
                                 (1+ (length (bindat-get-field struct 'domain))))
                                ;; Falls back to 100 if no null byte is found.
                                (t
                                 (1+ (- (or (search "\0" bindat-raw
                                                    :start2 (1+ bindat-idx))
                                            100)
                                        bindat-idx))))))))

Bleah! That's not nearly as pretty and simple as the earlier one! The trick is that for each of the string fields (id and domain, both of which are null-terminated), we search ahead for a null byte and use the distance between the start of that field, bindat-idx, and the null byte. We add one to the length to make room for the null byte itself. This is only for unpacking (converting from a byte array to Elisp structures), though. When converting to a byte array, we can use the bindat-get-field function to get the value of the field we are encoding, and then take its length (again, adding one for the null byte).
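The unpacking half of that trick is easier to see outside of a spec. Here is a rough sketch of just the length calculation as a standalone function. The name my-strz-field-length and the sample bytes are invented for this illustration, and unlike the spec above it signals an error when no null byte is found.

```elisp
(require 'cl-lib)

(defun my-strz-field-length (raw idx)
  "Return the number of bytes from IDX in RAW through the next null byte.
The count includes the null byte itself.  Signal an error if RAW has
no null byte at or after IDX."
  (let ((end (cl-position 0 raw :start idx)))
    (unless end
      (error "No null terminator at or after index %d" idx))
    (1+ (- end idx))))

;; Hypothetical SOCKS4-style request: 8 fixed header bytes, then the
;; user ID "bob" followed by its terminating null byte.
(defvar my-demo-socks4-bytes
  (concat (unibyte-string 4 1 0 80 0 0 0 1) "bob" (unibyte-string 0)))

;; The user ID starts right after the 8-byte header.
(my-strz-field-length my-demo-socks4-bytes 8)
```

Inside a spec, bindat-raw and bindat-idx play the roles of RAW and IDX here.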
Oh, the agony!

So yeah, that's what I spent my night doing. As a bonus, here are the specs for SOCKS5 (which was much easier, as it uses the vastly-superior Pascal convention for strings).
    (setq socks5-greeting-spec
          '((version byte)
            (auth-count u8)
            (auth-list repeat (auth-count)
                       (method byte))))

    (setq socks5-ehlo-spec
          '((version byte)
            (auth byte)))

    (setq pstring-spec
          '((len u8)
            (str str (len))))

    (setq socks5-conn-spec
          '((version byte)
            (cmd byte)
            (res byte)
            (addr-type byte)
            (union (addr-type)
                   (1 (addr ip))
                   (3 (struct pstring-spec))
                   (4 (addr6 vec 16)))
            (port u16)))

The one thing that took a bit of figuring out was the union specification, as it isn't quite the same form as the others. Figuring exactly what got nested and by how much was a
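To show how the union nests in practice, here is a small sketch that unpacks a connection request carrying a domain name. The sample bytes and the my-demo-* names are made up for this illustration. As far as I can tell, because neither the union nor the struct entry has a field name, their fields end up directly in the top-level alist, but treat that as an assumption and check it against your Emacs version.

```elisp
(require 'bindat)

;; The two specs from the text, repeated so the snippet stands alone.
(defvar pstring-spec
  '((len u8)
    (str str (len))))

(defvar socks5-conn-spec
  '((version byte)
    (cmd byte)
    (res byte)
    (addr-type byte)
    (union (addr-type)
           (1 (addr ip))
           (3 (struct pstring-spec))
           (4 (addr6 vec 16)))
    (port u16)))

;; Hypothetical sample request: connect to example.com on port 80.
(defvar my-demo-request
  (concat (unibyte-string 5 1 0 3 11)  ; version, cmd, res, addr-type, len
          "example.com"
          (unibyte-string 0 80)))      ; port 80, most significant byte first

(defvar my-demo-conn (bindat-unpack socks5-conn-spec my-demo-request))

;; addr-type 3 selects the pstring case, so `str' holds the domain.
(bindat-get-field my-demo-conn 'addr-type)
(bindat-get-field my-demo-conn 'str)
(bindat-get-field my-demo-conn 'port)
```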
What now?

Having taken a brief look at the socks.el package that ships with Emacs, it looks like it could be simplified by using the bindat package to handle the network protocol stuff. I'll have to see about adding that in at some point. Also, making it easier to use variable-length null-terminated strings would make the whole experience much more pleasant, so I'll see about getting that included as well. In the meantime, happy hacking in 2010!