Saturday, January 2, 2010

Wearing SOCKS with Emacs

So apparently I have way too much time on my hands, because I went ahead and taught myself Emacs's amazingly cool bindat package. It is basically a way of writing specifications for translating binary data to association lists (alists) and back again. Once you have a specification written (and it is pretty easy once you get the hang of it), converting data back and forth is incredibly easy. I did run into a slight hitch with a couple of things, though:
  1. Variable-length, null-terminated strings and
  2. Union structures
In an effort to make life easier for future Emacsians, I'll give the specs here so you don't have to figure them out yourself. But first, let me give a brief introduction to Bindat.

What is Bindat?

Let's say you have some binary data that you want Emacs to deal with. Rather than custom-writing a binary parsing function (because really, who wants to do that in 2010?), you can write a bindat specification and then use the bindat-pack and bindat-unpack functions to convert alists to and from binary, respectively. Suppose the binary format consists of:
  1. A byte indicating the version
  2. A byte for the length of a string
  3. Up to 255 bytes of a string, without a null byte at the end
Our spec would look like this:
(setq spec '((version byte)
             (length byte)
             (string str (length))))
Pretty self-explanatory. The one subtlety is that (length) in the string field is a reference to an earlier field: bindat replaces it with the integer value of the length field, which has already been read by that point. To convert data to binary, you would create an alist like this:
(setq data '((version . 5)
             (length . 11)
             (string . "example.com")))
And stuff them both into bindat-pack as follows:
(bindat-pack spec data)
Which will give you a 13-byte unibyte string: a byte with value 5, a byte with value 11, and then the characters of "example.com" (in Lisp string syntax, "\5\13example.com"). As you can imagine, the function bindat-unpack does the reverse. Pretty cool! And pretty readable, too! If you aren't doing anything much more complex than this, there really isn't much else to learn (aside from the "u8", "u16", and "u32" types, which are unsigned integers of 1, 2, and 4 bytes in network byte order; "byte" is just another name for "u8"). The Elisp manual on this is pretty good, so check it out for more depth.
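To see both directions at once, here is a small round trip through the spec above. It is a sketch: the variable names (my-demo-spec and friends) are made up for illustration, and it assumes the bindat library that ships with Emacs.

```elisp
(require 'bindat)

;; Hypothetical names; the spec and data are the ones from the text above.
(defvar my-demo-spec
  '((version byte)
    (length  byte)
    (string  str (length))))   ; (length) refers back to the length field

;; Alist -> unibyte string of raw bytes.
(defvar my-demo-packed
  (bindat-pack my-demo-spec
               '((version . 5)
                 (length . 11)
                 (string . "example.com"))))

;; Raw bytes -> alist.
(defvar my-demo-unpacked
  (bindat-unpack my-demo-spec my-demo-packed))

(length my-demo-packed)                      ; => 13
(aref my-demo-packed 1)                      ; => 11
(bindat-get-field my-demo-unpacked 'string)  ; => "example.com"
```

bindat-get-field is the convenient way to pull individual values back out of the unpacked alist, since the order of its entries doesn't matter.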

The hard part

So, as the title of the post suggests, I was using bindat to implement the SOCKS4 protocol in Emacs (yes, I know it's already been done; I was curious!). One thing that SOCKS4 (and SOCKS4a) does is include a "user ID" in the request, so that the proxy server can perform (very, very) basic access control, or something. SOCKS4a also appends the destination host name, in the same format. The problem is that all of the datatypes in bindat expect an explicit length, but the only length indication SOCKS4 provides is the trailing null byte. There may well be a better solution to this (and if not, I should contribute one :), but I came up with a (very hackish) solution. It takes advantage of the fact that a field's length can be given as (eval FORM), where FORM is evaluated during both packing and unpacking. So here is a specification for the SOCKS4a protocol in all its ugliness:
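(require 'bindat)
(require 'cl-lib)   ; for cl-search

(setq socks4a-spec
      '((version byte)
        (cmd-stat byte)
        (port u16)
        (addr ip)
        ;; User ID: a null-terminated string of unknown length.
        (id strz (eval (cond
                        ;; Packing: the value is in STRUCT; add 1 for the null byte.
                        ((bindat-get-field struct 'id)
                         (1+ (length (bindat-get-field struct 'id))))
                        ;; Unpacking: measure up to the next null byte in the raw data.
                        (t
                         (1+ (- (cl-search "\0" bindat-raw :start2 bindat-idx)
                                bindat-idx))))))
        ;; SOCKS4a destination host name, also null-terminated.
        (domain strz (eval (cond
                            ((bindat-get-field struct 'domain)
                             (1+ (length (bindat-get-field struct 'domain))))
                            (t
                             ;; Falls back to offset 100 if no null byte is found.
                             (1+ (- (or (cl-search "\0" bindat-raw
                                                   :start2 (1+ bindat-idx))
                                        100)
                                    bindat-idx))))))))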
Bleah! That's not nearly as pretty and simple as the earlier one! The trick is that an eval form can see a few variables that bindat binds while it works: struct (the alist being packed, or the fields unpacked so far), bindat-raw (the raw byte string), and bindat-idx (the current offset into it). When unpacking (converting from a byte array to Elisp structures), the id and domain fields haven't been read yet, so bindat-get-field returns nil and the cond falls through to the second branch. That branch searches ahead for a null byte and uses the distance between the start of the field, bindat-idx, and that null byte, plus one to make room for the null byte itself. When packing to a byte array, the first branch applies instead: bindat-get-field fetches the value of the field we are encoding from struct, and we take its length (again adding one for the null byte).
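The unpacking branch boils down to "find the next null byte at or after bindat-idx and count up to and including it". Here is that calculation on its own, outside of bindat, so it can be checked by hand. It is a sketch: the helper my-strz-field-length and the sample request are hypothetical, and it uses cl-position to find the null byte rather than searching for the string "\0" as the spec does.

```elisp
(require 'cl-lib)

(defun my-strz-field-length (raw idx)
  "Return the size of the null-terminated field starting at IDX in RAW.
The count includes the terminating null byte, as a strz length must."
  (let ((nul (cl-position 0 raw :start idx)))
    (unless nul
      (error "No null terminator after offset %d" idx))
    (1+ (- nul idx))))

;; Hypothetical SOCKS4a request: version 4, command 1 (CONNECT), port 80,
;; address 0.0.0.1 (the SOCKS4a "host name follows" marker),
;; user ID "bob", then the host name "example.com".
(defvar my-socks4a-request
  (concat (unibyte-string 4 1 0 80 0 0 0 1)
          "bob\0"
          "example.com\0"))

(my-strz-field-length my-socks4a-request 8)   ; => 4  ("bob" plus the null byte)
(my-strz-field-length my-socks4a-request 12)  ; => 12 ("example.com" plus the null byte)
```

The fixed part of the request is 8 bytes, so the user ID starts at offset 8, and the host name starts right after the user ID's null byte.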

Oh, the agony!

So yeah, that's what I spent my night doing. As a bonus, here are the specs for SOCKS5 (which was much easier, as it uses the vastly superior Pascal convention for strings).
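;; Client greeting: protocol version, then a count-prefixed list of auth methods.
(setq socks5-greeting-spec
      '((version byte)
        (auth-count u8)
        (auth-list repeat (auth-count)
                   (method byte))))

;; Server reply to the greeting: version and the chosen auth method.
(setq socks5-ehlo-spec
      '((version byte)
        (auth byte)))

;; Length-prefixed ("Pascal") string, used for domain names.
(setq pstring-spec
      '((len u8)
        (str str (len))))

;; Connection request: the layout of the address depends on addr-type.
(setq socks5-conn-spec
      '((version byte)
        (cmd byte)
        (res byte)                          ; reserved
        (addr-type byte)
        (union (addr-type)
               (1 (addr ip))                ; IPv4 address
               (3 (struct pstring-spec))    ; domain name
               (4 (addr6 vec 16)))          ; IPv6 address
        (port u16)))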
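The one thing that took a bit of figuring out was the union specification, as it isn't quite the same form as the others. The expression after union (here (addr-type)) supplies the tag, and each case is a list whose head is a tag value, compared with equal, and whose remaining elements are ordinary field specs. Figuring out exactly what got nested, and how deeply, was not at all fun, though it is nice to know.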
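To check what actually gets nested, here is a cut-down version of the connection spec unpacked from hand-built bytes. This is a sketch with hypothetical variable names, assuming the classic list-style bindat specs used throughout this post. Because neither the union nor the struct has a field name, the string's len and str fields end up at the top level of the result, right next to port.

```elisp
(require 'bindat)

;; Hypothetical names; same shape as pstring-spec and socks5-conn-spec.
(defvar my-pstring-spec
  '((len u8)
    (str str (len))))

(defvar my-addr-spec
  '((addr-type byte)
    (union (addr-type)
           (1 (addr ip))                   ; IPv4: 4 raw bytes
           (3 (struct my-pstring-spec)))   ; domain: length byte + name
    (port u16)))

;; Tag 3, a length byte of 11, the name, then port 80 in network byte order.
(defvar my-addr-unpacked
  (bindat-unpack my-addr-spec
                 (concat (unibyte-string 3 11)
                         "example.com"
                         (unibyte-string 0 80))))

(bindat-get-field my-addr-unpacked 'str)   ; => "example.com"
(bindat-get-field my-addr-unpacked 'port)  ; => 80
```

Giving the struct a field name instead, as in (host struct my-pstring-spec), should nest len and str one level deeper under host.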

What now?

Having taken a brief look at the socks.el package that ships with Emacs, I think it could be simplified by using the bindat package to handle the wire formats. I'll have to see about adding that in at some point. Also, built-in support for variable-length, null-terminated strings would make the whole experience much more pleasant, so I'll see about getting that included as well. In the meantime, happy hacking in 2010!