Saturday, January 30, 2010

News Flows

Just like old times, I'm turning an email reply into a full-fledged blog post.

This is in response to an email my dad sent me, which I have quoted inline.

http://www.newyorker.com/reporting/2010/01/25/100125fa_fact_auletta

You can probably find the full text.

I didn't bother. There are enough other interesting things to read out there that I'm not going to spend the time (let alone money) to jump through hoops to read the full version of this.

The article is mainly about how the Obama administration relates to the press and tries to control the stories published about it. But buried in there are some interesting comments about the process. They claim that reporters now must file numerous daily reports: blog posts, tweets, updates for the paper or network website, on-air interviews for their site or network... They are so busy doing all of this that they do not have time for reporting. They cannot discuss issues with other people at the White House, or call up experts elsewhere for feedback. Instead, they just print whatever someone at the White House said, and are thankful they managed to get a quote in time to file their report.

I think this is indicative of a larger shift in the news ecosystem. My prediction/idea is that there will be a split between the people reporting on the basic facts of the situation -- who said what when -- and those writing interesting narratives that tie things together.

Until now, each newspaper or TV station had to have its own person physically present in the room at a press conference in order to write down what was said and then type it up or say it on the air. With the ease of duplicating information, we don't need 50 people to write down the exact same thing. We probably want more than one person to be doing this to avoid mistakes and such, but there is no need for every national news outlet to have someone recording what the press secretary is saying, especially when it can -- and should -- be streamed live online and archived.

What we do need multiple people in the room for is asking questions of the press secretary and not accepting vague answers or avoidance. I don't know enough about the environment in these kinds of press conferences to know how this would play out, but it seems like a shift of focus would be inevitable.

Likewise, most news nowadays comes in the form of stories that are designed to be informative, relatively entertaining (or at least not completely dry), and contain some context of the situation (if it is a part of a larger unfolding event). What could certainly happen, however, is a separation of the basic facts -- Obama said this during his state of the union speech; 7 people were killed in Iraq at this location -- from any sort of narrative tying them together. As a programmer, my hope would be that the facts are reported in a standard, open format which could be read by any application and mashed up in interesting ways, but I doubt something like that will happen.

This could ease the burden on journalists, who are now expected to fill both of these roles: collecting raw data and synthesizing it into a "story." Just like we don't need a multi-paragraph article each time the Dow changes (we can just look at the number directly), we don't necessarily need an article to make us aware of what was said at the State of the Union. Of course, most people are less interested in the exact value of the Dow or what was said in the SotU and are more interested in what the facts mean or imply. That is where I see the role of journalists and reporters thriving: in making sense of large collections of data.

One can imagine that this lessens the value of what they have to say, but it forces one to wonder why the media have taken this route.

Some have said that the media (especially cable news) is responding so strongly to things like Twitter because they don't want to miss the boat like they did with blogging. There is definitely a sense of "we have to use this because it is cool, regardless of whether it is useful" with the cable news outlets, and it reeks of trying to get on board with what the cool kids are doing.

There are probably other reasons as well, but I suspect that a lot of the push from the old-guard media outlets to use Twitter is due to this, especially if they are forcing people to do it, rather than letting them use Twitter because the journalist finds it to be a useful tool.

It is one thing to say the days of one article per day, all issues the same time on the morning paper are over. It is quite another to claim that the public demands a constant flow of near meaningless "news".

There was a part in the Shirky and Rosen video series (I forget where) where they were talking about the notion that the main "hard news" stories were never really for the general public. They are sort of positioned that way, but most people simply don't care about most of what is going on in the world on any kind of day-to-day basis. For example, the center story on the New York Times site is "Full of Tricks, White Dazzles in Superpipe," which is about a snowboarder. Now, I certainly don't care about that, and neither do most people, but it is there nonetheless. The real purpose of the front page of the newspaper, according to them, is to occasionally present everyone with the few truly important stories: US goes to war; Lehman collapses, bringing down economy; etc.

Most people don't really care about what Obama said on one particular night, but do care about the general direction the country is moving. A newspaper comes in multiple sections, in this view, not so that a sports fan can stumble across an interesting article about climate regulation in Eastern Pennsylvania, but so that the sports fan can completely ignore everything except the sports section.

What people do anyway, and what they've always done, since the beginning of the notion of "public opinion" as something rulers cared about (which they discuss in the video; it is fascinating), is pick and choose only those things which they find interesting or applicable to their daily life, and only occasionally read about anything outside of that.

What a "flow-based" news ecosystem needs to achieve, then, is to allow people to filter out all of the news in which they are uninterested, but occasionally push to them stories about the few things outside their regular interests that are genuinely important: major corruption, war, major economic decisions, etc. This is essentially what news has always done, even if it has the pretense of providing everyone with a broad, daily summary of events across multiple areas of interest.

Saturday, January 2, 2010

Wearing SOCKS with Emacs

So apparently I have way too much time on my hands, so I went ahead and taught myself the amazingly cool bindat package of Emacs. It is basically a way of writing specifications for translating binary to alists (association lists) and back again. Once you have a specification written (and it is pretty easy once you get the hang of it), converting data back and forth is trivial. I did run into a slight hitch with a couple things, though:
  1. Variable-length, null-terminated strings and
  2. Union structures
In an effort to make life easier for future Emacsians, I'll give the specs here, removing the need to figure it out yourself. But first, let me give a brief introduction to Bindat.

What is Bindat?

Let's say you have some binary data that you want Emacs to deal with. Rather than custom-writing a binary parsing function (because really, who wants to do that in 2010?), you can write a bindat specification and then use the bindat-pack and bindat-unpack functions to convert alists to and from binary, respectively. Let's say the binary format consists of:
  1. A byte indicating the version
  2. A byte for the length of a string
  3. Up to 255 bytes of a string, without a null byte at the end
Our spec would look like this:
(setq spec '((version byte)
             (length byte)
             (string str (length))))
Pretty self-explanatory, with the complication that the (length) gets filled with the integer value of the length field after it has been read. To convert data to binary, you would create an alist like this:
(setq data '((version . 5)
             (length . 11)
             (string . "example.com")))
And stuff them both into bindat-pack as follows:
(bindat-pack spec data)
Which will give you "\x05\x0Bexample.com", where "\x05" is a byte with value 5 and "\x0B" is the byte with value 11. As you can imagine, the function bindat-unpack does the reverse. Pretty cool! And pretty readable, too! If you aren't doing anything much more complex than this, there really isn't much else to learn (aside from the "u8", "u16", and "u32" types, which do what you would expect). The Elisp manual on this is pretty good, so check it out for more depth.
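For the reverse direction, here is a minimal round-trip sketch. The spec and data are the ones from above; the my-demo-* variable names are made up for this example and are not part of bindat.

```elisp
(require 'bindat)

;; Same layout as the spec above; the my-demo-* names are only for this sketch.
(defvar my-demo-spec
  '((version byte)
    (length byte)
    (string str (length))))

(defvar my-demo-data
  '((version . 5)
    (length . 11)
    (string . "example.com")))

;; Pack the alist into a unibyte string, then unpack it again.
(defvar my-demo-packed (bindat-pack my-demo-spec my-demo-data))
(defvar my-demo-unpacked (bindat-unpack my-demo-spec my-demo-packed))

;; Read individual fields back out of the unpacked alist.
(bindat-get-field my-demo-unpacked 'version) ; => 5
(bindat-get-field my-demo-unpacked 'string)  ; => "example.com"
```

Using bindat-get-field to read the result, rather than comparing whole alists, avoids depending on the order in which bindat returns the fields.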

The hard part

So, as the title of the post suggests, I was using bindat to implement the SOCKS4 protocol in Emacs (yes I know it's already been done; I was curious!). One thing that SOCKS4 (and SOCKS4a) does is include a "user ID" in the request, so that the proxy server can perform (very very) basic access control, or something. The problem is that the string datatypes in bindat (str and strz) expect an explicit length, but the only length indication SOCKS4 allows is the trailing null byte. There may well be a better solution to this (and if not, I should contribute one :), but I came up with a (very hackish) solution. It takes advantage of the fact that the "length" field can be an Elisp form, which is evaluated during packing and unpacking. So here is a specification for the SOCKS4a protocol in all its ugliness:
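(require 'bindat)
(require 'cl-lib)   ; for cl-search

(setq socks4a-spec
      '((version byte)
        (cmd-stat byte)
        (port u16)
        (addr ip)
        ;; Packing: id is already in struct, so use its length plus the null.
        ;; Unpacking: measure the distance to the next null byte.
        (id strz (eval (cond
                        ((bindat-get-field struct 'id)
                         (1+ (length (bindat-get-field struct 'id))))
                        (t
                         (1+ (- (cl-search "\0" bindat-raw :start2 bindat-idx)
                                bindat-idx))))))
        ;; Same trick for domain. If no null byte is found, offset 100 is
        ;; used as an arbitrary stand-in for the end of the field.
        (domain strz (eval (cond
                            ((bindat-get-field struct 'domain)
                             (1+ (length (bindat-get-field struct 'domain))))
                            (t
                             (1+ (- (or (cl-search "\0" bindat-raw
                                                   :start2 (1+ bindat-idx))
                                        100)
                                    bindat-idx))))))))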
Bleah! That's not nearly as pretty and simple as the earlier one! The trick is that for each of the string fields (id and domain, both of which are null-terminated), we search ahead for a null byte and use the distance between the start of that field, bindat-idx, and the null byte. We add one to the length to make room for the null byte itself. This is only for unpacking (converting from a byte array to Elisp structures), though. When converting to a byte array, we can use the bindat-get-field function to get the value of the field we are encoding, and then take its length (again, adding one for the null byte).
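To see the trick in isolation, here is a cut-down sketch. It is not the SOCKS4a layout: my-id-spec is a made-up three-field format, and my-strz-field-length is a hypothetical helper (not part of bindat) that wraps the same two cases, so the eval form in the spec stays short. I use cl-position from cl-lib here to find the null byte.

```elisp
(require 'bindat)
(require 'cl-lib)

(defun my-strz-field-length (value)
  "Return the byte length, including the trailing null, of a strz field.
VALUE is the field's string while packing and nil while unpacking.
In the unpacking case the length is measured from `bindat-idx' to
the next null byte in `bindat-raw'."
  (if value
      (1+ (length value))
    (let ((end (cl-position 0 bindat-raw :start bindat-idx)))
      (unless end
        (error "No terminating null byte found"))
      (1+ (- end bindat-idx)))))

;; Hypothetical format: a version byte, a null-terminated id, a flag byte.
;; The flag after the id shows that the index advances past the null.
(defvar my-id-spec
  '((version byte)
    (id strz (eval (my-strz-field-length (bindat-get-field struct 'id))))
    (flag byte)))

(defvar my-id-packed
  (bindat-pack my-id-spec '((version . 4) (id . "fred") (flag . 7))))

(defvar my-id-unpacked
  (bindat-unpack my-id-spec my-id-packed))

(bindat-get-field my-id-unpacked 'id)   ; => "fred"
(bindat-get-field my-id-unpacked 'flag) ; => 7
```

The helper only touches bindat-raw and bindat-idx in the unpacking branch, which matters because the length form is also evaluated while bindat is still sizing the output buffer.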

Oh, the agony!

So yeah, that's what I spent my night doing. As a bonus, here are the specs for SOCKS5 (which was much easier, as it uses the vastly superior Pascal convention for strings).
(setq socks5-greeting-spec
      '((version byte)
        (auth-count u8)
        (auth-list repeat (auth-count)
                   (method byte))))


(setq socks5-ehlo-spec
      '((version byte)
        (auth byte)))

(setq pstring-spec
      '((len u8)
        (str str (len))))


(setq socks5-conn-spec
      '((version byte)
        (cmd byte)
        (res byte)
        (addr-type byte)
        (union (addr-type)
               (1 (addr ip))
               (3 (struct pstring-spec))
               (4 (addr6 vec 16)))
        (port u16)))
The one thing that took a bit of figuring out was the union specification, as it isn't quite the same form as the others. Figuring out exactly what got nested and by how much was a not at all fun experience, though it is nice to know.
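To make the nesting concrete, here is a sketch of the data each shape expects. The specs are copies of the ones above under my- prefixed names (the prefix and the sample values are just for this example). As far as I can tell, a repeat field takes a list of alists, one per repetition, while a union without a field name puts the chosen branch's fields at the top level of the alist.

```elisp
(require 'bindat)

(defvar my-socks5-greeting-spec
  '((version byte)
    (auth-count u8)
    (auth-list repeat (auth-count)
               (method byte))))

(defvar my-pstring-spec
  '((len u8)
    (str str (len))))

(defvar my-socks5-conn-spec
  '((version byte)
    (cmd byte)
    (res byte)
    (addr-type byte)
    (union (addr-type)
           (1 (addr ip))
           (3 (struct my-pstring-spec))
           (4 (addr6 vec 16)))
    (port u16)))

;; repeat: auth-list holds one small alist per repetition.
(defvar my-greeting-bytes
  (bindat-pack my-socks5-greeting-spec
               '((version . 5)
                 (auth-count . 2)
                 (auth-list ((method . 0)) ((method . 2))))))

;; union: addr-type 3 selects the pstring branch, and its fields
;; (len and str) sit next to addr-type rather than in a sub-alist.
(defvar my-connect-bytes
  (bindat-pack my-socks5-conn-spec
               '((version . 5)
                 (cmd . 1)       ; connect
                 (res . 0)       ; reserved
                 (addr-type . 3) ; domain name
                 (len . 11)
                 (str . "example.com")
                 (port . 80))))

(defvar my-connect-parsed
  (bindat-unpack my-socks5-conn-spec my-connect-bytes))

(bindat-get-field my-connect-parsed 'str)  ; => "example.com"
(bindat-get-field my-connect-parsed 'port) ; => 80
```

Note that (struct my-pstring-spec) looks the spec up by variable name when packing or unpacking, which is why the specs here are global variables.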

What now?

Having taken a brief look at the socks.el package that ships with Emacs, it looks like it could be simplified by using the bindat package to handle the network protocol stuff. I'll have to see about adding that in at some point. Also, making it easier to use variable-length null-terminated strings would make the whole experience much more pleasant, so I'll see about getting that included as well. In the meantime, happy hacking in 2010!