Tuesday, March 23, 2010

A Response to "Victimology bites"

This started out as a comment on Eric Raymond's "Victimology bites" blog post, but got long enough that I figured I'd turn it into its own blog post.

I think his attacks on liberals and colleges are probably a little overstated, but they are certainly not inaccurate. That said, plenty of self-proclaimed conservatives have played the victim card too: think of things like the complaints about the "war on Christmas/Christianity" or the "Real American, Main Street people" suffering at the hands of "Predator Lenders," or the "War on Marriage." While the second example especially might have some merit, Liberals and Liberal Democrats are not the only ones who do this.

Now, it is entirely reasonable to say that the current Republican party and some of those who call themselves "conservative" do not actually subscribe to a truly Conservative philosophy, in the traditional sense of the term, and are actually Liberals in their view of how involved government should be in people's daily lives. I can certainly buy that.

While I don't doubt that there are those (particularly the powerful, who have something to gain) who are consciously pushing Victimology as a stealth power grab, he seems to implicitly suggest a level of malice that I believe is untrue. My take is that many people would say that forcing businesses to provide handicapped parking spaces at least comes from a place of compassion. I agree that it is possible (and easy) to take this too far, but I doubt the majority of people do it out of some kind of desire for a tyrannical takeover.

My (uninformed) opinion is that the current state of Victimology arose from an overreaction to the very real problems of discrimination, disadvantage, and structural barriers to success. There definitely has been a subversion of some of the structures designed to help those who help themselves; the problem is that it is often hard to tell who is interested in helping themselves.

Saturday, March 20, 2010

yt-bulk-py: A good program with a bad name

So I just finished another exhausting Ballroom competition, and once again, there are a ton of videos to upload (133, to be exact). Since I found myself doing this more and more, I decided to make my life easier by writing some code to help the process out.

For all of you (yes, in Dan's secret magic world, there are people other than myself who read this blog) who don't know how a Ballroom competition works, let me give you a brief overview so you get what the video situation is like. There are a total of 19 dances divided into 4 sections (Rhythm, Latin, Smooth, and Standard) and spread across 5 levels. Each section is run sequentially, so all of Rhythm is finished before any Latin starts, and so on. At the beginning of a section, they run all of the preliminary rounds, from the Newcomers up to Championship (it's a bit more complicated than this, but it's not important for our purposes). So for example, you would run the first round of Newcomer Waltz, followed by Newcomer Tango, then Bronze Waltz, then Bronze Tango, and so on.

Each round has about 90 seconds of dancing, and because there are usually more people than fit on the floor at a time, there will be multiple heats for each round. A typical Silver couple (the middle-ish level where most of the people are) might compete in around 10 dances, with 3-5 rounds per dance for a total of 30-50 heats of 90 seconds just for that one couple. Multiply this by the number of dancers on the team, and you can easily reach over 100 videos from even a small competition.

The issue then becomes, "what do you do with all of these videos?" Well, before Brown's web hosting system decided to crap out on me, I had this nifty faceted search system on the team website where you could drill down by category and find exactly the videos you wanted, without running afoul of YouTube's "helpful suggestions" of what you might want. The problem is then that nobody wants to tag and upload hundreds of videos, so you need a way to automate as much as possible.

What I have now is a multi-step system which removes as much of the manual tedium as possible, without developing advanced graphics algorithms to figure out the dances entirely by machine learning (if only).

Freedom from Tape


The camera we use is a surprisingly nifty 720p Panasonic MiniDV camera, whose only major flaw is that you have to remove the battery to charge it, plug the camera into AC, or even connect it to FireWire. Seriously, dumbest idea ever.

Anyway, the first step generally is to copy all of the raw video off the camera. This isn't so bad, since you can essentially set it and forget it, coming back every hour to pop in a new tape. I had about 4 hours of video from this past competition, so there was some quality waiting to be done. The nice thing is that essentially every DV capturing program at this point will automatically create separate files when the clip ends, so I usually don't have to scrub through the video and chop it up into little, 1 minute 30 second pieces.

Once this is done, the second, and still most time-consuming, step takes place.

The Naming Ceremony


From here, there are a lot of nice little video files sitting on the hard drive with names like "video00001.avi", "video00010.avi", and who could forget, "video9999.avi". The task now is to make some sort of sense of these names, turning them from something like "video0019934_good_god_how_many_are_there.avi" into "Holy Cross 2010 Bronze American Cha Cha - Round 2, Heat 3.avi", which is only slightly more useful to humans. Until someone feels like writing me an image recognition algorithm that is able to determine the type of dance and the level of an arbitrary video, this part will always have to be done by hand. Luckily, I have Emacs, so my hands are fusion-powered nanobot swarms capable of forging entire worlds from nothingness.

So here is what the process looks like:


Whaaa? It's not actually that crazy once you see what's going on. The video on the right is, obviously, the video I'm tagging. On the upper half of the screen is my current progress on this very blog post, and on the bottom is the list of videos. You can see that some have nice names like "Holy Cross 2010 Newcomer International Rumba - Round 1, Heat 1.avi" and some are hideous like "Holy Cross 2010 Latin066.avi", which is pretty useless. The key is what is going on down in the lower-left corner, which I'll blow up for you:


So first, I select the level, in this case Newcomer, by typing its name. The cool thing is that, thanks to Emacs' Ido-mode, it narrows down the choices as I type, meaning that in reality I only have to hit one key ("n" in this case) and press enter, and the bolded option will be selected. Much nicer than typing out the whole thing and worrying about mistakes, since it forces you to choose one of the available options.
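For anyone who hasn't used Ido, here is a rough Python sketch of the narrowing idea. It is not how Ido is implemented, and the list of level names is partly a guess (the post only names Newcomer, Bronze, Silver and Championship; "Gold" is assumed). The functions `narrow` and `pick` are made up for illustration.

```python
# Hypothetical list of levels; "Gold" is an assumption, not from the post.
LEVELS = ["Newcomer", "Bronze", "Silver", "Gold", "Championship"]


def narrow(choices, typed):
    """Return the choices matching the typed text, prefix matches first."""
    typed = typed.lower()
    prefix = [c for c in choices if c.lower().startswith(typed)]
    # Other choices that merely contain the typed text come afterwards.
    rest = [c for c in choices if typed in c.lower() and c not in prefix]
    return prefix + rest


def pick(choices, typed):
    """Simulate pressing enter: take the first remaining candidate, if any."""
    matches = narrow(choices, typed)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(narrow(LEVELS, "n"))
    print(pick(LEVELS, "n"))
```

Because the result is always one of the existing choices, a typo can at worst select nothing, which is the property that makes this kind of prompt safer than free-form typing.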

From there, I select the dance. Again, only one keypress (and then enter) is required:


Then the round (it defaults to 1):


And finally the heat:


And it then names the file "Holy Cross 2010 Newcomer International Jive - Round 1, Heat 2.avi" and immediately starts playing the next file, and so on down the list.

That took me only 8 keypresses, including enter ("n", enter, "j", enter, "1", enter, "2", enter). At this point, the limiting factor is how quickly I can determine which dance it is, which takes only a few seconds (since the music gives most of it away).
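The naming pattern itself is simple enough to sketch in a few lines of Python. This is only an illustration of the pattern visible in the file names above, not the Emacs code I actually use; `heat_filename` and its parameters are invented for the example.

```python
def heat_filename(event, level, dance, round_number=1, heat=1, extension="avi"):
    """Build a file name like the ones shown above.

    round_number defaults to 1, mirroring the prompt's default.
    """
    return f"{event} {level} {dance} - Round {round_number}, Heat {heat}.{extension}"


if __name__ == "__main__":
    print(heat_filename("Holy Cross 2010", "Newcomer", "International Jive", heat=2))
```

Keeping every piece of information in a fixed position means the name can be taken apart again later by a program.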

So now I have 133 video files with nice, human-consumable names. How do I get them online with a minimum of fuss?

The Uploadening


For a while, I had used Google's Gears-based bulk uploader to upload the videos to YouTube, but this had two very important drawbacks. First, you had to manually specify a title and description for each video, which isn't so bad when you have 10 videos, but with 133, it gets old very quickly. This wouldn't be the end of the world if it weren't for the second snag, the disconnections.

I don't know if my computer was just feeling malicious or if Gears was haunted with an evil spirit, but I always found that the upload would mysteriously stop after a few videos, and Gears didn't much feel like resuming the upload, so I ended up having to close and re-open the page, which of course caused me to lose all of my hard-copied video titles and descriptions. Major no funsies.

The solution came in the form of a program I wrote called yt-bulk-py, for YouTube Bulk uploader in Python. Inspired name, I know. The idea was that uploading a big series of videos should be the easy part once you know what you want to call them. I ended up making a simple Python program which used the YouTube APIs to fully automate all of the uploading. The file name of the video would become the title, and I created a simple config file which filled in the description and tags (since they would be the same for each video).
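To give a flavor of the "file name becomes the title, config fills in the rest" idea, here is a minimal sketch. The config layout, the section name `video`, the example values, and the function names are all assumptions made up for this illustration; they are not necessarily what yt-bulk-py uses, and the actual YouTube API calls are left out entirely.

```python
import configparser
from pathlib import Path

# Hypothetical config contents; the real file's format may differ.
EXAMPLE_CONFIG = """
[video]
description = Holy Cross 2010 ballroom competition
tags = ballroom, dance, Holy Cross 2010
"""


def load_shared_metadata(config_text):
    """Read the description and tags shared by every video."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["video"]
    tags = [tag.strip() for tag in section["tags"].split(",") if tag.strip()]
    return {"description": section["description"], "tags": tags}


def metadata_for(video_path, shared):
    """Use the file name (without its extension) as the video title."""
    return {"title": Path(video_path).stem, **shared}


if __name__ == "__main__":
    shared = load_shared_metadata(EXAMPLE_CONFIG)
    print(metadata_for("Holy Cross 2010 Newcomer International Jive - Round 1, Heat 2.avi", shared))
```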

The best part is that it is fully resumable, so if I have to take my laptop somewhere and stop uploading for a while, once I get back, I can just restart the program and pick up from where I left off without having to re-enter anything at all.
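One simple way to get this kind of resumability is to remember which files have already gone up and skip them on the next run. The sketch below shows that idea only; the JSON state file, the function names, and the `upload` callback (a stand-in for whatever actually talks to YouTube) are assumptions for illustration, not a description of yt-bulk-py's internals.

```python
import json
from pathlib import Path


def load_done(state_path):
    """Return the set of video names recorded as already uploaded."""
    path = Path(state_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def upload_all(video_paths, state_path, upload):
    """Upload each video once, recording progress after every success.

    `upload` is a hypothetical callable that sends one video and raises
    an exception if the transfer fails.
    """
    done = load_done(state_path)
    for video in video_paths:
        name = str(video)
        if name in done:
            continue  # finished in an earlier run, so skip it
        upload(video)  # if this raises, nothing is recorded for this video
        done.add(name)
        Path(state_path).write_text(json.dumps(sorted(done)))
    return done
```

Since progress is written after each video rather than at the end, an interruption costs at most the one upload that was in flight.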

Oh, the Waiting!


This means that what used to take hours of micromanaging agony now takes about an hour of dedicated tagging time (probably even less than that) and a few seconds of checkups to switch the tapes or start the upload process. Were the team website working, I would show off the searching hotness, but alas. The cool thing to note is that the website automatically watches the YouTube feed for the Ballroom team's account and processes any new videos it hasn't yet seen. Essentially, once the video is named, I can start off a process which sends it from my computer, to YouTube, to the Ballroom website, which then automagically processes the video's name (as in, it notices that the video has "Newcomer" in the title and files the video under "newcomers").
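The title processing can be sketched with a single regular expression. This is a guess at the general approach rather than the website's actual code: the pattern, the list of level names (with "Gold" assumed), and `parse_title` are all invented for the example.

```python
import re

# Hypothetical pattern for titles of the form
# "<event> <level> <dance> - Round <n>, Heat <m>".
TITLE_PATTERN = re.compile(
    r"^(?P<event>.+?) (?P<level>Newcomer|Bronze|Silver|Gold|Championship) "
    r"(?P<dance>.+) - Round (?P<round>\d+), Heat (?P<heat>\d+)$"
)


def parse_title(title):
    """Split a video title into its facets, or return None if it doesn't fit."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    info = match.groupdict()
    info["round"] = int(info["round"])
    info["heat"] = int(info["heat"])
    return info


if __name__ == "__main__":
    print(parse_title("Holy Cross 2010 Newcomer International Jive - Round 1, Heat 2"))
    print(parse_title("Holy Cross 2010 Latin066"))
```

Titles that were never properly named simply fail to match, so they can be set aside instead of being filed in the wrong place.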

Ah, the simple joys of modern life.

Monday, March 1, 2010

Comparing Emacs Version-parsing Libraries

Recently, I've become interested in advancing the (currently sad) state of Emacs package management. It is a well-documented problem (see here for more detail), so I won't discuss it generally.

One important aspect of a package-management system is the ability to deal with the different versions of a package. Because of the current state of packaging, there is no real standard format for version numbers. Any potential package manager must be able to figure out the version of a given package without being told it explicitly by a user (otherwise maintaining the packages would be way too hard). Currently, package.el has a simple-ish way of dealing with version numbers: it uses the lisp-mnt library (which comes with Emacs) to pull out the Version or Package-Version headers, strips RCS info (some version numbers are like $Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $, and we really only want 1.63), and then splits the result by periods and converts the pieces to integers. This means that it only works for versions like 1.2.3 and not 1.2.3alpha.
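To make the strip-split-convert steps concrete, here is a rough Python rendering of the idea. It is not package.el's code (which is Emacs Lisp), the regular expression for RCS strings is my own guess at a reasonable pattern, and returning None for anything non-numeric is a choice made for this sketch rather than a claim about what package.el does with such input.

```python
import re


def strip_rcs(version):
    """Pull the revision out of an RCS '$Id: file,v 1.63 ... $' string."""
    match = re.match(r"\$Id: \S+ (\S+) ", version)
    return match.group(1) if match else version.strip()


def parse_dotted(version):
    """Split on periods and convert each piece to an integer.

    Returns None when any piece is not purely numeric.
    """
    parts = strip_rcs(version).split(".")
    if not all(re.fullmatch(r"[0-9]+", part) for part in parts):
        return None
    return [int(part) for part in parts]


if __name__ == "__main__":
    print(parse_dotted("$Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $"))
    print(parse_dotted("1.2.3"))
    print(parse_dotted("1.2.3alpha"))
```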

It would be nice if we could get everyone to use only dotted-numeric version numbers, but that's not happening any time soon. Instead, a package manager must be able to make sense of more complex version numbers, such as 6.34a (which is what org-mode uses) or 1.0pre7 (cedet). I'm going to look at the following three version parsing solutions:

  • version-to-list, included in GNU Emacs.
  • inversion, from CEDET, now included in GNU Emacs (not sure which version).
  • vcomp, written by Jonas Bernoulli, creator of the Emacs Mirror.

I'll take a bunch of examples and show the output that each one produces. If you have any suggestions for additional version formats, please let me know :)

  • "1.0pre7"
    version-to-list
    (1 0 -1 7)
    inversion-decode-version
    (prerelease 1 0 7)
    vcomp--intern
    nil

  • "1.0.7pre"
    version-to-list
    (1 0 7 -1)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "6.34a"
    version-to-list
    (6 34 -3)
    inversion-decode-version
    nil
    vcomp--intern
    ((6 34) (104 0 96 0))

  • "1.3.7"
    version-to-list
    (1 3 7)
    inversion-decode-version
    (point 1 3 7)
    vcomp--intern
    ((1 3 7) (104 0 96 0))

  • "1.0alpha"
    version-to-list
    (1 0 -3)
    inversion-decode-version
    (alpha 1 0 1)
    vcomp--intern
    nil

  • "1.0PRE2"
    version-to-list
    (1 0 -1 2)
    inversion-decode-version
    (prerelease 1 0 2)
    vcomp--intern
    nil

  • "0.9alpha"
    version-to-list
    (0 9 -3)
    inversion-decode-version
    (alpha 0 9 1)
    vcomp--intern
    nil

  • "2009.04.01"
    version-to-list
    (2009 4 1)
    inversion-decode-version
    (point 2009 4 1)
    vcomp--intern
    ((2009 4 1) (104 0 96 0))

  • "2009.10.5"
    version-to-list
    (2009 10 5)
    inversion-decode-version
    (point 2009 10 5)
    vcomp--intern
    ((2009 10 5) (104 0 96 0))

  • "20091005"
    version-to-list
    (20091005)
    inversion-decode-version
    nil
    vcomp--intern
    ((20091005) (104 0 96 0))

  • "20091005pre"
    version-to-list
    (20091005 -1)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "20091005alpha"
    version-to-list
    (20091005 -3)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "20091005alpha2"
    version-to-list
    (20091005 -3 2)
    inversion-decode-version
    nil
    vcomp--intern
    nil


Looking at this, it seems like version-to-list is the way to go, as it handles the different possibilities better than any of the other functions.
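Part of the appeal of plain integer lists is that they are easy to compare: walk both lists in step and treat missing elements as zero, so the negative numbers used for "alpha" and "pre" sort before the corresponding release. The Python below only illustrates that idea using the lists from the table above; it is not Emacs's own comparison code, and `version_list_less` is a name invented for the example.

```python
from itertools import zip_longest


def version_list_less(a, b):
    """Return True if version list a sorts strictly before version list b."""
    # Pad the shorter list with zeros so (1 0) and (1 0 0) compare equal.
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return x < y
    return False


if __name__ == "__main__":
    # "1.0pre7" -> (1 0 -1 7) should come before "1.0" -> (1 0).
    print(version_list_less([1, 0, -1, 7], [1, 0]))
    # "1.0alpha" -> (1 0 -3) should come before "1.0pre7" -> (1 0 -1 7).
    print(version_list_less([1, 0, -3], [1, 0, -1, 7]))
```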