Monday, September 13, 2010

Test of org-googlecl

I heard about this great project called Org-GoogleCL, which allows for publishing of blog posts to Google's Blogger from within Emacs' Org-Mode. This is a test of that library.

Hmm, there seem to be some issues with it not posting the full thing. I'll have to dive into the code at some point…

Edit: Oh, looks like it is working now. There are still some problems with filling paragraphs, though.

Wednesday, August 11, 2010

On the value of knowing the lower levels

While digging around some of the Python documentation (read: procrastinating), I stumbled across this section comparing Python performance to C in some very simple and contrived "benchmarks." The point of the section was that Python is not C, and you should be careful when trying to apply knowledge of C to Python. On a simple test of adding i to itself 10 million times, Python was about 50 times slower than C (which isn't really surprising, given that the C code just executes a series of single assembly addl instructions).
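For a rough feel of the Python side of such a comparison, here is a minimal sketch. It is not the benchmark from the documentation: the function names are made up for illustration, and reading the test as "accumulate i into a running total" is an assumption.

```python
import time


def add_repeatedly(iterations):
    """Accumulate i into a running total, one addition per loop pass."""
    total = 0
    for i in range(iterations):
        total += i
    return total


def time_loop(iterations=10_000_000):
    """Return the wall-clock seconds taken by add_repeatedly."""
    start = time.perf_counter()
    add_repeatedly(iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"10 million additions took {time_loop():.2f} seconds")
```

The absolute number depends entirely on the machine and the Python version, so only the ratio against an equivalent C loop on the same machine means anything.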

Thursday, July 29, 2010

On cluelessness and racial self-identification

This was a response to Eric Raymond's recent blog post about "the perils of ethnic identification" and why his racial identity isn't a big deal to him. It is an interesting article in its own right, but my response was prompted by a comment made by a "William O. B'Livion", in which he said:

So I was born to a couple of college kids at a midwest school, put up for adoption, raised by a father who’s parents were Balkan/Mediterranean immigrants, and mother who's parents were Mediterranean. Raised in a different Mid-west college melting pot (seriously melting pot. I was a teenager before I realized that other people judged someone by the color of their skin) and have NO cultural affinity other than "American".

(emphasis mine)

This reminded me of a similar event from my life so I wrote the following:

Glad I wasn't the only one blissfully unaware of widespread intolerance!

My dad's an African-American descendant of former slaves, though I don't know the story any further back than his parents; never bothered to ask. My mom's 3rd (or 4th? I dunno) generation Polish or Lithuanian Jew, and I was raised 99% secular atheist, 1% Quaker in the suburbs of Philadelphia. I was pretty ignorant of religious denominations, histories, and traditional rivalries outside of having a vague sense that Christianity and Judaism shared part of the bible. It wasn't until 9th grade, when I saw the movie "School Ties" that the idea of discrimination against Jews (outside of the Holocaust, obviously) ever occurred to me. From my perspective, Judaism and Christianity were 90% identical, and I was completely confused as to why the kids at the new school would care at all about a kid being Jewish.

Perhaps semi-relatedly, I have always considered it a point of pride that I'm not much of a "typical" anything. Through school, I wrestled, programmed, sang, acted, played soccer, and spent hours at a time playing computer games. I liked the fact that I wasn't the typical black kid, or Jew, or nerd, or gamer, or jock, or anything. It's the things I've done during my life which define me, not who or what my long-dead ancestors were or did.

Saturday, July 24, 2010

FroYo on the Droid, a Browser Benchmark Update

Yesterday I rooted my Droid to install the leaked "FRG01" build of Android 2.2 (FroYo). I figured I'd run the SunSpider browser benchmark again to see what (if anything) had changed as far as the browser went. I knew that there were supposed to be speed improvements in FroYo, both to the underlying Android system and also with the browser specifically, so I figured I'd put the fancy new JIT to the test.

If you remember from the last test the Droid got a total of 22264.3ms. With the newly-installed FroYo browser, that time dropped to 11520.6ms. Yes, the browser on the newest release is almost exactly twice as fast as it used to be. Behold the power of the JIT!
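To put "almost exactly twice as fast" into numbers, the two totals can be plugged into a couple of one-line helpers. The helper names are made up for illustration; the two timings are the ones quoted above.

```python
def speedup(old_ms, new_ms):
    """How many times faster the new run is than the old one."""
    return old_ms / new_ms


def time_saved(old_ms, new_ms):
    """Fraction of the old running time that was eliminated."""
    return 1 - new_ms / old_ms


old_total_ms = 22264.3  # SunSpider total from the earlier test
new_total_ms = 11520.6  # SunSpider total with the FroYo browser

print(f"{speedup(old_total_ms, new_total_ms):.2f}x faster, "
      f"{time_saved(old_total_ms, new_total_ms):.0%} less time")
```

That works out to roughly a 1.93x speedup, or a bit under half the original running time.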

Here is an updated spreadsheet with the new Droid versus the old Droid browser results:

As you can see, Eclair is about half the speed (twice the time) of FroYo, and this is running on the exact same phone. I haven't overclocked the phone or anything, so this should be a pretty representative test. This also means that a Droid with FroYo does JavaScript at almost exactly the same speed as an iPad (at least when I tested the iPad). Not bad, little Droid!

Tuesday, April 13, 2010

The great smartphone/pad shootout!

While waiting for my Droid to be serviced at the Verizon store, I went down to the Apple store in the mall to check out the much-ballyhooed iPad. I'll save my full reactions for later (though they aren't anything you haven't heard before) and instead bring you a comparison of browser speed between the iPad and Droid.

This is not a terribly scientific study, but it should give you a rough idea of how each device performs in a roughly average environment. I ran SunSpider 0.9.1 on each device, operating over wifi and with nothing really going on in the background (but also without any specific avoidance of background apps). I'm using a stock Droid at 2.1 with the free version of the Xscope browser. Other than that, everything is standard config, with whatever apps happen to be on the iPad or Droid.

Without further ado, here is the spreadsheet:

As you can see, the iPad is about twice as fast as the Droid for most tests, with the Droid pulling ahead on the single "nbody" test. *Cue long tirade about how nbody is the only test which actually matters and the Droid is therefore superior to the iPad in every way, especially at raw browsing speed ;)*

I was actually surprised that the Droid did as well as it did, considering that it is a couple months older and has a significantly less-powerful processor. Hooray for mobile competition!

Tuesday, March 23, 2010

A Response to "Victimology bites"

This started out as a comment on Eric Raymond's "Victimology bites" blog post, but got long enough that I figured I'd turn it into its own blog post.

I think his attacks on liberals and colleges are probably a little overstated, but are certainly not inaccurate. There have been plenty of self-proclaimed conservatives who have played the victim card: think of things like the complaints about the "war on Christmas/Christianity" or the "Real American, Main Street people" suffering at the hands of "Predator Lenders," or the "War on Marriage." While the second example especially might have some merit, Liberals and Liberal Democrats are not the only ones who do this.

Now, it is entirely reasonable to say that the current Republican party and some of those who call themselves "conservative" do not actually subscribe to a truly Conservative philosophy, in the traditional sense of the term, and are actually Liberals in their view of the role of how involved government should be in people's daily lives. I can certainly buy that.

While I don't doubt that there are those (particularly the powerful, who have something to gain) who are consciously pushing Victimology as a stealth power grab, he seems to implicitly suggest a level of malice that I believe is untrue. My take is that many people would say that forcing businesses to provide handicapped parking spaces at least comes from a place of compassion. I agree that it is possible (and easy) to take this too far, but I doubt the majority of people do it out of some kind of desire for a tyrannical takeover.

My (uninformed) opinion is that the current state of Victimology arose from an overreaction to the very real problems of discrimination, disadvantage, and structural barriers to success. There definitely has been a subversion of some of the structures designed to help those who help themselves; the problem is that it is often hard to tell who is interested in helping themselves.

Saturday, March 20, 2010

yt-bulk-py: A good program with a bad name

So I just finished another exhausting Ballroom competition, and once again, there are a ton of videos to upload (133, to be exact). Since I found myself doing this more and more, I decided to make my life easier by writing some code to help the process out.

For all of you (yes, in Dan's secret magic world, there are people other than myself who read this blog) who don't know how a Ballroom competition works, let me give you a brief overview so you get what the video situation is like. There are a total of 19 dances divided into 4 sections (Rhythm, Latin, Smooth, and Standard) and spread across 5 levels. Each section is run sequentially, so all of Rhythm is finished before any Latin starts, and so on. At the beginning of a section, they run all of the preliminary rounds, from the Newcomers up to Championship (it's a bit more complicated than this, but it's not important for our purposes). So for example, you would run the first round of Newcomer Waltz, followed by Newcomer Tango, then Bronze Waltz, then Bronze Tango, and so on.

Each round has about 90 seconds of dancing, and because there are usually more people than fit on the floor at a time, there will be multiple heats for each round. A typical Silver couple (the middle-ish level where most of the people are) might compete in around 10 dances, with 3-5 rounds per dance for a total of 30-50 heats of 90 seconds just for that one couple. Multiply this by the number of dancers on the team, and you can easily reach over 100 videos from even a small competition.

The issue then becomes, "what do you do with all of these videos?" Well, before Brown's web hosting system decided to crap out on me, I had this nifty faceted search system on the team website where you could drill down by category and find exactly the videos you wanted, without running afoul of YouTube's "helpful suggestions" of what you might want. The problem is then that nobody wants to tag and upload hundreds of videos, so you need a way to automate as much as possible.

What I have now is a multi-step system which removes as much of the manual tedium as possible, without developing advanced graphics algorithms to figure out the dances entirely by machine learning (if only).

Freedom from Tape

The camera we use is a surprisingly nifty 720p Panasonic MiniDV camera, whose only major flaw is that you have to remove the battery to charge it, plug the camera into AC, or even connect it to FireWire. Seriously, dumbest idea ever.

Anyway, the first step generally is to copy all of the raw video off the camera. This isn't so bad, since you can essentially set it and forget it, coming back every hour to pop in a new tape. I had about 4 hours of video from this past competition, so there was some quality waiting to be done. The nice thing is that essentially every DV capturing program at this point will automatically create separate files when the clip ends, so I usually don't have to scrub through the video and chop it up into little, 1 minute 30 second pieces.

Once this is done, the second, and still most time-consuming step takes place.

The Naming Ceremony

From here, there are a lot of nice little video files sitting on the hard drive with names like "video00001.avi", "video00010.avi", and who could forget, "video9999.avi". The task now is to make some sort of sense of these names, turning them from something like "video0019934_good_god_how_many_are_there.avi" into "Holy Cross 2010 Bronze American Cha Cha - Round 2, Heat 3.avi", which is only slightly more useful to humans. Until someone feels like writing me an image recognition algorithm that is able to determine the type of dance and the level of an arbitrary video, this part will always have to be done by hand. Luckily, I have Emacs, so my hands are fusion-powered nanobot swarms capable of forging entire worlds from nothingness.

So here is what the process looks like:

Whaaa? It's not actually that crazy once you see what's going on. The video on the right is, obviously, the video I'm tagging. On the upper half of the screen is my current progress on this very blog post, and on the bottom is the list of videos. You can see that some have nice names like "Holy Cross 2010 Newcomer International Rumba - Round 1, Heat 1.avi" and some are hideous like "Holy Cross 2010 Latin066.avi", which is pretty useless. The key is what is going on down in the lower-left corner, which I'll blow up for you:

So first, I select the level, in this case Newcomer, by typing its name. The cool thing is that, thanks to Emacs' Ido-mode, it narrows down the choices as I type, meaning that in reality, I only have to hit one key ("n" in this case) and press enter, and the bolded option will be selected. Much nicer than typing out the whole thing and worrying about mistakes, since it forces you to choose one of the available options.

From there, I select the dance. Again, only one keypress (and then enter) is required:

Then the round (it defaults to 1):

And finally the heat:

And it then names the file "Holy Cross 2010 Newcomer International Jive - Round 1, Heat 2.avi" and immediately starts playing the next file, and so on down the list.

That took me only 8 keypresses, including enter ("n", enter, "j", enter, "1", enter, "2", enter). At this point, the limiting factor is how quickly I can determine which dance it is, which takes only a few seconds (since the music gives most of it away).
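The real thing lives in Emacs, but the idea is small enough to sketch in Python. Everything here is illustrative: narrow and build_name are made-up names, the choice lists are partial, splitting the dance into a style plus a short dance name is an assumption, and matching is simplified to a case-insensitive prefix match rather than Ido's actual matching rules.

```python
LEVELS = ["Newcomer", "Bronze", "Silver", "Championship"]  # partial, for illustration
DANCES = ["Jive", "Rumba", "Cha Cha"]                      # partial, for illustration


def narrow(choices, typed):
    """Return the choices that start with what has been typed so far."""
    typed = typed.lower()
    return [choice for choice in choices if choice.lower().startswith(typed)]


def build_name(event, level, style, dance, round_no=1, heat=1):
    """Assemble the human-readable file name from the selected pieces."""
    return f"{event} {level} {style} {dance} - Round {round_no}, Heat {heat}.avi"


level = narrow(LEVELS, "n")[0]  # one keypress is enough to single out "Newcomer"
dance = narrow(DANCES, "j")[0]  # likewise for "Jive"
print(build_name("Holy Cross 2010", level, "International", dance, round_no=1, heat=2))
```

Restricting input to a fixed list of choices is what keeps the names consistent enough to be processed by a program later.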

So now I have 133 video files with nice, human-consumable names. How do I get them online with a minimum of fuss?

The Uploadening

For a while, I had used Google's Gears-based bulk uploader to upload the videos to YouTube, but this had two very important drawbacks. First, you had to manually specify a title and description for each video, which isn't so bad when you have 10 videos, but with 133, it gets old very quickly. This wouldn't be the end of the world if it weren't for the second snag, the disconnections.

I don't know if my computer was just feeling malicious or if Gears was haunted with an evil spirit, but I always found that the upload would mysteriously stop after a few videos, and Gears didn't much feel like resuming the upload, so I ended up having to close and re-open the page, which of course caused me to lose all of my hard-copied video titles and descriptions. Major no funsies.

The solution came in the form of a program I wrote called yt-bulk-py, for YouTube Bulk uploader in Python. Inspired name, I know. The idea was that uploading a big series of videos should be the easy part once you know what you want to call them. I ended up making a simple Python program which used the YouTube APIs to fully automate all of the uploading. The file name of the video would become the title, and I created a simple config file which filled in the description and tags (since they would be the same for each video).

The best part is that it is fully resumable, so if I have to take my laptop somewhere and stop uploading for a while, once I get back, I can just restart the program and pick up from where I left off without having to re-enter anything at all.
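The overall shape of such an uploader can be sketched as follows. This is not the yt-bulk-py source: the function names, the JSON state file, and the config keys are all assumptions, the actual YouTube call is left as an upload function that the caller supplies, and resuming is assumed to work at the granularity of whole videos.

```python
import json
from pathlib import Path


def load_done(state_path):
    """Read the set of already-uploaded file names, if a state file exists."""
    path = Path(state_path)
    return set(json.loads(path.read_text())) if path.exists() else set()


def upload_all(video_dir, state_path, config, upload):
    """Upload every .avi in video_dir that is not yet recorded as done.

    `upload` is a caller-supplied function (hypothetical here) that performs
    the real upload; `config` holds the shared description and tags.
    """
    done = load_done(state_path)
    for video in sorted(Path(video_dir).glob("*.avi")):
        if video.name in done:
            continue  # finished in an earlier run, so skip it
        upload(
            video,
            title=video.stem,  # the file name, minus ".avi", becomes the title
            description=config["description"],
            tags=config["tags"],
        )
        done.add(video.name)
        # Save progress after every video so an interrupted run can resume.
        Path(state_path).write_text(json.dumps(sorted(done)))
    return done
```

Writing the state file after each video, rather than once at the end, is what makes stopping partway through harmless: the next run simply skips everything already recorded.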

Oh, the Waiting!

This now means that what used to take hours of micromanaging agony now takes about an hour of dedicated tagging time (probably even less than that) and a few seconds of checkups to switch the tapes or start the upload process. Were the team website working, I would show off the searching hotness, but alas. The cool thing to note is that it automatically watches the YouTube feed for the Ballroom team's account and automatically processes any new videos it hasn't yet seen. Essentially, once the video is named, I can start off a process which sends it from my computer, to YouTube, to the Ballroom website, which then automagically processes the video's name (as in, it notices that the video has "Newcomer" in the title and files the video under "newcomers").
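The name processing on the website side boils down to pulling the pieces back out of the title. Here is a sketch of one way to do that; it is not the site's actual code, and the regular expression, the parse_title name, and the list of levels (only the ones mentioned in this post) are assumptions.

```python
import re

# Only the levels named in this post; a real list would be longer.
LEVELS = ("Newcomer", "Bronze", "Silver", "Championship")

TITLE_RE = re.compile(
    r"^(?P<event>.+?) (?P<level>" + "|".join(LEVELS) + r") "
    r"(?P<dance>.+) - Round (?P<round>\d+), Heat (?P<heat>\d+)$"
)


def parse_title(title):
    """Split a video title into facets, or return None if it does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    facets = match.groupdict()
    facets["round"] = int(facets["round"])
    facets["heat"] = int(facets["heat"])
    return facets


print(parse_title("Holy Cross 2010 Newcomer International Jive - Round 1, Heat 2"))
print(parse_title("Holy Cross 2010 Latin066"))
```

Titles that were never renamed, like the second one, fail to match and come back as None, so they can be set aside instead of being filed under the wrong category.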

Ah, the simple joys of modern life.

Monday, March 1, 2010

Comparing Emacs Version-parsing Libraries

Recently, I've become interested in advancing the (currently sad) state of Emacs package management. It is a well-documented problem (see here for more detail), so I won't discuss it generally.

One important aspect of a package-management system is the ability to deal with the different versions of a package. Because of the current state of packaging, there is no real standard format for version numbers. Any potential package manager must be able to figure out the version of a given package without being told it explicitly by a user (otherwise maintaining the packages would be way too hard). Currently, package.el has a simple-ish way of dealing with version numbers: it uses the lisp-mnt library (which comes with Emacs) to pull out the Version or Package-Version headers, strips RCS info (some version numbers are like $Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $, and we really only want 1.63), and then splits the result by periods and converts the pieces to integers. This means that it only works for versions like 1.2.3 and not 1.2.3alpha.
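The strip-and-split approach is easy to mimic, which also makes its limitation easy to see. What follows is a Python illustration of the steps just described, not package.el's actual Emacs Lisp; the function names and the regular expression for the RCS string are assumptions.

```python
import re
from typing import List


def strip_rcs(version: str) -> str:
    """Pull the bare revision out of an RCS $Id$ string, if there is one."""
    match = re.search(r"\$Id: \S+,v (\S+) ", version)
    return match.group(1) if match else version.strip()


def parse_dotted(version: str) -> List[int]:
    """Split on periods and convert each piece to an integer."""
    return [int(piece) for piece in strip_rcs(version).split(".")]


print(parse_dotted("1.2.3"))
print(parse_dotted("$Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $"))

try:
    parse_dotted("1.2.3alpha")
except ValueError as error:
    # "3alpha" is not an integer, so dotted-numeric parsing gives up here.
    print("cannot parse:", error)
```

The last case fails for the same reason described above: once the string is split on periods, a piece like "3alpha" has no integer value.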

It would be nice if we could get everyone to use only dotted-numeric version numbers, but that's not happening any time soon. Instead, a package manager must be able to make sense of more complex version numbers, such as 6.34a (which is what org-mode uses) or 1.0pre7 (CEDET). I'm going to look at the following three version parsing solutions:

  • version-to-list, included in GNU Emacs.
  • inversion, from CEDET, now included in GNU Emacs (not sure which version).
  • vcomp, written by Jonas Bernoulli, creator of the Emacs Mirror.

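I'll take a bunch of examples and show the output that each one produces. Results are listed in the same order as the libraries above, and each has a recognizable shape: a flat list of integers is from version-to-list, a list starting with a symbol such as point, alpha, or prerelease is from inversion, and a nested list is from vcomp. If you have any suggestions for additional version formats, please let me know :)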

  • "1.0pre7"
    (1 0 -1 7)
    (prerelease 1 0 7)

  • "1.0.7pre"
    (1 0 7 -1)

  • "6.34a"
    (6 34 -3)
    ((6 34) (104 0 96 0))

  • "1.3.7"
    (1 3 7)
    (point 1 3 7)
    ((1 3 7) (104 0 96 0))

  • "1.0alpha"
    (1 0 -3)
    (alpha 1 0 1)

  • "1.0PRE2"
    (1 0 -1 2)
    (prerelease 1 0 2)

  • "0.9alpha"
    (0 9 -3)
    (alpha 0 9 1)

  • "2009.04.01"
    (2009 4 1)
    (point 2009 4 1)
    ((2009 4 1) (104 0 96 0))

  • "2009.10.5"
    (2009 10 5)
    (point 2009 10 5)
    ((2009 10 5) (104 0 96 0))

  • "20091005"
    ((20091005) (104 0 96 0))

  • "20091005pre"
    (20091005 -1)

  • "20091005alpha"
    (20091005 -3)

  • "20091005alpha2"
    (20091005 -3 2)

Looking at this, it seems like version-to-list is the way to go, as it handles the different possibilities better than any of the other functions.
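One reason the flat integer lists are convenient is that pre-release markers become negative numbers, so ordering versions is just an element-by-element integer comparison. The sketch below is a Python illustration of that idea rather than Emacs' own code: is_older is a made-up name, and the rule that the shorter list is padded with zeros is my understanding of how such lists are meant to be compared.

```python
from itertools import zip_longest


def is_older(a, b):
    """True if version list a sorts before b, padding the shorter one with zeros."""
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return x < y
    return False  # the lists are equal once padded


alpha = [1, 0, -3]    # "1.0alpha" from the table above
pre7 = [1, 0, -1, 7]  # "1.0pre7" from the table above
final = [1, 0]        # what a plain "1.0" would look like

print(is_older(alpha, pre7))  # alpha comes before the pre-release
print(is_older(pre7, final))  # the pre-release comes before the final release
```

Because -3 and -1 are both below the implicit 0 of a plain release, alphas and pre-releases land before the final version without any special-casing.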

Thursday, February 11, 2010

Letter to the US Trade Representative

According to this Ars Technica article, the US Trade Representative, the body in charge of negotiating the Anti-Counterfeiting Trade Agreement (ACTA), has a section open for comments about the treaty. Let them know that you favor an open Internet and balanced copyright laws. I've copied my comment to them (which was too long to fit in the form, blah) here.

Letter to the USTR

I am very concerned with not only what leaked content I have been able to find about the Anti-Counterfeiting Trade Agreement (ACTA), but also with the lack of public involvement and visibility of this far-reaching and dangerous piece of legislation. For something that has a profound potential to damage the way the Internet, free speech, and the development of culture work, it is unacceptable that the only interested parties who have been allowed into the negotiations are those with the greatest interest in restricting new outgrowths of culture that the Internet can provide.

The companies that make up the Motion Picture Association of America (MPAA), Recording Industry Association of America (RIAA), the Business Software Alliance (BSA), the Association of American Publishers (AAP), and others currently have far too much sway over the decisions involved in the treaty, and represent a very lopsided view of copyright and patent laws. Despite their widely-publicized claims of the massive damages caused by piracy, there is little actual evidence in support of, and a great deal of evidence against, their conclusions. The movie industry, especially, has been posting record profits, so the claim that innovation and creativity will be destroyed by Internet-based piracy is clearly false.

These industries' support of stricter copyright and patent laws is little more than an attempt to persuade the government, Internet service providers, or website operators to foot the bill to support their own business model. After having realized that directly suing their own customers was prohibitively expensive and generated a huge amount of bad will, these companies decided that they could pass the unpopular job of attacking music and movie consumers to someone else.

It is not the US government's job to support the business model of these few companies. The small, if any, increase in economic activity that might result from stricter laws would not nearly be worth the chilling effects on cultural expression, free speech, and innovation. Many of the companies most heavily involved in the ACTA negotiations are products of an industrial age in which reproducing and transporting information was a difficult task and required a large, organized infrastructure in order to be feasible. With an increasing number of homes now having high-speed Internet access, this difficult problem of transporting information no longer exists. As such, a large component of the reason for these companies to exist has disappeared as well, leading to an existential crisis for them. These companies are the candle makers and buggy-whip manufacturers of the 21st century, and they should not be allowed to continue to seek economic rent from the former audience, now participants in culture, who no longer need them. Even if the claimed "worst case" does come to pass, and these large media companies do fold under the weight of Internet piracy, they will not be missed, as their once-useful purpose has been superseded.

It is extremely important for the future of Democracy, freedom of speech and expression, and the continued development of our culture that the negotiations be opened to the public, the full and current version of ACTA be made available, and those companies whose interests run counter to those of the country at large not be given undue influence over one of the most important treaties of our time.

Monday, February 8, 2010

Copyright: You know the drill

This is a reply to my father's reply to my earlier post about copyright. Here, for your viewing pleasure, is my full response.

Let me first just say that I think we agree much more than we think we do, but the verbiage that we are using construes our positions as being more antagonistic than they actually are.

Well, I cannot accept the premise. If one does not own one's intellectual property, then there is a major disincentive to creating it.

Theoretically, but how does this play out in practice? There is a certain amount I would need to expect to be able to earn off of something in order for me to prefer that activity to some other form of employment. One way of trying to hit that minimum of income would be to grant me exclusive use of the thing that I have created, and allow me to sell it. This is the strategy behind copyright. On this point, we (and essentially everyone else) agree.

There are various other methods of inciting me to do the work, such as patronage, donations, and grants. Many of these have worked in the past (and continue to work) for certain kinds of works (music comes to mind), but do not appear to be general solutions to the problem of inciting people to produce the kinds of works that we as a society are interested in.

The issue that most people have with copyright is what happens when the incentive to create a particular work rises well above its value to society. If I am to write a program, and it will cost me $50,000 worth of opportunity cost to do so, what happens when I am paid $500,000 for it? Clearly, if I was willing to write the program for $50,000, then someone is over-investing and is wasting money. The problem that people have is that the costs imposed by the current copyright laws are higher than what is required to incite the works produced.

There is also the added "cultural morality" notion that applies mainly to things like movies, music, and books, that says that works that are old enough become a part of the culture, and claiming ownership of them is unethical. A prime example of this is the fact that the "Happy Birthday Song" is still under copyright, even though it could be considered a basic part of our culture at this point. I find this argument reasonably convincing, but believe that a strong case can be made against the current scope and duration of copyright protection even without invoking matters of morality.

I do not read radiological studies for free. If I could not charge for it, I would not do it. I would (have to) find another line of work.

I agree. Nobody (well, except for the VERY fringe people) is suggesting an abolition of compensation for intellectual endeavors; they are merely saying that past a certain point, extremely long copyright terms lead to a kind of rent-seeking behavior.

An analogy to radiology would be that you wouldn't expect someone to continue paying you for the rest of their life for use of their brain after you had performed a radiological examination. They will pay you enough to cause you to perform the exam instead of seeking another line of work, but once you have performed the service, the payment relationship ends. This analogy is by no means perfect, since a third party would have no use for a diagnosis made of someone else, but the general idea applies.

So performing intellectual work, as a job rather than a hobby, must include the ability to be compensated for it.

Yes. Absolutely. I completely agree with this, and as time goes on, this fact will only become more important in an increasingly abstract-creation-based economy. The challenge is determining how much to compensate and for how long someone holds a monopoly over their creation.

The notion underlying this analysis is that new intellectual work is of such limited importance that all increments in units are of constant value. Hence formulations such as
"the term of a copyright must not be one second longer than is required to induce a creator to produce new works"

I disagree with your assessment of that statement. I believe that a copyright term should be as short as possible so that the creation can be re-integrated into the common pool of knowledge and reused without transactional friction by as many people as possible.

What I did not make sufficiently clear in that quote is which kind of "new works" I am referring to. An ideal copyright system would assign a specific level of compensation to each individual work which would exactly match the amount of money needed by the creator to incite her to work on that creation rather than some other endeavor. Somehow, society would determine which new works were worth supporting, and grant monopoly privileges only to those which were worthwhile, or at least only subsidize an activity up to its value to the whole society.

Obviously, no system will ever be able to perfectly predict what amount of compensation is needed for each individual creator, nor will it be able to accurately predict ahead of time what the value of a new work is to society, but this is the ideal toward which we are aiming. The job of determining the value of a work to society is left to the market, which is (like anything) imperfect, but the best mechanism we currently have for determining value to a wide group. The other factor, the amount that the creator needs in order to prefer working on her creation rather than something else, is currently addressed by the promise of exclusive control over that work for 70 years after the creator's death. My argument is that life + 70 years drastically over-estimates the compensation required to incite the creator to spend her time on the project.

I am not at all interested in the minimum incentive that will get one more copyrightable one-column post on a newspaper site. The sort of thing that could be thrown off in the time it takes to type may get copyright protection, but does not need it. No one cares whether it is published, and no one has any reason to use it at all.

I agree. One of the hardest problems is that the compensation required to incite a one-column newspaper article and the development of a distributed, soft-real-time database are totally different, yet the laws to support both of them are essentially the same.

I am very interested in the incentives that would lead to important and substantive contributions. These may require sustained long term effort by people who are able to provide this only because they derive their income from this work.

Absolutely. One thing to consider, though, is that none of the work that these people are doing occurs in a vacuum. Essentially everything that is done today (and this trend is only accelerating with the increased ubiquity of the Internet and accessible programming tools) is built off of the work of people who came before. If the cost of integrating and building off of existing work is lower, then each person can do more with less. The ubiquitous availability of blogging software has meant that anyone who wants to blog no longer has to be a programmer, leading to more time spent blogging and less time spent rewriting blogging programs.

So I would not equate an improved design for a hybrid engine, which might save billions of dollars in transportation costs with some rant posted on the FoxNews website.


If one takes the "not one second longer" criterion, and the structure of the legal system requires that the duration be the same for all, then you would wind up with zero for the duration of protection.

This is where the heart of the problem lies. My position is that the copyright duration for new hybrid engines should be "not one second longer" than is required for the new hybrid engine to be produced, while the copyright duration for a rant should be "not one second longer" than is required to incite the production of the rant. Essentially, I want to do away with the "and the structure of the legal system requires that the duration be the same for all." This is not an easy problem to solve! However, I think that the current term is far longer than is needed for even the most involved of productions.

People will produce trash for free, and a huge amount of it. If you make the standard the minimum required to produce that, then this will drown the much smaller number of truly productive accomplishments.

Yes, with the caveat that numerically speaking, there will always be much more trash than useful stuff, but you always have the luxury of ignoring the trash. This is not a dismissal of the concern of the loss of quality, involved creations, but a side-note against the distress that people have when they see a lot of junk. For instance, a common criticism of Twitter, Facebook, and blogs in general is that "most of the stuff there is crap." Yes, this is absolutely true, but especially in the Internet age, it is trivial to completely ignore the crap and only look at the worthwhile content. What would be a problem is if a great deal of crap is produced to the detriment of the production of worthwhile content.

If you cannot protect your creations legally, then you will be forced to hide them. Instead of patenting your hybrid design, and selling it. Making the servicing and repair widely available would benefit society. The alternative is to put the engine in a sealed compartment, and require them to shipped to the factory for all service. You would have to build it so that unauthorized attempts to open the unit would destroy it. That way no one can buy one, take it apart and copy it, since this would be legal. Of course, the benefit to society of the more efficient engine would be reduced by the costs of replacing them, shipping them back to the manufacturer, etc. You could get to this situation although you still had huge numbers of short articles published online, so you can induce A creator to produce ONE more work, in her spare time, for free.

Buried in there is an argument for more open sharing of information. You mention that "making the servicing and repair widely available would benefit society." This is the key balance to find. You want there to be enough of an incentive for the manufacturer to produce the engine, but it is also more efficient if anyone can service the engine without needing special permission. Obviously, there is the matter of being qualified to service the engine, but this is a matter of certifying someone's expertise, and not of restricting the knowledge or use of the knowledge.

Also, you are making a "benefit to society" argument here, and the only area in which we differ is specifically how to tweak things to achieve that maximum benefit. Again, our positions are not as different as they seem.

This is also more of a patent issue than a copyright one, and while the two issues are similar (and derive from the same section in the Constitution), there are key differences which make it dangerous to conflate the two.

If you cannot keep something as a trade secret, and there is no legal impediment for others to copy them, then it has no value to the creator. Even though it may have huge value to society. So no one will do it.

I wouldn't say that it has _no_ value to the creator. There have been numerous businesses (including my beloved Open Source companies) which are able to find value in giving things away for free. An (admittedly contrived) example would be that the engine manufacturer gives away the designs for the engines for free, without copyright protection, but by employing the engineers who designed the engine, would be in the best position to service the engine when it needed work. In that case, the engines would essentially be loss-leaders for the repair service.

Now, this is not likely to work for the auto industry specifically, but this is how a great many Open Source companies (including Red Hat and IBM) sustain themselves. The software is essentially given away for free, and the money is made off of customizing and managing that software installation.

Consider the tragedy of the commons. Say you would grow much more grass and hay in the pasture if someone were to fertilize, water and rotate crops on the land. But you insist that the only claim on the result is the minimum that will incent anyone to do anything. One may well say "once my cattle are fed I will NOT set the field on fire. Therefore there is currently enough incentive to generate the lowest possible level of socially useful activity. Therefore there should be no further payment to the person who will care for the field." Of course under these circumstances, no one will fertilize or water, society will be worse off, but the "not one second longer" standard would be upheld.

My position is not that the duration should be set to "generate the lowest possible level of socially useful activity," but to generate the highest. Going in the opposite extreme, I could say "in order to feed my cattle, I will maintain exclusive use of the fields for one year, extendable to 70 years after my death." This might benefit me (assuming I took well enough care of the fields that my cattle could survive), but would be under-utilizing the common resource of the field. I do not need exclusive use of the entire field in order to raise my cattle, and by preventing other people from using it, I am hurting everyone else for little to no benefit to myself.

This does not begin to address the moral element. We do not have laws against stealing only because an empirical analysis of the economy suggests that those with laws against stealing grow faster than those without. I suspect this is true, but it is possible that societies with fewer laws, or lax enforcement, grow rapidly. We have laws against stealing because stealing is wrong. We would still have laws against stealing, even if one could show that stolen money has higher velocity and provides more fiscal stimulus. At least I hope so.

This is a very different argument, and I hope that it does not get conflated with the other points I have made, because I believe they are able to stand on their own without invoking the moral aspect.

That said, I would say that, even though it is hard to imagine, if a society with ubiquitous stealing actually was better for everyone than one without, I would certainly prefer the former. Again, it is difficult to even conceive of a situation in which this might be true, but if it were, I would choose the option which left me better off. If I were somehow better off being able to freely steal from others, with the knowledge that they could freely steal from me, then I would choose that system. I don't care so much about the particular items and property which I own, but what I am able to do with them. If I can do more with the same amount of property by letting others "steal" it from me, then that would be the obvious choice, to me.

In the case of digital music, one could certainly argue that this has already happened. The rewards to issuing a new CD have fallen. Although the cost of simply recording music and burning it onto a CD also has fallen, there is simply less reason to do it.

I'm not actually sure about this. It is hard to get data on this, but one large (possibly even primary) reason for the decline in CD sales (aside from the shift to online music) is that most of the people who are interested in purchasing back-catalog music on CDs have already done so. When they first came out, there was a windfall of people repurchasing the old music they owned on cassettes or LPs on CDs, which boosted the sales numbers. This was clearly unsustainable, as eventually people would have purchased all of their old music on CDs, and would only purchase new music from that point forward.

The record industry has not clearly shown statistics of the relative sales of new releases versus back catalog music over the years, but the general attitude among those who follow this more closely is that the decline in CD sales is mostly attributed to the end of the back-catalog windfall, rather than piracy.

One outcome, which I contend is already underway, is for musical performers to release fewer CD's, and leave fans to the much lower quality of bootleg recordings from live events. If you cannot be paid for creating a studio performance, why do it?

Well, one of the main reasons for artists creating fewer CDs is that the current state of recording industry deals means that artists stand to make vastly more money off of live shows than from recorded music. An artist makes about 7x as much money from live shows as they do from recorded music, and that ratio is increasing.

Hence the greater focus of musicians on live touring and much less on recorded music. As time goes by, the rational performer would never create a recording of a full song. They might record the refrain and use it for marketing, but a full CD? Are you crazy? The mere existence of such a recording would be equivalent to giving it away.

Yes, that does seem to be where things are headed, though this is largely the result of the recording companies taking 90% of the revenue of recorded music, leaving 10% to the artists, versus the 90% of the revenue which goes to the artists from live shows.

So if you want to hear this performer, you had better show up in person.

I am personally kind of bummed by this, since I'm not all that interested in live shows.

Now I am not clear on why society is better off as a result of this, and it is clear that vigorous enforcement of laws against stealing would have prevented this from occurring.

Not really, and actually the opposite seems to be true. It is primarily the recording companies who are seeking increasingly strict copyright enforcement, whereas artists seem to do better with less enforcement (which allows their recorded music to be more easily promoted, serving as an advertisement for their live shows).

Fortunately for me, I am not looking for anyone to create new music. Essentially all the music I am interested in hearing was composed long ago. And once there is one recording available for legal purchase, I can buy it and listen to my heart's content. If I were a fan of contemporary music this would be a disaster.

I do worry somewhat about a shift away from recorded music, though that doesn't seem to be happening (since obscurity is a far greater threat to artists than piracy).

And all because of a de facto legalization of music stealing.

No, more like a set of increasingly artist-unfriendly contracts for recorded music, coupled with an increasing interest in live shows (which I don't get, but still).

These arguments also do not address the "clean up the streets" argument. If you lock up those whose respect for property rights is so weak that they feel no impediment to stealing music then you leave a more honest set of people walking around free. If you encourage stealing, then those who steal will prosper, and you will have a less honest society.

This is back to the separate moral argument, but I'm not that convinced by the "clean the streets" argument. To me, it seems a bit like (a more moderate version of) declaring something common and generally accepted as illegal, and then using that as an excuse to lock everyone up. Imagine if you made "saying mean things about the president" illegal, as is the case in Indonesia. You could then say "we need to clean up the streets of all those people who have so little respect for the president that they feel no impediment to saying mean things about him." Just because something is illegal, doesn't mean it's wrong. It also doesn't mean that every instance of law-breaking is an act of protest in the form of civil disobedience.

I could not care less about what a bunch of ignorant, racist, sexist, alcoholic, landed gentry who never did a lick of work in their lives (the framers of the Constitution) thought about the purpose of copyright. Not a one of them could have created anything of value. The idea of doing useful work was so alien to them that I am sure someone had to be brought in to explain the concept. Work was for servants. Laws were to protect the interests of the aristocracy.


Copyright in Context

This is my response to this Ars Technica article.

If you look at all of the foundational arguments for copyright, they are about a benefit to society, while only secondarily being about the authors or creators. Since the beginning, copyright, or a monopoly on ideas, has been regarded as an acceptable evil in the pursuit of the creation of new works. Copyright, like any enforcement strategy in a social-contract-based form of government, is a deal between the governors and the people which is only acceptable if the restrictions on personal autonomy are outweighed by the benefits that the restrictions bring.

Currently, there is little thought given to whether a copyright term that extends 70 years after the author's death actually makes the author more likely to produce a new book. Whether or not "Steamboat Willie" is still restricted by copyright has little bearing on whether Walt Disney produces more cartoons, because he is dead. According to the Constitution and every western (possibly others as well) system of copyright ever, the term of a copyright must not be one second longer than is required to induce a creator to produce new works. The author's "property right" has never been a basis of copyright law and was even explicitly rejected by the Supreme Court of the US in 1985.

The problem nowadays is that wealthy corporate interests (not to beat the "rah, corporations r teh evil!!!11" drum too much) have a large vested interest in extending the copyright term as long as possible. Since their primary motive is profit, they don't care whether the money they make is on new creations or the back catalogue; they are perfectly content to resell "Snow White" over and over indefinitely, regardless of whether doing so encourages the production of new works. Indeed, a compelling argument could be made that continuing to sell 50+ year-old works actually decreases the incentive to produce new works, as it crowds out any new content and reduces the money available to finance new creations.

Friday, February 5, 2010

Just to make your lives difficult...

I finally bit the bullet and renamed my GitHub account from "dhax" to "haxney". I did this to increase the consistency of my online names, so hopefully things will be less confusing overall.

Unfortunately, this means that all existing URLs to my old "dhax" repositories are broken, and GitHub doesn't offer redirection for changed usernames. Though, to be fair, the Git and SSH protocols probably don't support any fancy redirection anyway, so there isn't much they could do. Plus, just offering the ability to change an account name while still preserving all the links and info is noteworthy in and of itself, so they should be commended for that. Yay GitHub!!

Anyway, if you find any place that has a broken link to my old "dhax" account, please let me know so I can fix it.

I figured I would do this now, before anything of mine gains much in the way of popularity so that in the unlikely event that anything should gain some degree of usefulness, I will have avoided a much more painful switch later. We'll see if it pays off.

Saturday, January 30, 2010

News Flows

Just like old times, I'm turning an email reply into a full-fledged blog post.

This is in response to an email my dad sent me, which I have quoted inline.

You can probably find the full text.

I didn't bother. There are enough other interesting things to read out there that I'm not going to spend the time (let alone money) to jump through hoops to read the full version of this.

The article is mainly about how the Obama administration relates to the press and tries to control the stories published about it. But buried in there are some interesting comments about the process. They claim that reporters now must file numerous daily reports- blog posts, tweets, updates for the paper or network website, do on air interviews for their site, or networks... They are so busy doing all of this that they do not have time for reporting. They cannot discuss issues with other people at the White House, or call up experts elsewhere for feedback. Instead, they just print whatever someone at the White House said, and are thankful they managed to get a quote in time to file their report.

I think this is indicative of a larger shift in the news ecosystem. My prediction/idea is that there will be a split between the people reporting on the basic facts of the situation -- who said what when -- and those writing interesting narratives that tie things together.

Until now, each newspaper or TV station had to have its own person physically present in the room at a press conference in order to write down what was said and then type it up or say it on the air. With the ease of duplicating information, we don't need 50 people to write down the exact same thing. We probably want more than one person to be doing this to avoid mistakes and such, but there is no need for every national news outlet to have someone recording what the press secretary is saying, especially when it can -- and should -- be streamed live online and archived.

What we do need multiple people in the room for is asking questions of the press secretary and not accepting vague answers or avoidance. I don't know enough about the environment in these kinds of press conferences to know how this would play out, but it seems like a shift of focus would be inevitable.

Likewise, most news nowadays comes in the form of stories that are designed to be informative, relatively entertaining (or at least not completely dry), and contain some context of the situation (if it is a part of a larger unfolding event). What could certainly happen, however, is a separation of the basic facts -- Obama said this during his state of the union speech; 7 people were killed in Iraq at this location -- from any sort of narrative tying them together. As a programmer, my hope would be that the facts are reported in a standard, open format which could be read by any application and mashed up in interesting ways, but I doubt something like that will happen.

This could ease the burden on journalists, who are now expected to fill both of these roles: collecting raw data and synthesizing it into a "story." Just like we don't need a multi-paragraph article each time the Dow changes (we can just look at the number directly), we don't necessarily need an article to make us aware of what was said at the State of the Union. Of course, most people are less interested in the exact value of the Dow or what was said in the SotU and are more interested in what the facts mean or imply. That is where I see the role of journalists and reporters thriving: in making sense of large collections of data.

One can imagine that this lessens the value of what they have to say, but it forces one to wonder why the media have taken this route.

Some have said that the media (especially cable news) is responding so strongly to things like Twitter because they don't want to miss the boat like they did with blogging. There is definitely a sense of "we have to use this because it is cool, regardless of whether it is useful" with the cable news outlets, and it reeks of trying to get on board with what the cool kids are doing.

There are probably other reasons as well, but I suspect that a lot of the push from the old-guard media outlets to use Twitter is due to this, especially if they are forcing people to do it, rather than letting them use Twitter because the journalist finds it to be a useful tool.

It is one thing to say the days of one article per day, all issues the same time on the morning paper are over. It is quite another to claim that the public demands a constant flow of near meaningless "news".

There was a part in the Shirky and Rosen video series (I forget where) where they were talking about the notion that the main "hard news" stories were never really for the general public. They are sort of positioned that way, but most people simply don't care about most of what is going on in the world on any kind of day-to-day basis. For example, the center story on the New York Times site is "Full of Tricks, White Dazzles in Superpipe," which is about a snowboarder. Now, I certainly don't care about that, and neither do most people, but it is there nonetheless. The real purpose of the front page of the newspaper, according to them, is to occasionally present everyone with the few truly important stories: US goes to war; Lehman collapses, bringing down economy; etc.

Most people don't really care about what Obama said on one particular night, but do care about the general direction the country is moving. A newspaper comes in multiple sections, in this view, not so that a sports fan can stumble across an interesting article about climate regulation in Eastern Pennsylvania, but so that the sports fan can completely ignore everything except the sports section.

What people do anyway, and what they've always done, since the beginning of the notion of "public opinion" as something rulers cared about (which they discuss in the video; it is fascinating), is pick and choose only those things which they find interesting or applicable to their daily life, and only occasionally read about anything outside of that.

What a "flow-based" news ecosystem needs to achieve, then, is to allow people to filter out all of the news in which they are uninterested, but occasionally push to them stories about the few things outside their regular interests that are genuinely important: major corruption, war, major economic decisions, etc. This is essentially what news has always done, even if it has the pretense of providing everyone with a broad, daily summary of events across multiple areas of interest.

Saturday, January 2, 2010

Wearing SOCKS with Emacs

So apparently I have way too much time on my hands, so I went ahead and taught myself the amazingly cool bindat package of Emacs. It is basically a way of writing specifications for translating binary to alists (association lists) and back again. Once you have a specification written (and it is pretty easy once you get the hang of it), converting data back and forth is incredibly easy. I did run into a slight hitch with a couple things, though:
  1. Variable-length, null-terminated strings and
  2. Union structures
In an effort to make life easier for future Emacsians, I'll give the specs here, removing the need to figure it out yourself. But first, let me give a brief introduction to Bindat.

What is Bindat?

Let's say you have some binary data that you want Emacs to deal with. Rather than custom-writing a binary parsing function (because really, who wants to do that in 2010?), you can write a bindat specification and then use the bindat-pack and bindat-unpack functions to convert alists to and from binary, respectively. Let's say the binary format consists of:
  1. A byte indicating the version
  2. A byte for the length of a string
  3. Up to 255 bytes of a string, without a null byte at the end
Our spec would look like this:
(setq spec '((version byte)
             (length byte)
             (string str (length))))
Pretty self-explanatory, with the complication that the (length) gets filled with the integer value of the length field after it has been read. To convert data to binary, you would create an alist like this:
(setq data '((version . 5)
             (length . 11)
             (string . "")))
And stuff them both into bindat-pack as follows:
(bindat-pack spec data)
Which will give you "\x05\x0B" followed by the 11 bytes of the string field, where "\x05" is a byte with value 5 and "\x0B" is the byte with value 11. As you can imagine, the function bindat-unpack does the reverse. Pretty cool! And pretty readable, too! If you aren't doing anything much more complex than this, there really isn't much else to learn (aside from the "u8", "u16", and "u32" types, which do what you would expect). The Elisp manual on this is pretty good, so check it out for more depth.
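To see the round trip end to end, here is a minimal sketch. The names my-spec, my-data, my-packed, and my-unpacked are placeholders invented for this illustration, and the string "hello emacs" is a made-up value chosen only because it is exactly 11 bytes long.

```elisp
(require 'bindat)

;; Same layout as the spec above: version byte, length byte, then the string.
(defvar my-spec '((version byte)
                  (length byte)
                  (string str (length))))

;; Hypothetical data; "hello emacs" is exactly 11 bytes.
(defvar my-data '((version . 5)
                  (length . 11)
                  (string . "hello emacs")))

;; Alist -> unibyte string of raw bytes.
(defvar my-packed (bindat-pack my-spec my-data))

;; Raw bytes -> alist again.
(defvar my-unpacked (bindat-unpack my-spec my-packed))

;; Read individual fields back out of the unpacked alist.
(bindat-get-field my-unpacked 'version)
(bindat-get-field my-unpacked 'string)
```

The fields are read back with bindat-get-field instead of comparing whole alists, since I wouldn't want to rely on the order of the alist that bindat-unpack returns.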

The hard part

So, as the title of the post suggests, I was using bindat to implement the SOCKS4 protocol in Emacs (yes I know it's already been done; I was curious!). One thing that SOCKS4 (and SOCKS4a) does is include a "user ID" in the request, so that the proxy server can perform (very very) basic access control, or something. The problem is that all of the datatypes in bindat expect an explicit length, but the only length indication SOCKS4 allows is the trailing null byte. There may well be a better solution to this (and if not, I should contribute one :), but I came up with a (very hackish) solution. It takes advantage of the fact that the "length" field can be an Elisp form, which is evaluated during packing and unpacking. So here is a specification for the SOCKS4a protocol in all its ugliness:
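;; `search' comes from the cl package (it is `cl-search' in cl-lib on newer Emacs).
(require 'cl)

(setq socks4a-spec
      '((version byte)
        (cmd-stat byte)
        (port u16)
        (addr ip)
        (id strz (eval (cond
                        ;; Packing: the field value is known, so use its length plus the null byte.
                        ((bindat-get-field struct 'id)
                         (1+ (length (bindat-get-field struct 'id))))
                        ;; Unpacking: search ahead for the terminating null byte.
                        (t
                         (1+ (- (search "\0" bindat-raw :start2 bindat-idx) bindat-idx))))))
        (domain strz (eval (cond
                            ((bindat-get-field struct 'domain)
                             (1+ (length (bindat-get-field struct 'domain))))
                            (t
                             (1+ (- (or (search "\0" bindat-raw :start2 (1+ bindat-idx)) 100) bindat-idx))))))))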
Bleah! That's not nearly as pretty and simple as the earlier one! The trick is that for each of the string fields (id and domain, both of which are null-terminated), we search ahead for a null byte and use the distance between the start of that field, bindat-idx, and the null byte. We add one to the length to make room for the null byte itself. This is only for unpacking (converting from a byte array to Elisp structures), though. When converting to a byte array, we can use the bindat-get-field function to get the value of the field we are encoding, and then take its length (again, adding one for the null byte).
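The "search ahead for a null byte" step can be looked at on its own, outside of bindat. The following is only a sketch: my-strz-field-length is a hypothetical helper written for this illustration (it is not part of bindat), and it takes the raw bytes and start index as plain arguments instead of reading bindat-raw and bindat-idx.

```elisp
;; Hypothetical helper, not part of bindat.
(defun my-strz-field-length (raw idx)
  "Return the byte count of the null-terminated string in RAW starting at IDX.
The count includes the terminating null byte.  Signal an error if RAW
has no null byte at or after IDX."
  (let ((pos idx)
        (end (length raw)))
    ;; Walk forward until we hit a zero byte or run off the end.
    (while (and (< pos end) (/= (aref raw pos) 0))
      (setq pos (1+ pos)))
    (if (= pos end)
        (error "No null terminator found at or after index %d" idx)
      ;; Distance from the field start to the null byte, plus one for
      ;; the null byte itself.
      (1+ (- pos idx)))))

;; Bytes: 4 1 "abc" 0 "x"; the string field starts at index 2.
(my-strz-field-length (unibyte-string 4 1 ?a ?b ?c 0 ?x) 2) ; => 4
```

Unlike the spec above, this version signals an error when no null byte is found instead of falling back to a default, which seemed like the safer choice for a standalone function.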

Oh, the agony!

So yeah, that's what I spent my night doing. As a bonus, here are the specs for SOCKS5 (which was much easier, as it uses the vastly-superior Pascal convention for strings).
(setq socks5-greeting-spec
      '((version byte)
        (auth-count u8)
        (auth-list repeat (auth-count)
                   (method byte))))

(setq socks5-ehlo-spec
      '((version byte)
        (auth byte)))

(setq pstring-spec
      '((len u8)
        (str str (len))))

(setq socks5-conn-spec
      '((version byte)
        (cmd byte)
        (res byte)
        (addr-type byte)
        (union (addr-type)
               (1 (addr ip))
               (3 (struct pstring-spec))
               (4 (addr6 vec 16)))
        (port u16)))
The one thing that took a bit of figuring out was the union specification, as it isn't quite the same form as the others. Figuring out exactly what got nested and by how much was a not at all fun experience, though it is nice to know.
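To make the nesting concrete, here is a sketch of the data that the union expects for two of its cases. The specs are copied from above; the request alists (my-ip-request, my-domain-request) and their addresses, domain, and port are made up for illustration. The assumption I'm leaning on is that, because the union cases carry no field name of their own, their fields (addr, or len and str) sit at the top level of the alist next to version and port.

```elisp
(require 'bindat)

;; Specs copied from the post above.
(setq pstring-spec
      '((len u8)
        (str str (len))))

(setq socks5-conn-spec
      '((version byte)
        (cmd byte)
        (res byte)
        (addr-type byte)
        (union (addr-type)
               (1 (addr ip))
               (3 (struct pstring-spec))
               (4 (addr6 vec 16)))
        (port u16)))

;; Hypothetical request with addr-type 1 (IPv4): the union selects (addr ip).
(defvar my-ip-request
  '((version . 5) (cmd . 1) (res . 0)
    (addr-type . 1)
    (addr . [127 0 0 1])
    (port . 80)))

;; Hypothetical request with addr-type 3 (domain name): the union selects
;; pstring-spec, so `len' and `str' appear directly in this alist.
(defvar my-domain-request
  '((version . 5) (cmd . 1) (res . 0)
    (addr-type . 3)
    (len . 11)
    (str . "example.com")
    (port . 80)))

;; Pack both and turn the unibyte strings into lists of byte values,
;; which are easier to inspect.
(defvar my-ip-bytes
  (append (bindat-pack socks5-conn-spec my-ip-request) nil))
(defvar my-domain-bytes
  (append (bindat-pack socks5-conn-spec my-domain-request) nil))
```

Evaluating my-ip-bytes and my-domain-bytes afterwards shows which bytes the selected union case contributed between the addr-type byte and the two port bytes.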

What now?

Having taken a brief look at the socks.el package that ships with Emacs, it looks like it could be simplified by using the bindat package to handle the network protocol stuff. I'll have to see about adding that in at some point. Also, making it easier to use variable-length null-terminated strings would make the whole experience much more pleasant, so I'll see about getting that included as well. In the meantime, happy hacking in 2010!