Thursday, December 31, 2009

Chomping in Emacs

While working on Ezbl, I came to the terrifying realization that Emacs doesn't have a chomp-like function to strip leading and trailing whitespace from a string. After some searching, I found a solution, but it was kind of ugly (specifying the whitespace characters exactly rather than using a character class), so I modified it a bit. Here is my result:
(defun chomp (str)
  "Chomp leading and trailing whitespace from STR.

Why doesn't Emacs have this built in?"
  (let ((s (if (symbolp str) (symbol-name str) str)))
    (save-excursion
      ;; Make the [:space:] class match newline.
      (with-syntax-table (copy-syntax-table)
        (modify-syntax-entry ?\n " ")
        (string-match "^[[:space:]]*\\(.*?\\)[[:space:]]*$" s)
        (match-string 1 s)))))
The magic is all in the regular expression, which eats up as much whitespace as possible from the beginning and end and returns whatever is left in between (because of the non-greedy "*?" operator). By default (or in the mode I was using), the newline character is not considered part of the whitespace class, so I add it to a temporary syntax table. Any other characters which should be considered whitespace could be added in the same way. Maybe this can be included in a future version of Emacs, since it is useful and not too complex.
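A quick sanity check of the function above (assuming it has been evaluated; these calls are just illustrative):

```elisp
;; Leading/trailing spaces, tabs, and newlines are stripped; interior text,
;; including internal punctuation and spaces, is preserved.
(chomp "  \n\thello, world \n ")  ; => "hello, world"

;; Symbols are converted via `symbol-name' first.
(chomp 'foo)                       ; => "foo"
```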

Update: So about 3 minutes after feeling all smart and cool for posting this, I made a comment on the #emacs IRC channel and immediately got a response pointing me to replace-regexp-in-string. That whole big(ish) function collapses down to

(replace-regexp-in-string "\\(^[[:space:]\n]*\\|[[:space:]\n]*$\\)" "" str)

So yeah, quite possibly too short to warrant its own function. Serves me right for being so high and mighty with my fancy syntax-table. Silly mortal, Emacs always knows better.

Saturday, December 19, 2009

Yo dawg

Yo dawg, I herd u liek Emacs, so I put an Emacs in ur Emacs so u can Emacs while u Emacs. For more information, check out my older Ezbl post. EDIT: See new Special Edition Emacs browser meme: Yo dawg, I herd u like Emacs in a browser, so I put a browser in ur Emacs so u can Emacs in ur browser in ur Emacs while u browse ur Emacs in ur browser.

Friday, December 18, 2009

The Mason's Dilemma

I just read Tim O'Reilly's Why Using ShopSavvy Might Not Be So Savvy, and have a few thoughts. The first is that people will fail to see the connection between "Mason" and "brick and mortar" stores, which is largely the result of a sub-par title. "The Brick and Mortar Store's Dilemma" doesn't quite have the same ring, though. Bad metaphors aside, the gist of O'Reilly's article is that by browsing for an item in person and then buying it online, you incur all of the costs of running a retail business (rent, display items, employees, etc.) without providing it with the revenue from the purchase, making for a very unstable system. He is absolutely right about this, and it is a big problem for physical retailers, especially small, local ones. I don't dispute his predictions of how this will play out, but I find myself leaning toward a few different conclusions about what we should do about it.

Last Man Standing

First of all, he states:
But what happens once those mega-retailers are the last one standing? Prices are likely to go up.
I don't know about that. I'm definitely not a hardcore laissez-faire kind of guy, and it really isn't the main point of his article, but I think that setting up an online retail site (or an account on Amazon or eBay) has such low overhead that the barriers to entry are too low for a monopolist to jack up prices. A local mega-store might be able to get away with it, but it would face the same competition from online retailers that the smaller shops do. Assuming that the mega-stores can't jack up prices, there is an argument to be made in favor of small local stores being pushed out by large chains. If goods like food, paper, cleaning supplies, etc. are cheaper, then people will have more money left over for more "useful" things that create better jobs than sales clerks. This is known as the "yay efficiency!" argument, and I think it has to be balanced against the "everyone in the town goes broke" argument.

We're Nihilists

To some extent, I can't help but think that there is a degree of futility in his proposal to "buy where you shop," in that it is going to be hard, if not impossible, to get a meaningful group of people to voluntarily spend more than they have to on the same item for an abstract notion of "preserving local business." It sounds pretty defeatist, but I don't think it is sustainable to tell people "consciously avoid going for the cheapest option, even if it is more convenient." There's also a huge, unresolved Tragedy of the Commons dilemma in that I might decide that there are enough other people buying from the local store that it is okay for me to buy the cheaper, online option. It's great for me if everyone else pays extra to support the local store and I can get the advantage of its services without paying the premium for its products, but if everyone does that, then we're right back where we started. Part of the problem seems to be the same one facing the organic food movement: people don't perceive organic food as being that important, so they buy the cheaper, non-organic product. You have to convince people that organic (or local, or brick-and-mortar) goods are valuable enough to spend extra money on. Changing people's consumption preferences is really hard, and is probably not a good strategy by itself. I don't have any stats to back it up, but it seems like organic food is having more success with the "it's healthier" and "it's tastier" routes than the "it's better for the environment" route, since health and taste are already important attributes to people.

What to do, what to do?

So what do we do? One approach, mentioned in the comments and discussed in detail by my hero Clay Shirky, is to decouple the two functions of local stores: browsing and purchasing. Shirky describes this far better than I could, but bookstores have already started doing it: by adding coffee shops, they are becoming social gathering points first and retail outlets second. The analogue for more general retail outlets might be a business whose purpose is to show off and let people play with gadgets without the assumption that they will buy them there, possibly supported by a membership fee. People like to go shopping not so much for the "walking away with new stuff" aspect but more for the "look at and check out new things" part. Malls have invested a huge amount of time and money into making themselves "experiences," but they don't make their money off of the "experience" part. If physical stores were thought of more as advertisements that you can visit rather than places to purchase things, they might have better prospects. There are certainly issues with this approach, not the least of which is whether it is really viable, since the cost of setting up a physical store might be too high to be recouped by a reasonable membership fee. There are a thousand things that could fail, but another thousand that could succeed. Now is definitely a time for crazy ideas and experimentation, but I don't think the solution is to try to fight against the prevailing incentives.

Tuesday, December 8, 2009

Ezbl 0.3!!

After many long hours, I have finally reached a new milestone in the development of Ezbl! Check out the user-visible part of the changelog for version 0.3:

User-visible Changes:

* Automatically-resizing xwidgets

If you change the size of the window containing a browser window, Ezbl will
automatically resize the xwidget to fit the new size.

* Cookie handler

Ezbl is now able to honor cookie PUT and GET requests from Uzbl, using Uzbl's
(relatively) new "talk_to_socket" system. This means that it no longer has to
spawn a Python process for each and every cookie on every page you visit, but
instead talks over a UNIX domain socket (which is fast).

This, combined with the new event system has enabled the following:

* No more handler.py script

Ezbl is now 100% Emacs Lisp, and does not need to launch external processes
(aside from Uzbl, of course :). This is largely achieved through the new
event-manager system, which lets Uzbl notify Ezbl of changes and events over
standard output, making for very quick communication.

* Much faster performance

This is largely related to the event and cookie handlers, but it bears
repeating. In a few tests of loading yahoo.com, Ezbl would take about 30
seconds to finish loading the page. That time has now shrunk to just over 2.5
seconds. By using the event manager to let Uzbl notify us of changes, rather
than querying and polling, the amount of dead-time has decreased
dramatically. One commit in particular (d99f336) is responsible for the
majority of this speedup.

* New Emacs dependency

Uzbl uses a special socket type (SOCK_SEQPACKET) for its cookie handlers,
and stock Emacs doesn't support such sockets before 23.2 (I helped add that
support), so you will need to build a custom version of Emacs, available
here:

http://github.com/haxney/emacs

Note that you need the "xembed" branch of that repository.


Performance has increased roughly tenfold (from "dismal" to "pretty good"), and most of that is due to a single, one-line commit. It truly is the little things that make all the difference.

As an example, loading Yahoo.com took about 30 seconds in the bad old version and it now takes just over 2.5 seconds to do the same thing, while remaining much more responsive during the whole load.

Anyway, it is getting to the point where it is starting to be actually useful (though without keyboard support, it still isn't ready to replace Firefox), which is really cool! The biggest limitation at this point is that it requires patches to Emacs that are not currently in any released version (though my SOCK_SEQPACKET patch is scheduled for 23.2!), so it isn't something you can just download and play with right away. I'm working with the people involved to try to get the relevant pieces in line so that experiencing the latest and greatest browsing system for Emacs isn't quite so big of a burden.
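For the curious, with the patched Emacs, listening on a SOCK_SEQPACKET UNIX domain socket looks roughly like the sketch below. This is not Ezbl's actual code: the socket path and filter function are made up for illustration, and I'm assuming the seqpacket support is exposed through make-network-process's :type keyword.

```elisp
;; Sketch only: listen on a SOCK_SEQPACKET UNIX domain socket with the
;; patched Emacs. The path and the filter body are hypothetical.
(make-network-process
 :name "ezbl-cookie-handler"
 :type 'seqpacket                     ; needs the SOCK_SEQPACKET patch
 :family 'local                       ; UNIX domain socket
 :server t
 :service "/tmp/ezbl-cookie-socket"   ; made-up path
 :filter (lambda (proc request)
           ;; Parse Uzbl's PUT/GET cookie request and reply on PROC.
           (message "cookie request: %s" request)))
```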

Once again, comments are always welcome, and definitely let me know if you have given it a try!

Thursday, December 3, 2009

Ezbl updates

It has been a while since I last talked about Ezbl, my Emacs interface for the excellent Uzbl browser.

I have moved far beyond simply being able to embed the browser in Emacs, and have added resizable windows, custom mode-line text, and most importantly, a cookie handler. The only major piece which still needs work is the keyboard input, since the only way to get text into Uzbl at this point is to paste it in with the mouse. I have some ideas on how to move forward; most involve wacky hacks with the DOM, so we'll see what turns up.

The next big push will be taking advantage of the new-ish Uzbl events system, so I don't have to do as much polling. Profiling Ezbl (with the built-in "elp" package) when loading Yahoo.com and a few Twitter pages gives the following, rather uninspiring result:



Yes, it really is spending 77 seconds of CPU time on ezbl-sync-request, and yes, that is TERRIBLE. Luckily, I know how to move forward. I had been using a reasonably clever (but apparently horribly inefficient) system for making synchronous requests to Uzbl (for getting the value of a variable, for example). Emacs has a number of built-in ways of doing this, but I am thinking that with Uzbl's event system, I can just store the values of all the Uzbl variables within Emacs (there aren't that many) and update them on the Emacs side whenever they change. This way, I don't have to do an expensive query to Uzbl each time I want to update the mode-line, for example.
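The idea, roughly sketched (hypothetical names, not actual Ezbl code): keep a local cache of Uzbl's variables and update it whenever an event arrives, so a read is just a hash lookup instead of a round-trip to Uzbl:

```elisp
;; Hypothetical sketch of caching Uzbl variables on the Emacs side.
(defvar ezbl-var-cache (make-hash-table :test 'equal)
  "Cached values of Uzbl variables, updated from Uzbl events.")

(defun ezbl-handle-variable-event (name value)
  "Record that Uzbl variable NAME changed to VALUE.
Called from the event dispatcher whenever Uzbl reports a change."
  (puthash name value ezbl-var-cache))

(defun ezbl-get-var (name)
  "Return the cached value of Uzbl variable NAME.
No query is sent to Uzbl; the cache is kept current by events."
  (gethash name ezbl-var-cache))
```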

In other news, I worked (if you can call it that, it was about 4 lines of braindead-simple code) on a patch to Emacs to add support for the SOCK_SEQPACKET socket type. Since Uzbl uses SOCK_SEQPACKET for its cookie sockets, having this work was essential for being able to communicate with Uzbl (without having to patch Uzbl, which I wasn't keen on doing). There are plenty of problems with the SOCK_SEQPACKET patch to Emacs, not the least of which is that it only works for Unix domain sockets, and not actual network sockets. Bummer.

If anyone is comfortable (or at least competent, which excludes me) with SCTP, Emacs could use some expertise in adding SOCK_SEQPACKET over AF_INET sockets. It's a rarely-used but useful protocol, and it's a shame that there is something which exists that Emacs doesn't yet support :).

Stay tuned for more!

Monday, November 9, 2009

Android Tethering!

Don't tell Verizon, but I got (fairly basic) tethering working over USB with my shiny new Droid. The instructions are available at the TetherBot site, and work quite well.

A quick speed test shows that I'm getting basically the full speed of FiOS (my home ISP) when tethering using the Droid's WiFi (10 Mbps down, 2 Mbps up), and a respectable 1.2 Mbps down and 0.44 Mbps up when using Verizon's 3G wireless data. Not bad.

Of course, since the Droid can do nearly everything I would want my laptop to do, I don't see much of a reason for tethering, but it's nice to know that the possibility exists.

Sunday, November 8, 2009

Got a Droid!!!!

YAYYYYY!!!!!!

It's so completely awesome I can't even deal with it. I can SSH to my server with it. Oh man, this is amazing.

Monday, August 24, 2009

It's Alive!!!

Update: With version 0.1.2, opening up a page is much easier, so I have edited the instructions below.

After nearly a month of (very sporadic) work on it, I have finally done the impossible: embedded a fully-featured web browser in Emacs. Behold:



I am calling my creation "Ezbl," pronounced "ease-able." It is a combination of Uzbl and Emacs. Right now, Ezbl is still in the early stages, but it can display a Uzbl window and receive commands from Emacs (over Uzbl's STDIN).

Currently, it requires a patched version of Emacs (available here) which supports embedding widgets and windows, such as Uzbl. It is still in an early state, and any drawing-related problems are likely caused by the xwidget code. Joakim Verona is the genius behind the xwidget code, so talk to him if you have suggestions about that aspect.

Right now, I'm using the following code to start an instance of Uzbl (which can be called interactively):

(ezbl-open "www.google.com")


You will notice that it creates a buffer called "*ezbl-display-xxxx*", where "xxxx" is the pid of the Uzbl process. To browse to a different page, execute:

(ezbl-command-uri 1234 "www.yahoo.com")


Where "1234" is the pid of the Uzbl process. This will be cleaned up in the future with an "Ezbl" major mode and interactive commands.
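As a taste of what that cleanup might look like (purely hypothetical code; `ezbl-current-pid` is an assumed variable, not something Ezbl defines today):

```elisp
;; Hypothetical sketch: an interactive wrapper around `ezbl-command-uri'.
;; `ezbl-current-pid' is assumed to hold the pid of the active Uzbl instance.
(defun ezbl-browse (uri)
  "Prompt for URI and load it in the current Uzbl instance."
  (interactive "sURI: ")
  (ezbl-command-uri ezbl-current-pid uri))
```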

Just recently, I've added a live(ish) updating display of the current page title and URL to the mode-line, shown here:



The interface still needs quite a bit of work, and it is not yet possible to type into text areas of the browser window. If you are interested in helping out, I have a GitHub project here to which I regularly push.

Eventually, I plan to have Emacs manage history and cookies, as well as multiple "tabs," which will be done by having one Uzbl window per buffer. It has a long way to go, and is not at all useful for real browsing at this point, so help is greatly appreciated.

Enjoy!

Sunday, August 2, 2009

The music industry (again)

My dad wrote me another interesting email on the music industry. I couldn't help but respond.


Interesting, but he equates "large recording companies" with "the music industry". Ignores live performances, and, apparently, any direct artist to consumer sales.


Which, importantly, are the ways that the vast majority of (especially big-recording-label) artists make most of their money. Most artists would be (and, in many cases, are) happy to give their recorded music away for free in order to attract more people to their concerts.

The last (only?) large music concert I went to was a Maroon 5 (a fairly popular mainstream band signed to Universal) concert sophomore year. The tickets were either $40 or $60, and it was held in the TD Garden (formerly the Fleet Center) in Boston, which has a capacity of just shy of 20,000 people. I don't know exactly how many people were there, but it looked mostly full, so let's say 15,000. For a 4-hour performance, that's a gross on the order of $900,000. I don't know exactly how much they owe the venue, the lighting crew, their agents, etc., but apparently walking away with 50-60% of revenues is the average for large concerts.

Apparently, this is a complete flip from the '70s, when artists did tours to sell their albums; according to this interview, artists are now much more likely to give away their recorded music to sell tickets. He says that the difference (for the large artists, at least) between the money they make from album sales and what they make from concerts is massive, not even comparable. Compare this with an artist's share of record revenues to see how bad of a deal bands get when they sign with major labels.


On the other hand, it would be terrible if it ceased to be practical to be a professional musician.


I agree, to some extent, but it might be inevitable, and wasting taxpayer money to delay the inevitable seems like a bad idea. If it happens that the Internet makes professional music economically impossible, then we have to figure out what to do next; but trying to sue an industry into profitability doesn't seem like a worthwhile use of anyone's time or money.

Now, I happen to think that professional music is not in any serious danger, and may in fact be much healthier today than it was a decade ago, but it is distributed much more evenly. It is no secret that only 1 in 10 albums produced by the (major) record companies turn a profit, so the major labels are very much dependent on the album sales of a small number of large stars. They have a lot to lose from the "long tail" industry model, since their whole business is set up to maximize their profit from the "head" of the market.

In fact, they are so averse to competition that they resort to illegal means, such as "payola," to promote their own music at the expense of independent artists who can't afford to pay. It is a scam worthy of Wall Street and goes like this: the music producer pays radio stations to play its songs. This wouldn't be a problem, except that stations are required to disclose when they are playing sponsored music, and a spot in the DJ's "regular airplay" is not allowed to be purchased. The loophole the industry now exploits is that it can pay a third-party "independent music promotion" company, which in turn pays the radio broadcaster to play the music company's songs.


There is a level of accomplishment that can only be achieved with the total dedication to the work that only professionals can afford. But one has to be able to make a living doing it. If everyone feels entitled to steal the professionals' output, because it is easy, then soon the entire quality of music available will collapse. That would be a tragedy.


But if an artist's work can be advertised with a minimum of friction (either in the form of cost or legal blocks), then they can reach a maximum audience and attract large crowds to their concerts.

Also, there is the question of holding back progress that benefits everyone for the sake of a minority to whom we have an emotional connection. The printing press effectively destroyed the notion of a scribe as a professional and socially elite role, but it clearly did a lot to increase the viability of democracy. Yet, we are left with a world with no (or only very few) professional scribes.

Also, there has been music since the beginning of time, and nothing, not even government death squads, could stop people from creating music. Nowadays, there is a huge amount of money being spent towards "music" which doesn't directly support the ability of artists to live off of playing music. A huge portion of the spending on recorded music does not go to the artist, so all of that could be eliminated while still supporting the artist's cut of the profits.


What should the penalty be for stealing music? Let's make a few assumptions that are unlikely to apply in real life, but illustrate the argument.


From here:

In a 2003 case involving State Farm Mutual Automobile Insurance Co., the Supreme Court invoked the Constitution's due process clause in ruling that punitive damages in most cases must be capped at 10 times compensatory damages.


The damages in this case were "statutory," but there are a number of people who argue the same rules should apply to statutory fines as well.

From the Ars Technica article discussing the verdict:


Tenenbaum filed a motion to dismiss the plaintiffs’ statutory damages claim on constitutional grounds, but Judge Gertner deferred ruling on the issue unless and until there was actually a damages award handed down by the jury.



We know exactly how many people downloaded each copy of each song shared by the primary thief.


Maybe. The software doesn't necessarily keep a record of this, and it could easily have been deleted sometime long before the trial started (so it wouldn't be obstruction of justice or anything). The only way (barring a wiretap or spyware) to get that information would be to look at the log from Tenenbaum's P2P program. The whole point of P2P is that no central server keeps track of (or even knows about) the transfers going on, so if Tenenbaum deleted the record, there would be nothing to go off of.


Say the criminal posts 100 songs online, a small time crook, but a nice round number. Let's say each song is stolen by 10 people. So 1000 songs stolen.


I obviously take issue with the use of the word "stolen," as nobody lost any property in transaction, but the numbers seem fine.


Now, let's say that each stolen song represents an album that was not purchased. I know, criminals will not necessarily buy things that they would be happy to steal, but again, simplifying assumptions.


Hold on. First of all, an illegally downloaded song would correspond to at most one legally downloaded song, at $1 on iTunes or Amazon. To say that a pirated song represents one album not purchased is like saying (modulo the lack of equivalence between digital copying and physical theft) "stealing one car represents the loss of 20 cars to the dealership." It makes no sense.

As far as the impact of piracy on music sales, here are two reports on the matter.

One report says:


This study reveals that for every five albums downloaded results in one less sale


Assuming the same ratio applies to songs (which may not be the case, because of the tendency of albums to contain filler songs), 1000 songs downloaded would mean 200 songs would not get bought.

According to another report:


The results suggest that, for the group of users of peer-to-peer systems, piracy reduces the probability of buying music by 35% to 65%


So this means between 350 and 650 fewer songs bought. Let's be generous and say 500 songs would have been purchased had it not been for the piracy of those 100 original songs. Not an order of magnitude difference, by any means, but a change from the assumed 1,000.


So 1000 album purchases forgone. Say the albums average $10 retail. So $10,000 lost to the legal music industry due to criminal activity.


No, it would be closer to $500, because the substitute for an illegally downloaded song is a legally downloaded one, which costs anywhere from $0.50 (on eMusic) to $1.29 (on iTunes). Most big-label songs can be had on Amazon for between $0.80 and $1, so let's make it nice and even and say $1 per song.


Does this mean that stealing these 100 songs should cost the thief $10,000? Of course not. That is just the lost income restitution.


So, according to the Supreme Court, the maximum penalty should be $5,500: $500 for compensatory damages plus $5,000 for punitive damages.
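Spelling out that arithmetic under the assumptions used here (100 songs shared, 10 downloads each, roughly 50% substitution, $1 per song, and the 10x punitive cap):

```elisp
;; Back-of-the-envelope damages, using the assumptions from the discussion.
(let* ((downloads (* 100 10))           ; 100 songs shared, 10 downloads each
       (lost-sales (/ downloads 2))     ; ~50% substitution rate => 500 sales
       (compensatory (* 1 lost-sales))  ; $1 per song => $500
       (punitive (* 10 compensatory)))  ; 10x punitive cap => $5,000
  (+ compensatory punitive))            ; => 5500
```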


There is the cost of investigating and prosecuting the criminal.


Yes, but that has nothing to do with the punishment awarded for each song. I don't know exactly how lawsuits work, so it may be the case that the losing party has to pay the winning party's legal fees, but again, that is a separate issue from how much is owed per song. In this case, the jury decided on $22,500 in damages per song. That is what is insane.

If the damages were determined by the cost of investigation, I could just buy a bunch of private jets, crash them into a mountain, and claim that as part of the investigation cost. If the record company is only going to get $5,500 out of a trial, then it may well not be worth it for them to go after someone so small-time.


Now who should pay for the fact that the industry has to do many investigations to get one successful case?


This is false. Of the 18,000 people targeted by the RIAA, this is only the second which has gone to trial (the first was Jammie Thomas-Rasset). All of the other ones have settled out of court due to the embarrassing brokenness of our justice system for things like this (that's another story, though). They have a history of using illegal means to attempt to find individuals they suspect of file sharing, some of whom had never used a computer.

So who should pay for the illegal investigations? The idiots who are doing these investigations.


I would double the costs, at least. That way a successful prosecution pays for a set of lesser investigations that do not go all the way (losing in court does not appear to be a possibility).


Again, this has absolutely nothing to do with how much infringement of a single song should be punished.


Unless it was a short and easy investigation and prosecution, sounds like Tenenbaum got off light.


Well, part of the problem was that the judge denied Tenenbaum any chance to argue his case. From the Ars Technica article:


Tenenbaum's case was dismantled piece-by-piece by a series of adverse rulings over the past several months. Judge Gertner dismissed his abuse-of-process claims against the plaintiffs and the Recording Industry Association of America; excluded four of his proposed expert witnesses and limited the scope of a fifth; and, in a coup de grace delivered less than eight hours before the start of trial, barred him from arguing fair use to the jury.


It's definitely not over yet.


Here is the future of the recording business, as discussed on the New York Times op-ed page.


This was an interesting article, though it got some facts wrong.

Swan Songs?


The speed at which this industry is coming undone is utterly breathtaking.


Only to those who haven't been paying much attention. Napster was the sign that they were on their way out. A big, centralized distribution industry is unnecessary in an age when a college kid can, in his spare time, make a global music distribution platform that millions of people find more convenient than going to a record store. Distribution used to be hard; it isn't anymore, so it's not worth money.


First, piracy punched a big hole in it.


This is not nearly as true as the music industry likes people (Congress, especially) to think. They are not above blatantly fabricating numbers to make the problem seem much bigger than it is. This includes claiming (without so much as a shred of evidence) that the loss to the US economy due to piracy is greater than the "combined 2005 gross domestic revenues of the movie, music, software, and video game industries."

These companies would love nothing more than a law to prop up their failing and obsolete business models.


Now music streaming — music available on demand over the Internet, free and legal — is poised to seal the deal.


He might be right on this. I personally buy much less music as a result of listening to Pandora (which is now ad-supported). Well, actually, I think Pandora has caused me to discover and purchase a lot of new music, but I am kind of weird as far as music listening habits go.


This is part of a much broader shift in media consumption by young people. They’re moving from an acquisition model to an access model.


Now this is something with which I can agree. A lot of the music I listen to is used as background, filler music. I want some music to play, but I'm not as picky about exactly what it is. Pandora fits this situation perfectly. I don't necessarily need to own all of the music, but I just want to listen to it once in a while. There is a different class of songs which I do want to own, and am willing to buy, but it is smaller than the set of songs which I want to hear a few times and then am done with.

I think the problem is that they are used to having a constant stream of throwaway hits; songs which everyone has to hear a few times and then grows tired of. With physical distribution, the only way (aside from the radio) to hear these songs repeatedly was to buy the full album, which may only contain 2 or 3 worthwhile songs. To the RIAA, there wasn't a difference between someone who bought an album, listened to two songs a few times, and then let it sit on a shelf and someone who bought an album and listened to every song on it for years.

With streaming entering the mix, songs that fall into the "disposable" category can be streamed, rather than purchased, and since people aren't going to revisit them anyway, the lack of "ownership" of the songs isn't that big of a concern to the people listening to them. I imagine that this effect is more pronounced among the more technically-savvy and disposable-music-inclined 14-18 demographic, which accounts for the drastic drop in their purchasing habits.

In order to get album sales, the recording industry will have to produce music which is worthy of being listened to multiple times over an extended period, not just "summer hits" that are forgotten as quickly as they rise up the charts.


Even if they choose to buy the music, the industry has handicapped its ability to capitalize on that purchase by allowing all songs to be bought individually, apart from their albums. This once seemed like a blessing. Now it looks more like a curse.

In previous forms, you had to take the bad with the good. You may have only wanted two or three songs, but you had to buy the whole 8-track, cassette or CD to get them. So in a sense, these bad songs help finance the good ones. The resulting revenue provided a cushion for the artists and record companies to take chances and make mistakes. Single song downloads helped to kill that.


I have no sympathy for this problem. The only reason they were able to get away with it is that the distribution medium forced people to take the bad with the good. In essence, what they (and the columnist) are saying is: "we got used to being able to charge $20 for 2 songs, and as soon as we give people a say in the matter, they only buy what they actually like!" Cry me a river. If you want me to buy a whole album, make a whole album's worth of music I am interested in buying. Competitive markets are tough, aren't they?


A study last year conducted by members of PRS for Music, a nonprofit royalty collection agency, found that of the 13 million songs for sale online last year, 10 million never got a single buyer and 80 percent of all revenue came from about 52,000 songs. That’s less than one percent of the songs.


This is a flat-out, bald-faced lie. Take, for example, eMusic, which only carries independent music. In 2008, it sold 75% of its 5 million songs, or 3,750,000 tracks. This excludes all major-label music, so the "10 million never got bought" claim is patently false.

Additionally, PRS is hardly a neutral party. It, and similar companies, have been found by the European Commission to be in violation of antitrust laws. They have a habit of suing people for listening to the radio while at work.

All in all, the music industry is facing a problem similar to the newspaper industry's: their main role was distribution, and now distribution is essentially free. As with newspapers, there are a lot of complex issues at play, and just as "news" gets incorrectly conflated with "newspapers," so does "music" with "the major record labels." Music existed before the record labels and it will exist after them, and we will all be a lot better off once these gatekeepers are gone.

Friday, July 31, 2009

Riding the Wave!

The mighty gods of the Interwebs (Google) have deigned to give me an early Wave account! I have been playing around with it a bit, and two things have become abundantly clear:


  1. It is dangerously powerful. I was talking with my girlfriend on Wave (they give you a second test account), and we found that there are a lot of different ways of holding a conversation, including typing into two separate blips. The realtime nature of it really does make a difference, since you can start responding before the other person is done. This may sound gimmicky, but it really does change the way you do stuff.

  2. It is still buggy. I haven't been able to change my settings, because the "settings wave" doesn't seem to work for me. There are also a lot of missing features, and it has a habit of getting desynched or dying in weird ways. That's why they call it a development build.



Still, overall, it is pretty awesome. The simplicity of widget-making looks like it will enable an absolutely massive number of cool apps, and the open nature should do a lot to spur its growth.

I will have to say that it will probably be about 6 months to a year before the live editing of text and graphical doodles gets old. Well, that may be an exaggeration, but it is really freaking cool.

Now to start being useful and actually start developing for it; you know, like was the deal with getting the account.

Saturday, July 18, 2009

I need to stop reading about programming languages

Every once in a while, I start reading about the cool things going on with programming languages. I'm not one of those people who generally spends his days musing over tuples, continuations, and monads (I still have no idea what they are), but I do find it interesting to take some time to think about the tools that I use on a constant basis on a more meta level.

There is a lot of cool stuff going on in programming languages now and pretty much since the beginning of time, i.e. January 1, 1970. How to handle concurrency is one of the biggest things facing the language world right now, but there are tons of other things, like JIT optimizations and experiments in language interoperability (such as Parrot).

One example of a solution to the "concurrency is hard and threads suck" problem is the Termite language. It is built on top of the Gambit Scheme compiler/interpreter and implements the message passing style of concurrency. Take a look at the paper describing Termite; it has some cool examples of what you can do with message passing stuff. Of course, this is all stuff that Erlang has been doing forever, but nobody seems to want to use Erlang for anything but telco systems, for whatever reason.
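As a rough sketch of the mailbox style (plain Python threads and queues standing in for Termite's lightweight processes; the `spawn` and `echo` names are mine for illustration, not Termite's API):

```python
import queue
import threading

def spawn(fn, *args):
    """Start a "process" with its own mailbox, Termite/Erlang style."""
    mailbox = queue.Queue()
    threading.Thread(target=fn, args=(mailbox,) + args, daemon=True).start()
    return mailbox

def echo(mailbox):
    # Block until a message arrives, then reply to whoever sent it.
    while True:
        sender, msg = mailbox.get()
        sender.put(("echo", msg))

# Send the echo process a message and wait for the reply.
me = queue.Queue()
echoer = spawn(echo)
echoer.put((me, "hello"))
print(me.get())  # ('echo', 'hello')
```

The appeal, which Termite shares with Erlang, is that the processes share nothing: all communication goes through messages, so there are no locks or shared mutable state to get wrong, and a mailbox could just as well live on another machine.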

Of course, as an Emacs fan(atic), talk of a cool Lisp-family language immediately got me thinking about a replacement for the desperately-outdated Emacs Lisp. That led me to a proposed plan for replacing Emacs Lisp with Scheme... from 1996. It's been more than a decade, and as far as I can tell, there has been nearly zero progress in actually making that happen. The plan makes it sound like it would be relatively easy to do (barring some incompatibilities between Scheme and Elisp), but there is not any hint of it being in a better state now than it was then.

This is not a criticism of the people working on it; there are a lot of brilliant Emacs hackers, and the fact that it hasn't happened already means that it is a really hard problem to solve. It is more of an example of why I need to avoid spending too much time reading about programming languages: it's depressing. In comparison to most everything else in the computing industry, languages seem to move at a positively glacial pace. I get all excited about how some cool asynchronous, message-passing, migratable dialect of Scheme could be used to make a concurrent and fast Emacs implementation; then I go back to PHP for my programming, which doesn't have namespaces, integrated packaging support, or even a decent threading system, and cry a little inside.

I blame you, Steve Yegge. You made me dream again.

Thursday, July 2, 2009

free vs Free

This started as an email reply to my dad in response to this article.

I saw this via slashdot, so you probably read it. Obviously I agree with Gladwell that taking digital recordings over the web without paying for them is not simply a "choice"; it is theft.


I get the sense that Anderson is not necessarily advocating copyright infringement (which is different from stealing, especially for things like newspaper articles which are given away for zero dollars), but saying that when the incremental costs are essentially zero (which they are on the web and internet as a whole) the price will go to zero. I haven't read his book, but from what I've heard, he talks about ways that a business can survive when it is selling things for free.

As an aside, though I don't think it is terribly important to this discussion, I'd like to make clear the distinction between the two definitions of "free." There is an ambiguity in English (which isn't present in German and some other languages, by the way), namely that "free" means both "costing zero dollars" and "freedom." This is the thing that the Free Software people always make a big deal about, since they are fine with people charging for Free Software, as long as the recipients of the program have the freedom to modify, etc. the program.

In this case, there is a phrase, invented by Stewart Brand, which gets thrown around: "information wants to be free." It's certainly possible to disagree with this, but it holds most clearly in situations of censorship, where it is very difficult to keep information hidden once it has leaked out. The idea is that you have to exert a lot of pressure (= time, attention, and money) to prevent information from flowing between people, and that the uninhibited exchange of information is the natural state of info.

If you believe that "information wants to be free," then you would have to spend an enormous amount of time and energy trying to keep it from being free, especially on the internet. This is what we've seen with the RIAA, where they are trying hard to give certain people access to information, while barring others, who haven't bought the song, or live in a country which hasn't agreed to certain licensing terms, for example.

A radically different way of looking at things, then, would be to say: rather than trying to put information in a little box and then giving people access to only that box, take a step back and let information do what it does best: spread around. The "problem" with this is that you pretty much have to give up on trying to extract money from each transfer of info between two people, since doing so will vastly limit the usefulness and exposure of the information.

Or, to put it like a mathematician: if you have a problem with people stealing things, simply define their action as not stealing.

Anderson argues, [if the] magic of the word "free" creates instant demand among consumers, then Free (Anderson honors it with a capital) represents an enormous business opportunity


Arg, just what we need, another definition of "Free." Free Software people already have enough trouble getting their definition to gain traction as it is.

Anderson cautions that this philosophy of embracing the Free involves moving from a “scarcity” mind-set to an “abundance” mind-set.


I like this idea, since it is the only one that makes sense in the digital world. There isn't (for all practical purposes) any cost involved in making a copy of a set of bits, so the notion that there can be any sort of scarcity of any digital item is false. Companies have tried to artificially create scarcity where there is none through things like DRM, activation keys, or licensing, but it goes against the natural property of the medium: there is no scarcity.

For a detailed, well-researched account of this phenomenon, look at this page talking about the complex issues involved in scarcity in Sony's "Home" service. A choice quote:

There are things about Home that are simply beyond my understanding. Chief among these bizarre maneuvers is the idea that, when manufacturing their flimsy dystopia, they actually ported the pernicious notion of scarcity from our world into their digital one. This is like having the ability to shape being from non-being at the subatomic level, and the first thing you decide to make is AIDS.


He gives the example that "if you approach an arcade machine and there is a person standing in front of it, you will not be able to play it." It is foolish to invent scarcity in a system which is free of it.

The problem is that our entire economy presumes the existence of scarcity, so we don't have a good way to deal with abundance, since we have never really encountered it before. The rules of economics will have to be rethought, or at least, some of the aspects reconsidered, in an environment which lacks something as fundamental to economics as scarcity.

More importantly, Gladwell makes an interesting argument that just because some costs of production have gone down, that does not mean the overall price of a product must go down. Instead, he cites examples where people have absorbed these cost reductions, but turned their efforts towards ever more elaborate, and expensive, applications.


The YouTube example is an interesting one, since both authors use it. My take on it is that it isn't intended to make money now; Google knew that it was a money loser going in. I think that the idea for Google was that online video was a growing and important market, and they wanted to control the largest provider so that they would have a foot in the door once it matured. This may sound like a cop-out, but plenty of "traditional" companies have loss leaders; it's just that in this case, YouTube is not so much a loss leader for another Google project, but for the future, when it will be profitable (or that is Google's hope). If they didn't think it was worth it, they would drop it like it's hot.

In this case, I don't think YouTube is that good of an example for Anderson, since it doesn't show how a business can survive off of giving stuff away for free, it shows how a business is willing to take even very large losses in order to enter a market. The best I could see is that it is a good example of how people will react to free publishing and viewing of videos. So rather than looking at whether or not YouTube is profitable, look at whether or not people using YouTube as a platform can be profitable.

Genetic engineering means that drug development is poised to follow the same learning curve of the digital world, to "accelerate in performance while it drops in price."


I can see where he's coming from, but until the cost of physically producing the drugs is "too cheap to meter," I think pharmaceutical products will still cost money. What I could see happening is if a "pill printer" is invented, which would allow anyone with the blueprints for a drug to make physical pills, then the price of drugs would quickly approach zero. I get the sense that creating such a device is unlikely, or at least, unlikely enough that it won't be cost-comparable to paper printers. But assuming that did happen, then drugs would "only" be information, and a downward pressure on prices would arise the same way it has for music.

Genzyme isn’t a mining company: its real assets are intellectual property—information, not stuff. But, in this case, information does not want to be free. It wants to be really, really expensive.


Assuming that information wants to be anthropomorphized, I would say that the information wants to be free, but the drug company wants exactly the opposite. That is to say that if the "source code" of the drug ever got leaked, it would be very difficult to keep it from spreading around, the "wants to be free" part of the quote.

Overall, I think that Anderson has some interesting ideas, but may go a bit too far with them, since "Free: Sometimes a Good Price for Certain Things" wouldn't be as interesting. It could be that Gladwell overstates Anderson's commitment to the idea in order to set up a straw man, or Anderson overstates the point to try to push people's expectations; I'd have to read the book to know for sure.

I think that there is a lot to be said for free information, and that Free (as in Freedom) sharing without restrictions on use and reuse is best for the information, but not always best for the creator of the information. This is extremely dependent on the field and particular scenario, which is why it is so tricky.

The Open Source Software movement argues that Open Source development is a superior way to develop software, since the benefits to the code of low barriers to entry outweigh the harm to the developers of having a harder time making money off of the code. This equation changes even within software development, as Open Source Software has had a hard time gaining traction in game development and video editing software. In those cases, the advantage to the coder of being paid for professional development outweighs the cost to the code base of having a limited number of contributors.

I would take an even broader look and find that as the cost of information technology has dropped, the returns to those who deal in information and intellectual property have increased, greatly.


And this leaves room for entities (not necessarily even "companies") which have vastly lower costs, because they employ people part-time or on a case-by-case basis, to eat their lunch. One interesting business model which hasn't been explored much (except by Amazon's Mechanical Turk) is the idea of "micro employment" (my term), in which people are paid per task, and the tasks last on the order of hours to days. Rather than hiring a journalist to a full-time position, you ask for stories, and pay people per story. Then, rather than paying for a copy of the story the person wrote, you would pay them to produce a story, which could then be read by anyone without restriction. There are certainly advantages and disadvantages to this idea, so I'm not saying that this exact thing will be the future of all employment, but it could be an interesting way to go.

The problem for the recording industry so far, and coming for the newspaper industry, is that they sold a completely homogeneous product. My copy of a CD is identical to any other, whether I pay for it like a law abiding citizen, or steal it. Once it is produced it becomes generic information, both easy to reproduce and steal today, and equally valuable to anyone.


So, maybe rather than trying to sue people into supporting an old business model, you can embrace that fact and charge money before the information is produced, and then give out unlimited, unrestricted copies once it has been produced? For musicians, this is pretty easy, since a lot of musicians make most of their money from touring and merchandise anyway: abandon the notion that recorded music is a business and treat it like advertising for live shows. For a lot of artists signed to major labels, this is the only hope they have of digging themselves out of the hole that the recording company puts them in. But that is a whole other rant.

But what about intellectual property whose value is highly individualized?


I think this could be a major way forward. I would charge an individual to produce information (I don't like the term "intellectual property," because it falsely equates physical property with information). In this case, the information I gave them would come without restrictions, except for proper attribution, which I consider even more important in an "abundance" economy than a "scarcity" one. If they shared the information around, it would actually be better for me, since it would be free advertising for other people to buy my service as well.

Legal or medical advice? Architectural structural plans for the building you want someone to put up for you? Not just "a database program" which you can steal, but an actual database that works, well, for your company?


Again, it is not "stealing" if the company intentionally gives away copies for Free (without any restrictions). Is it possible to "steal" Linux? Only if you take aspects of it and release them as closed source software, but it is impossible, by definition, to "steal" copies of Linux, so the term doesn't even apply.

This is what companies like Oracle do a lot of. It is the "infrastructure" model of Open Source software, in which you give away the infrastructure and charge for the customization.

A similar scenario would be giving away the plans for a theoretical bridge between two perfectly even concrete blocks over a big, perfectly flat, concrete base, but charge for building a custom-designed bridge in a particular place. I imagine that the basic ideas behind a bridge are given away freely in the form of textbooks, and that the real thing you are paying an engineer for is figuring out how to adapt those generic plans to suit a particular location (where rocks need to be blasted, roads re-routed, etc.). It would be to everyone's benefit to have some people who did nothing but think about how to make better generic bridges, and then gave that info away for free, so that the actual bridges could be made better. Concretely, the plans for "a suspension bridge" should be free, but you could charge for the plans for "the Golden Gate bridge."

The returns to people who do a good job at these things are high. They are higher now, relative to average compensation across the economy, than they were 10, 20, or 30 years ago, when many of the tools they routinely use today did not exist.


This makes me think of the fairly abstract, philosophical issue of whether people "deserve" to be paid. Surely, I shouldn't be forced to pay someone who has dedicated their life to learning to hula hoop really well. If they want to charge money for a performance, that is fine, but they can't force me to go. Likewise, it would be unreasonable for them to expect the government to force people to pay them for making a plastic ring and wiggling it around their hips. If there were a large hula hoop industry that was seeing its revenues decline, they would probably whine about how they needed the government to set up laws to protect them.

Likewise, there is nothing that says that there have to be professional musicians, actors, or producers. We have those professions now, but if the technology changes to make their jobs useless, why should everyone else pay to keep them around? It doesn't need to cost anything aside from the bandwidth used to distribute music, so why do we need a company whose purpose is to distribute music? Similarly, it costs almost nothing to send bits of text around, so why do we need newspaper companies with a fleet of distributors?

People will lose their jobs. Maybe even more than will gain new jobs from new business models. But should we have resisted the development of cars to save the buggy whip manufacturers?

So a newspaper might have little future reporting on the small amount of real news, and large amount of nonsense, currently in a paper. But they might have a great future generating customized content to, for example, an oil company that wants to be up to date on economic, political, and weather issues that impact its industry. That company might have exactly zero interest in giving away the custom content it paid for because 1. it paid for it and 2. the only other customers are its competitors.


Or, the company might only want that information to exist, for its own purposes, but not care about other people getting it. Aside from the issue of competitors knowing, if I want to know stuff about the oil industry, why would I care if other people knew that stuff as well? I can see a system in which people care more about information existing than being able to prevent other people from having it. I pay money to learn to dance, but I don't expect that the teacher will prevent other people from learning what I did. Also, the teacher does not try to prevent me from teaching other people what I learned, because it serves as advertising of his teaching skill.

The Kindle sounds like a rip off to me- [I] have a computer, what do I need with another device that costs more than a mid range laptop? But I don't blame people for telling them to take a hike. Without readers authors are out of work. But without content there is no reason to buy a Kindle.


Well, the point of the Kindle (for consumers) is that it is much more pleasant to look at than a computer screen for long reads. The "E-Ink" display technology doesn't require any active light or power to hold an image, so it looks a lot more like paper than a computer screen. It is also light and portable if you are comparing it to a book. I don't read enough books to make it worthwhile for me, but think about how much cleaner your room would be if all of your books were consolidated in a single device. If you get the chance, take a look at one in a store sometime, they really do look like paper.

I do have some problems with it, like the DRM that Amazon has for downloaded books, but that is more of an aspect of their service, and not of the device itself. The newly released model, the Kindle DX, can read arbitrary PDFs.

It seems like the example of the Kindle at the beginning was more about who got what cut of the money than any philosophical debate about freedom and sharing. I bet that if you flipped the equation around, so the publisher got 70% and Amazon got 30%, there wouldn't be a complaint.

As for this part:

The people at Amazon valued the newspaper's contribution so little, in fact, that they felt they ought then to be able to license it to anyone else they wanted.


I would have to look at the specifics to get a clearer picture, but it strikes me that the newspaper shouldn't care whether someone is buying an article on a Kindle, on a computer, or on an iPhone 58G QMBZ, as long as the consumer is paying the correct amount. The author shouldn't have the right to dictate on what devices their content can be purchased or viewed; they should only be able to charge for a copy. It would be like saying, "you may only read this book while in a brick house." Why should they even know what device I am using to view or purchase the content?

Again, I think this is a question of pure revenues; the publisher didn't really care that much about the third-party licensing, except for the fact that Amazon might license the creator's content for less than the creator would have. If the "third parties" agreed to a price set by the content creator, then you probably wouldn't hear anything about it.

Again, I'd have to look more at this deal to learn the specifics, but I really don't like the amount of control authors have to deny access to certain devices or systems. It means that any new device coming to the market has to spend a huge amount of time and energy setting up licensing deals in order to be granted the "privilege" of purchasing and playing content. This greatly limits the amount of innovation that can happen around content-centric devices. A more open model, in which all content is DRM-free and a standard way of purchasing content exists and does not know or care about the particular device being used would be much better.

From someone whose entire livelihood depends on customized intellectual property.


I think that you are likely to be pretty well off, since, privacy issues aside, you only stand to benefit from your diagnoses being freely redistributable. A diagnosis of one person's X-Ray is pretty useless to someone else, except for someone trying to learn, and in that case, it would be good for society if the means of learning were as cheap and easy as possible.

In conclusion



It seems like there is a big split between the "old and not-really-applicable-to-the-internet" model of "information is just like a widget, and you pay for each individual item" and the "new-but-incomplete" model of "information has no restrictions on its distribution." The new system is very incomplete and has a lot of problems, the biggest of which is "how do you pay for it?" Some people are starting to figure this out for certain areas, and I'm certain that more will follow. It is important to be open to the possibility that companies and even jobs will look dramatically different once this new model has fully set in. There might be less "9 to 5 in a cubicle" and more freelance-type work where people are paid for jobs lasting months, weeks, days, hours, or minutes. It's exciting to think about what is ahead and to envision how these new technologies can change the way the world works.

Friday, June 26, 2009

Fully Homomorphic Encryption

This was originally a response to an email from my dad, but I've turned it into a blog post. The original story is on Slashdot and the abstract for the paper is also available.

This is great. We can then store all our data on NSA computers, access it anywhere, and not care that they can read the unencrypted data. In fact, I am sure they have high quality back up.


I think you are being sarcastic, but the idea behind the system is that if it works correctly (which is something that the NSA might have figured out a way around), you would never actually send the NSA your unencrypted data.

Interesting that IBM would push this as a feature of cloud computing.


Well, the big idea is that this lets you overcome one of the classical big problems with cloud computing, namely the hesitation associated with giving anyone (even a "super-duper-trustworthy person") all of your private data. This way, you get the advantage of letting someone else deal with your IT costs (where they can apply more specialized expertise and take advantage of economies of scale) without having to give up confidentiality of your data.

IBM could build a bunch of big datacenters, and then have people pay to host their applications and data in the cloud, secure in the knowledge that their data is safer than with a traditional cloud.

After all, this could be valuable without ever resorting to a cloud.


Absolutely. Assuming this holds up to scrutiny (which is certainly not a guarantee), this could be one of the biggest advances in cryptography in decades, perhaps even since the invention of public-key cryptography (which enabled https, among many others). It would allow computers to process information they know nothing about, including down to the level of the processor. That is, the processor itself can't decrypt the data on which it is operating.

Suppose your accountant was working on the company's books. The laptop could have encrypted data, and the accountant could do all their work without ever having an unencrypted copy on the computer.


Exactly. Or think of a database admin who has to ensure that the database stays running efficiently, but shouldn't be allowed to see people's unencrypted data.

Even the output file could be encrypted (I assume).


Yes, that is the whole point. The client takes some encrypted data and a program and produces an encrypted version of that data and a modified program which, when run on the encrypted data, will produce the same result as running the original program on the unencrypted data and then encrypting it.

So I fill out my tax form, send it to some company, they do a computation, and give me an encrypted tax return, but they never get to see any of my private data. The result only gets decrypted on my computer.
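To make the idea concrete, here is a toy, deliberately insecure sketch: a one-time-pad-style scheme that is homomorphic for addition only. It is nothing like Gentry's actual construction (which supports both addition and multiplication on ciphertexts, the "fully" part), but it shows a server doing arithmetic on numbers it cannot read:

```python
import random

N = 2**61 - 1  # public modulus

def encrypt(m, key):
    return (m + key) % N

def decrypt(c, key):
    return (c - key) % N

key = random.randrange(1, N)  # secret, never leaves the client

# Client encrypts two values and ships only the ciphertexts.
a = encrypt(20000, key)
b = encrypt(1500, key)

# Server adds the ciphertexts without ever seeing 20000 or 1500.
c_sum = (a + b) % N

# Client decrypts; the sum of two ciphertexts carries two copies of the key.
assert decrypt(c_sum, (2 * key) % N) == 21500
```

A real scheme needs vastly heavier machinery to stay secure and to support arbitrary programs rather than a single addition, which is exactly where the worries about overhead come from.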

If someone stole the laptop they would have useless files.


Exactly, and that could be a big boon as well. Not only do you not have to worry about lost laptops revealing information, but that info never needs to make it to those companies in the first place.

But IBM has to find a way to say this is marketable. Right now the advantage of the cloud to a commercial entity has to be its capacities: High volume storage, data integrity, security, or analysis expertise. By offering to do at least some things without ever seeing the data the big company gives you a reason to let them have your data. Or, I suppose, one could say "they have your files, but they do NOT have your data"


Exactly. They will say, "running a massive supercomputer is HARD! Let us do that for you and sell you some time and space on the server." It is like the old time-sharing days.

I think that if the following three things happen, this could be earthshattering:


  1. The system turns out to be secure.

  2. The overhead (both time and space) of the system isn't vastly higher than the unencrypted version; say, less than an order of magnitude slower.

  3. IBM resists the temptation to patent the idea.



1) is important for obvious reasons; if the system isn't secure, it is just a fancy waste of time. 2) matters because if fully homomorphic computation is much, much slower, then it won't see much use outside of specialized applications. 3) is important because putting a patent on the system which prevented or impeded the development of alternative implementations and usages would prevent it from becoming a universal standard, able to replace legacy systems. The temptation to patent it and hold on will be high, as it could be a large competitive advantage, but it could end up being much more useful to the world in general if it became ubiquitous.

Sunday, June 21, 2009

Weekly Update 4: Completion of Version Control Integration

Whew! This has been a busy week! I did not get the rules integration completed as I had hoped, as the Git hooks ended up taking a lot longer than expected (and are still not yet done). The problem is that access checking under Git is MUCH more difficult than under SVN for a few reasons:

  1. SVN calls the pre-commit hook once for each commit (duh), but Git only calls the update hook once per push (per branch or tag).
  2. Figuring out which commits to check is non-trivial, and can involve a number of different conditions.
  3. The author of a SVN commit is the same as the authenticated username of the person uploading the commit, so access checking is easy. In Git, the author, committer, and pusher can be totally different people, and Git (correctly) does not store any information about the pusher.
  4. Passing the appropriate authorization information to the hooks is non-trivial.

Definitions/Clarifications

There are a couple of complex ideas in there, so I'll take a moment to define what I'm talking about.

Once per push

With DVCSs, commits happen on the user's local computer, so Drupal (obviously) cannot check commits until they are pushed to the server repository. What this means is that Drupal sees a whole bunch of commits at a time.

Author, Committer, Pusher

This is a distinction that does not exist within centralized VCSs, as there is only one way for a commit to enter the repository. In Git, and I believe the other DVCSs as well, there is a difference between the author of a commit, the committer, and the person who is pushing to a repository. For the purposes of demonstration, I will call the author Alice, the committer Carol, and the pusher Pat.

  • Author: The person who originally wrote the commit. This gets set when you run "git commit", and does not change as a commit floats around between repositories.

  • Committer: The person who added the commit to the repository. By default, this is the same as the author. It will be different if, for example, Alice emails her patch to Carol, who commits it to her repository. In Carol's repository, Alice will be the author and Carol will be the committer. If Carol emails the patch to Charlie and he commits it, then Charlie would be the new committer.

  • Pusher: The person who pushes a commit to a remote repository. It is this person who needs to be authorized in order for a push to succeed. It doesn't much matter who a commit was written by, as long as the person adding it to the mainline repository is allowed to do so.

    In the original examples, Alice writes a commit, mails it as a patch to Carol, who then asks Pat to upload it to drupal.org. Pat has an account on drupal.org, but neither Alice nor Carol do. Pat pushes the patch to the main repository on drupal.org, and the push succeeds because Pat is allowed to push.

With the current workflow on drupal.org, Alice would post a patch as an attachment on a bug, Carol would mark the patch "reviewed and tested by the community," and Pat would commit the patch to CVS.

Authenticated username

Since no mention of Pat is included in the commit he is pushing, some method external to Git is needed to determine whether a push should be allowed.

Solutions

So far, I have only implemented solutions to problems 1 and 2, though the path forward on 3 and 4 is now much clearer (after this week's IRC discussion).

Which commits to check

Figuring out which commits to check can be tricky, since an updated ref could be a fast-forward (nothing but a linear set of commits between the old and new location of the ref), a non-fast-forward push (such as a rebase), or the creation of a branch (so the "old commit" is 0000000000...). Additionally, if a ref is changed but does not introduce any commits, then no commits need to be checked. This will occur if, for example, there are three branches, "master", "next", and "test", where "test" and "next" point to the same commit. If "test" is changed to point at "master", then no commits are actually added to the repository, so the only check should be whether the user is authorized to modify branches. This adds complexity which is not present in the SVN backend.

I have a draft implementation of this logic, but it needs to be tested. I am working on the tests, which will use a set of sample repositories, pushing different sets of commits to ensure that the correct commits are checked each time.
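Sketched in Python rather than the PHP the hooks are actually written in (and with a function name of my own invention, not the versioncontrol module's), the case analysis above boils down to choosing the right git rev-list invocation. An update hook receives the ref name, old SHA, and new SHA for each ref being pushed:

```python
# git's "null" SHA-1, which marks ref creation or deletion
ZERO_SHA = "0" * 40

def revs_to_check(old_sha, new_sha):
    """Return the arguments to `git rev-list` that select only the
    commits this ref update actually introduces, or None if the update
    introduces nothing that needs commit-level checking."""
    if new_sha == ZERO_SHA:
        # Ref deletion: no commits are added; only the ref change
        # itself needs authorization.
        return None
    # Everything reachable from the new tip minus everything already
    # reachable from an existing ref covers all the cases above:
    #   - branch/tag creation (old_sha is all zeros),
    #   - fast-forward and non-fast-forward (rebase) updates,
    #   - a ref moved to an already-existing commit, where the list
    #     comes back empty and only the ref change needs checking.
    # (old_sha is part of the hook's interface but is not needed here,
    # since --all already includes the ref's old position.)
    return [new_sha, "--not", "--all"]
```

Inside the hook, the returned arguments would be handed to `git rev-list` (e.g. via a subprocess call), and each commit it prints would then be checked for access.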

Authentication and Authorization

The solution I came up with on IRC was to mimic the behavior of programs like InDefero and gitosis by using ssh forced commands to associate usernames with ssh keys. Here are the steps the control flow will take:

  1. User runs git push.

  2. User connects to server via ssh. All users connect through the common user git.

  3. The ssh server looks up the user's key in .ssh/authorized_keys and sees that there is a command= option on that key.

  4. The value of command= is run, which would be something like git-serve.php.

  5. git-serve.php checks whether there is a Drupal user with that username (or who has a VCS account with that username) and, if so, sets the environment variable GIT_DRUPAL_USER_NAME.

  6. git-serve.php grabs the value of SSH_ORIGINAL_COMMAND (an environment variable set by ssh), which will be git receive-pack, and runs it (if step 5 passed).

  7. git receive-pack runs the update hook once for each branch or tag being updated. It gets the user name from GIT_DRUPAL_USER_NAME.

  8. The update hook builds $operation and $operation_items for each commit being added (using the steps described earlier) and sets the author of $operation to GIT_DRUPAL_USER_NAME.

  9. If any call to has_write_access($operation, $operation_items) fails, then that ref update is refused.

It is complicated, but it should be doable, and won't actually be all that much code (since ssh handles a lot of it).
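A rough sketch of steps 3 through 7, in Python and with names of my own invention (the real wrapper is git-serve.php, and the exact authorized_keys layout is still being designed): the forced command receives the Drupal username as an argument, validates SSH_ORIGINAL_COMMAND, exports GIT_DRUPAL_USER_NAME, and hands off to git.

```python
import os
import re

# Invoked via a forced command in .ssh/authorized_keys, e.g.:
#   command="git-serve alice",no-pty,no-port-forwarding ssh-rsa AAAA...
# (illustrative; the real wrapper is git-serve.php)

# git sends "git-receive-pack 'path'" for a push (or "git upload-pack"
# for a fetch); refuse anything else outright.
SAFE_COMMAND = re.compile(r"^git[ -](receive-pack|upload-pack) '([^']+)'$")

def parse_git_command(original_command):
    """Return (service, repo_path) from SSH_ORIGINAL_COMMAND, or raise
    if the key is being used for anything but git."""
    match = SAFE_COMMAND.match(original_command or "")
    if match is None:
        raise PermissionError("this key may only be used for git push/fetch")
    return "git-" + match.group(1), match.group(2)

def serve(drupal_user):
    service, repo = parse_git_command(os.environ.get("SSH_ORIGINAL_COMMAND"))
    # Step 5/7: make the authenticated name visible to the update hook.
    os.environ["GIT_DRUPAL_USER_NAME"] = drupal_user
    os.execvp(service, [service, repo])  # step 6: hand off to git receive-pack
```

The nice property of this scheme is that sshd does the authentication; the wrapper only has to map the key to a Drupal username and refuse non-git commands.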

Friday, June 19, 2009

Researchers conclude piracy not stifling content creation

An interesting look by Ars Technica at the effect (or lack thereof) of piracy on music creation.

I have long said that the purpose of copyright law should be (and was, you know, like in the Constitution) about "promoting the progress of science and the useful arts" and only incidentally about supporting artists and their families.

I don't care about the livelihood of artists. I really don't, and neither should you. I mean, people not starving to death is always cool, but it doesn't really affect me if some random musician gets paid or not. What I do care about is that new music that I like gets produced. If new music that I like is being produced, as long as it is not created by child slavery or prostitution or something, I really couldn't care less whether the artists are granted monopolies on their creations or the music is created on communes where everyone is forced to use the communal toilet to fertilize the glorious communist fields.

Congress only has the power to "promote the progress of science and the useful arts," and if the current copyright system fails to do this, as this and other articles argue, then that system is unconstitutional.

CNNfail and expectations of news networks

This started as a reply to my dad about this article, but got long enough that I decided to make it a blog post.

Interesting read, but I would frame it differently. I gave up on CNN a long time ago.


I don't think I ever gave them a serious chance, and having watched the Daily Show for more than a few episodes, I feel entirely justified in this conclusion.

It seemed that they had concluded, as a business decision, that there was no viewership for news.


Or at least, that it was more cost-effective to report on Paris Hilton than to report news.

Sure, sending reporters around the world is expensive, but it really is not necessary. You gain something from boots on the ground, but what you really need is the determination to discuss the news.


The interesting thing to me is just how much you can actually report without boots on the ground at all. As I have said to Evie a number of times when discussing the future of news, "international news is just local news in a different place." One thing that we are clearly seeing is that it is very possible for the "new media" to have detailed, up-to-date, and accurate accounts of events happening on the other side of the world.

The "old media" system relies on a small number of people dedicating all of their time on all of the news, but the new media allows a large group of people each to focus on one event or category and thoroughly report on just that one topic. For example, the guy who put together the Tatsuma site probably does not also cover the tax breaks for green roofs in New York. If he were a TV channel, that would be a problem, since you couldn't rely on him for your sole source of news. In a new media setting, you only go to him for info about the Iran elections, and go to someone who devotes their whole day/month/life to green roofs in New York. Everything is important to someone, and if it isn't, it likely isn't newsworthy.

My counter example" my beloved CNBC. Now it is only business and financial news. However, they really do B and F news. In depth, in detail, intelligently. No Paris Hilton, no jokes, no pointless chatter among the hosts.


Given the trends among other TV "News" networks, this is surprising and encouraging.

It looks like it should be very cheap to produce. They have several people in the studio, and they report the news. Much of the time is devoted to interviewing people who know what they are talking about (senior executives at corporations and securities analysts), and giving them time to answer thoughtful questions. I have seen them spend 20 minutes discussing interest rate policy and its effects on financial companies with the head of a major insurer.


I'm not that into the details of B&F news, but I really wish that this type of news network existed for other areas as well, and I'm finding that more and more of this kind of thing is done on blogs and social media.

If CNN were to have someone like that on at all, they would ask some idiotically simplistic question, then cut them off after the first 15 seconds of the answer.


And then have the host blather on about some generic topic only tangentially related to the subject, finishing off with a half-hour of soundbites.

CNN could do this. But they have decided that Britney Spears's haircut is more newsworthy.


I think the problem, and the most profound part of the article, is that people who complain about this kind of thing hold an implicit idea that there is some sort of "social contract" between the "news" networks and the people: that the news companies will report on what is relevant and important. The reality is that these companies are businesses, and so will do whatever they can to maximize profits. The trouble arises when the networks advertise themselves as the former but deliver the latter. People don't get outraged when E! reports on Britney Spears's haircut because E! makes no claim to be anything but a gossip column about celebrities.

The technological and social frameworks for a complete replacement of these old media businesses are not quite yet in place, but they are close, and when they are, the entertainment companies posing as news are going to be in trouble.

Wednesday, June 10, 2009

Weekly Update 2: Completion of Version Control Integration

A little late this week, as I was moving into my summer apartment.

  • The biggest (and really only) news is that I finished the Subversion hooks and (slightly) improved the backend in general. It is now possible to deny commit access to users based on whether or not they have a Drupal account associated with that particular Subversion repository. Branches and tags, however, are not yet checked, as the Subversion backend does not support them. The framework for doing so exists, as the repository creation/edit form accepts path patterns for trunk, branches, and tags, so adding support should not be too difficult.

  • The post-commit hook allows Drupal to gather log information as each commit is made, rather than on a cron run. There isn't much else to it, but it makes actions a bit more immediate, and in the future, it could serve as an event for the Rules module (something I will work on later in the summer).

  • Both of the hooks come with a set of SimpleTests for ensuring their correct functioning. The tests take a little while to run, as creating, checking out, and committing to a Subversion repository are slow operations (and don't play that nicely with the OS-level file cache, since new objects are created each time), but they are helpful for verifying that everything is in working order. They test both positive and negative cases, making sure the hooks fail gracefully when passed invalid information. They currently all pass (as one should hope!), and are able to catch a decently wide range of errors in the operation of the hooks.

  • A minor aside, but it bears mentioning since I spent several hours testing it: Subversion allows any UTF-8 character in any string (comments, user names, and files). However, certain tools restrict the username in particular to a subset of those characters, depending on the configuration options for that program. For example, the svnserve 'passwd' file format uses brackets and equals signs as part of its syntax, so those characters are prohibited in entries in that file, thus preventing Subversion users from having such names. However, if authentication is done in a different way, such as with ssh keys, this restriction no longer applies.
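The access check in the pre-commit hook can be sketched roughly like this, in Python rather than the PHP the backend actually uses; `user_has_commit_access` is a stand-in for the Drupal-side account lookup, not a real API:

```python
import subprocess
import sys

def txn_author(repo, txn):
    """Ask svnlook for the author of the not-yet-committed transaction."""
    return subprocess.run(
        ["svnlook", "author", "-t", txn, repo],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def pre_commit(author, user_has_commit_access):
    """Return the hook's exit status: 0 allows the commit; non-zero
    aborts it, and anything on stderr is relayed back to the committer."""
    if not user_has_commit_access(author):
        sys.stderr.write("User %r may not commit to this repository\n" % author)
        return 1
    return 0
```

The hook script itself would call `pre_commit(txn_author(repo, txn), ...)` with the repository path and transaction name that Subversion passes on the command line, and exit with the result.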

Well, that's all for now, stay tuned next week as I complete the Git repository hooks.

Tuesday, June 2, 2009

Weekly Update 1: Completion of Version Control Integration

This week was not a terribly busy one, with most of my time spent getting acquainted with the APIs of Drupal, the versioncontrol module, and Subversion. I was able to complete the pre-commit hook for Subversion, as well as the basic configuration for the other Subversion hooks.

I also identified a number of areas in which the SVN backend needs more work, such as recognizing branches and tags. The framework is in place, as the repository creation form includes fields for the branch and tag patterns, but currently no code makes use of those fields. Subversion is a tricky case, as it has no native concept of branches or tags, so the branch of a commit depends on the path of the files it modifies in the repository.

With the prep work out of the way, I hope to push forward to complete the Subversion hooks and begin on the next step of the project, the hooks for Git.

Friday, May 29, 2009

The future is now! (or in a few weeks)

Go take a look at the video on Google Wave and tell me that it isn't the future of everything. Go on, I dare you.

Monday, May 25, 2009

Tabs of Genius

Since I am going to be coding full-time (or nearly) for my summer job, I took this as an opportunity to undertake some long-overdue changes to my Emacs installation. Among the changes are:


  • Added Drupal file extensions (.module, .inc) to the recognized PHP patterns.

  • Used notify.el to pop up the cool new notification system in the latest version of Ubuntu whenever my nickname is mentioned on an IRC channel which I have joined.

  • Replaced my own ido-enhanced M-x with the pimptacular Smex program.

  • Updated nXhtml to the latest version. For a good example of how NOT to write commit messages, check out the (otherwise excellent) repository on Launchpad.

  • Moved some more packages from my elisp directory to ELPA. This lets me cut down on the amount of path configuration and autoloading in my configuration files, and lets ELPA manage its own updates.

  • Replaced pabbrev with Smart Tab (hence the name of this post), which uses the massively powerful hippie-expand to return better results. I ended up developing it from a simple function rebinding tab into a full-fledged global minor mode (which does nothing but rebind tab :) with a git repository on GitHub. My next move is to clean up all the little issues and try to get it included in ELPA. It is my first time maintaining any sort of package for public use, and a lot more goes into it than I thought. It's kind of neat.

  • Shuffled some configuration files about and updated the copyright dates on my config files. Not like anyone is going to infringe them, right?

  • Spent FAR more time than should ever be necessary hunting down a (relatively benign) bug in anything.el which clobbered the key binding for occur. Long story short: there is actually a difference between copy-keymap and set-keymap-parent. I'm going to see about pushing that change back upstream, though I'm not exactly sure what the "official" repository for anything.el is, aside from some files on EmacsWiki.

  • Spent a great deal of time figuring out how to get Smart Tab and auto-complete to play nice with each other. It eventually turned out that most of my problems stemmed from setting up my autoloads after loading Custom, so libraries which expected Custom to initialize them were left hanging.

  • Started, but got sidetracked from, setting up some form of tags file for at least PHP, since that is what I'm going to be spending most of my time working on. Unfortunately, given the nature of the interaction between the program which generates the tags and Emacs, it is non-trivial to have an automatically continually updated base of tags. Additionally, the format Emacs uses is apparently fairly inefficient (requires linear scans for all/most operations), but there is a rich ecosystem of libraries and helpers around it. I'll have to look at it some more later.

  • Looked into being able to use ido-completing-read everywhere the vanilla completing-read is used, making variable and function lookups much more pleasant. It is not as easy as simply defaliasing ido-completing-read to completing-read, since ido-completing-read itself calls completing-read during its execution. I asked StackOverflow if anyone had a solution, and nothing has come back so far. It would be really convenient if there were a completing-read-function variable which you could simply set to completing-read or ido-completing-read and be done with it. Better still, ido-completing-read is a drop-in replacement for completing-read, so most libraries likely would not notice or care.



Whew! That was a lot! I haven't had a good Emacs indulgence in a while, and I forgot how fun it is!

Many of the changes I made were a bit more under-the-hood, but I've already fallen in love with Smex, and from what I hear, Anything.el is pure bliss (I haven't had a chance to set it up and get used to it fully).

The big thing I'd like to spend some more time on is improving the situation around EmacsWiki. It is a great site, with a healthy community of interested people, but a Wiki is a pretty bad form of version control. One need only look at something like line numbering, where there are line-num.el (XEmacs only), linum.el, setnu.el, setnu+.el, lineo.el, and a number of patches for each of them, all mingling around in one long page which also serves as the bug tracker. I would like to see much more organization (though obviously not at the expense of productivity) and ease of deployment (like with ELPA). Even for packaging, there are at least six different and incomplete (in that they don't index all or even most packages) packaging programs. Competition and diversity are good, but there comes a point where they mean that nothing actually ends up usable.

Well, I've certainly got my work cut out for me, but it all looks interesting. And maybe, when I'm done fiddling around with Emacs, I might actually get to some Drupal programming. You know, the thing that's supposed to be my job.

Friday, May 22, 2009

Holy Ubuntu!

Wow, I love Ubuntu.

I just found out that if you hover the mouse over an audio file in Ubuntu (I'm running 9.04; I don't know about older versions), it will automatically start playing it. No clicking, no nothing!

Not a terribly big deal, but a very cool little thing. Hooray for finding out cool stuff!

Friday, April 24, 2009

I got GSoC 2009!

My proposal for the Google Summer of Code 2009 was accepted! I will be working on Drupal, specifically completing work on the Version Control Integration API with the goal of allowing Drupal to switch from CVS to a DVCS (most likely Git) for its main version control.

It will be a difficult task, but much of the baseline work has already been done, so I will be mostly filling in the gaps in functionality and finishing up the few areas where the Git backend lags behind the CVS one. You can look at my proposal here, and I will endeavor to make relatively frequent status updates on this blog. I am chrono325 on Drupal.org, and will try to spend some time on the #drupal irc channel on freenode, so if you have questions, that's how to get in touch.

Yay for Google!

Sunday, April 19, 2009

SSD Anthology

My dad asked me a few questions about this article. In short, SSDs suffer from performance degradation because they are read from and written to in 4KB pages but can only be erased in 512KB blocks. This means that if you have a full block and want to change a single bit within it, the operating system sends only the modified 4KB page, but the SSD needs to erase and rewrite the entire 512KB block.
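The arithmetic behind that mismatch is worth making concrete; in the worst case, a single dirty page costs a full block erase-and-rewrite:

```python
PAGE = 4 * 1024                   # smallest unit the OS reads/writes
BLOCK = 512 * 1024                # smallest unit the SSD can erase
PAGES_PER_BLOCK = BLOCK // PAGE   # 128 pages per erase block

def bytes_physically_written(dirty_pages):
    """Worst case: each modified 4KB page sits in a different full
    block, so every one forces a 512KB erase-and-rewrite."""
    return dirty_pages * BLOCK

# Changing one 4KB page makes the drive rewrite 128 times the data
# the operating system actually sent.
amplification = bytes_physically_written(1) // PAGE
```

This 128x factor is the worst case; the block-reordering schemes described in the article exist precisely to avoid paying it on every write.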

This "solution" sounds a lot like reinstalling Windows and all of your programs. ie. way more work and trouble than anyone other than a hard core geek would put up with.


Yup, you're right.

I'm waiting until they come up with an automatic solution. Something like "click 'yes' to speed up your drive"


In that case, you would want the drive to just do it automatically, because why wouldn't you want it to just automatically be as fast as possible? It would be like having a car which had an ignition and a separate button labeled "press this to actually start the car." If it were as simple and transparent as pressing a button, it should be done without bothering you.

As I said, the real problem is that the filesystem doesn't know enough about the structure of the flash drive to cooperate optimally. This is mostly a problem with existing filesystems which are organized for HDDs, but is also a problem of the flash drives which do not expose sufficient information to the filesystems for them to be able to make the best choices.

This is not just a case of one company or the other simply being stupid or lazy (though there is some of that). When flash drives were first introduced, there were no filesystems to take advantage of them since there was no demand for creating such a filesystem, and writing filesystems is REALLY HARD. Really, really hard. This means that you aren't going to create a new filesystem unless you have a really good reason for doing so, since, as I said, writing filesystems is REALLY HARD. The correct thing for the SSD drive manufacturers to do was to hide the underlying complexity from the filesystems of the day and use a strategy which was good enough when a filesystem treated the SSD as a hard drive. This gave rise to the fancy block reordering schemes described in the article.

The problem is that these reordering schemes run into the problems described in the article (all of the blocks are used up and must be erased before new data can be written), which could be mostly solved (or at least largely mitigated) by filesystems which knew more about the structure of the SSD. Unfortunately, one of the factors over which SSD makers compete is their block reordering scheme, so they have an incentive to keep it secret and prevent people from circumventing it (thereby making their fancy reordering scheme irrelevant). Taken to the extreme, the only field on which to compete would be the makeup of the memory chips themselves, which would push the different SSD makers toward commodity status (and therefore lower margins). From a user's view, this would be an ideal scenario, as long as your operating system and filesystem could take advantage of the additional information provided by the memory chips.

We are in a sort of transitional period during which there is quite a bit of flux and uncertainty. It is very, very useful to have a small number of filesystems which everyone can read and write, since it makes data portability that much easier. This is the main reason (along with its minimal storage and computational overhead) why the otherwise horribly outdated FAT family of filesystems is still in widespread use for removable storage. Superior alternatives exist (mostly on Linux, due to the relative ease of writing new filesystems for it), but they aren't compatible with other operating systems, and so are useless for consumer flash drives. For a flash-aware filesystem to gain widespread support, it would need to be compatible with the major operating systems, a catch-22 for any new filesystem.

The other hurdle, which is less visible to end-users, is the way the drives expose information about themselves. For a flash-aware filesystem to be truly effective, it would need some way of gathering information about the characteristics of the underlying flash drive and have direct access to it, bypassing any (or the most dramatic) block reordering schemes. The technical ideal would be to have a single open, royalty-free, well-written and extensible standard for performing this kind of communication. The problem is that allowing such a standard would bypass the competitive differentiator of a manufacturer's reordering scheme, which individual manufacturers would likely resist. If such a standard could be agreed upon, it would go a long way towards enabling better cooperation between the hardware and software.

Whether any of this comes to pass remains to be seen. What is fairly certain, however, is that there will be a lot of volatility in the SSD space as these issues (and more) are figured out.

Tuesday, February 24, 2009

Programming Project Resume

This is a list of my personal programming projects. Most of them were one-time projects to scratch an itch or play around with something and are not maintained.

  • A listing of most of my personal projects is available here.
  • I am the maintainer for smart tab, a small Emacs package to allow the tab key to automagically indent or complete, depending on the context.
  • My .emacs directory.
  • I am the webmaster for the Brown Ballroom Dance Team website. Notable features of the site include faceted video search of nearly 2000 videos (see an example), similar management of costumes owned by the team (viewable only by team members), and user profile management.
  • I worked on the Drupal version control integration libraries for the 2009 Google Summer of Code. The packages I worked on are here, here, and here.
  • I added support for many CCK fields to the Drupal Apache Solr integration project and refined the existing framework for supporting CCK fields.
  • I am working on Ezbl, a web browser based on the Uzbl browser project. It provides a glue between Emacs and Uzbl, forwarding commands and managing history and cookies.
  • I have contributed to package.el, the package manager for Emacs. Some of my work has been integrated into the development branch for Emacs 24, and I am working on an overhaul of the codebase.