Friday, June 26, 2009

Fully Homomorphic Encryption

This was originally a response to an email from my dad, but I've turned it into a blog post. The original story is on Slashdot and the abstract for the paper is also available.

This is great. We can then store all our data on NSA computers, access it anywhere, and not care that they can read the unencrypted data. In fact, I am sure they have high-quality backups.

I think you are being sarcastic, but the idea behind the system is that if it works correctly (which is something that the NSA might have figured out a way around), you would never actually send the NSA your unencrypted data.

Interesting that IBM would push this as a feature of cloud computing.

Well, the big idea is that this lets you overcome one of the classical big problems with cloud computing, namely the hesitation associated with giving anyone (even a "super-duper-trustworthy person") all of your private data. This way, you get the advantage of letting someone else deal with your IT costs (where they can apply more specialized expertise and take advantage of economies of scale) without having to give up confidentiality of your data.

IBM could build a bunch of big datacenters, and then have people pay to host their applications and data in the cloud, secure in the knowledge that their data is safer than with a traditional cloud.

After all, this could be valuable without ever resorting to a cloud.

Absolutely. Assuming this holds up to scrutiny (which is by no means guaranteed), this could be one of the biggest advances in cryptography in decades, perhaps even since the invention of public-key cryptography (which enabled HTTPS, among many other things). It would allow computers to process information they know nothing about, down to the level of the processor: the processor itself cannot decrypt the data on which it is operating.

Suppose your accountant was working on the company's books. The laptop could have encrypted data, and the accountant could do all their work without ever having an unencrypted copy on the computer.

Exactly. Or think of a database admin who has to keep the database running efficiently, but shouldn't be allowed to see people's actual data.

Even the output file could be encrypted (I assume).

Yes, that is the whole point. The client takes some data and a program and produces an encrypted version of that data and a transformed program which, when run on the encrypted data, will produce the same result as running the original program on the unencrypted data and then encrypting the result.

So I fill out my tax form, send it to some company, they do a computation, and give me an encrypted tax return, but they never get to see any of my private data. The result only gets decrypted on my computer.
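A much weaker version of this trick already exists in textbook RSA, which happens to be multiplicatively homomorphic; that makes for a tiny (and wildly insecure) illustration of the idea, while full FHE extends it from one operation to arbitrary programs. A sketch in Python, with toy demo parameters:

```python
# Toy illustration of a *partially* homomorphic scheme: unpadded
# ("textbook") RSA is multiplicatively homomorphic. This is not FHE,
# and these tiny parameters are utterly insecure; it only shows that
# one can compute on ciphertexts without the key.
p, q = 61, 53
n = p * q                          # public modulus (3233)
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
# The "server" multiplies the ciphertexts without ever seeing 7 or 6...
c = (encrypt(a) * encrypt(b)) % n
# ...and only the key holder can decrypt the product.
assert decrypt(c) == a * b
```

Real fully homomorphic schemes support both addition and multiplication on ciphertexts, which is enough to evaluate arbitrary circuits; this sketch shows only the multiplicative half.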

If someone stole the laptop they would have useless files.

Exactly, and that could be a big boon as well. Not only do you not have to worry about lost laptops revealing information, but that info never needs to make it to those companies in the first place.

But IBM has to find a way to make this marketable. Right now the advantage of the cloud to a commercial entity has to be its capacities: high-volume storage, data integrity, security, or analysis expertise. By offering to do at least some things without ever seeing the data, the big company gives you a reason to let it have your data. Or, I suppose, one could say "they have your files, but they do NOT have your data."

Exactly. They will say, "running a massive supercomputer is HARD! Let us do that for you and sell you some time and space on the server." It is like the old time-sharing days.

I think that if the following three things happen, this could be earthshattering:

  1. The system turns out to be secure.

  2. The overhead (in both time and space) isn't vastly higher than computing on unencrypted data; say, less than an order of magnitude slower.

  3. IBM resists the temptation to patent the idea.

1) is important for obvious reasons; if the system isn't secure, it is just a fancy waste of time. 2) matters because if fully homomorphic encryption is much, much slower, it won't see much use outside of specialized applications. 3) is important because a patent on the system that prevented or impeded the development of alternative implementations and usages would keep it from becoming a universal standard, able to replace legacy systems. The temptation to patent and hold onto it will be high, as it could be a large competitive advantage, but it could end up being much more useful to the world in general if it became ubiquitous.

Sunday, June 21, 2009

Weekly Update 4: Completion of Version Control Integration

Whew! This has been a busy week! I did not get the Rules integration completed as I had hoped, as the Git hooks ended up taking a lot longer than expected (and are still not done). The problem is that access checking under Git is MUCH more difficult than under SVN, for a few reasons:

  1. SVN calls the pre-commit hook once for each commit (duh), but Git only calls the update hook once per push (per branch or tag).
  2. Figuring out which commits to check is non-trivial, and can involve a number of different conditions.
  3. The author of a SVN commit is the same as the authenticated username of the person uploading the commit, so access checking is easy. In Git, the author, committer, and pusher can be totally different people, and Git (correctly) does not store any information about the pusher.
  4. Passing the appropriate authorization information to the hooks is non-trivial.


There are a couple of complex ideas in there, so I'll take a moment to define what I'm talking about.

Once per push

With DVCSs, commits happen on the user's local computer, so Drupal (obviously) cannot check commits until they are pushed to the server repository. What this means is that Drupal sees a whole bunch of commits at a time.

Author, Committer, Pusher

This is a distinction that does not exist within centralized VCSs, as there is only one way for a commit to enter the repository. In Git, and I believe the other DVCSs as well, there is a difference between the author of a commit, the committer, and the person who is pushing to a repository. For the purposes of demonstration, I will call the author Alice, the committer Carol, and the pusher Pat.

  • Author: The person who originally wrote the commit. This gets set when you run "git commit", and does not change as a commit floats around between repositories.

  • Committer: The person who added the commit to the repository. By default, this is the same as the author. It will differ if, for example, Alice emails her patch to Carol, who commits it to her repository. In Carol's repository, Alice will be the author and Carol the committer. If Carol then emails the patch to Charlie and he commits it, Charlie becomes the new committer.

  • Pusher: The person who pushes a commit to a remote repository. It is this person who needs to be authorized in order for a push to succeed. It doesn't much matter who a commit was written by, as long as the person adding it to the mainline repository is allowed to do so.

    In the original example, Alice writes a commit and mails it as a patch to Carol, who then asks Pat to upload it to Pat has an account on, but neither Alice nor Carol does. Pat pushes the patch to the main repository on, and the push succeeds because Pat is allowed to push.

With the current workflow on, Alice would post a patch as an attachment on a bug, Carol would mark the patch "reviewed and tested by the community," and Pat would commit the patch to CVS.

Authenticated username

Since no mention of Pat is included in the commit he is pushing, some method external to Git is needed to determine whether a push should be allowed.


So far, I have only implemented solutions to 1 and 2, though the way forward on 3 and 4 is now much clearer (after this week's IRC discussion).

Which commits to check

Figuring out which commits to check can be tricky, since an updated ref could be a fast-forward (nothing but a linear set of commits between the old and new locations of the ref), a non-fast-forward push (such as a rebase), or the creation of a branch (so the "old commit" is 0000000000...). Additionally, if a ref is changed but does not introduce any commits, then no commits need to be checked. This will occur if, for example, there are three branches, "master", "next", and "test", where "test" and "next" point to the same commit. If "test" is changed to point at "master", then no commits are actually added to the repository, so the only check should be whether the user is authorized to modify branches. This adds complexity which is not present in the SVN backend.
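That branching logic can be sketched as follows (Python for illustration only; the actual implementation is PHP, and run_git is a hypothetical helper that returns the stdout lines of a git command, so the decision logic can be shown without a live repository):

```python
ZERO = "0" * 40  # git's null SHA-1, seen on ref creation and deletion

def commits_to_check(old, new, run_git):
    """Decide which commits an update hook needs to examine for the
    ref move old -> new. run_git(args) is an assumed helper returning
    the output lines of the corresponding git command."""
    if new == ZERO:
        # Ref deletion: no commits are being added.
        return []
    if old == ZERO:
        # Branch/tag creation: check only commits not already
        # reachable from some existing ref.
        return run_git(["rev-list", new, "--not", "--all"])
    # Normal update (fast-forward or rebase): commits reachable from
    # the new tip but not from the old one. For a pure ref move onto
    # an existing commit this list is empty, so only the ref
    # permissions themselves need checking.
    return run_git(["rev-list", old + ".." + new])
```

The "--not --all" trick on branch creation is what avoids re-checking commits that already passed review on another branch.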

I have a draft implementation of this logic, but it needs to be tested. I am working on the tests, which will include a set of sample repositories and will push different sets of commits to ensure that the correct set of commits is checked.

Authentication and Authorization

The solution I came up with on IRC was to mimic the behavior of programs like InDefero and gitosis by using ssh forced commands to associate usernames with ssh keys. Here are the steps the control flow will take:

  1. User runs git push.

  2. User connects to server via ssh. All users connect through the common user git.

  3. The ssh server looks up the user's key in .ssh/authorized_keys and sees that there is a command= option set on that key.

  4. The value of command= is run; this would be something like git-serve.php <username>.

  5. git-serve.php checks whether there is a Drupal user matching the username baked into that key's command= line (or who has a VCS account with that username) and, if so, sets the environment variable GIT_DRUPAL_USER_NAME.

  6. git-serve.php grabs the value of the environment variable SSH_ORIGINAL_COMMAND (which was set by ssh), which will be something like git receive-pack <repository>, and runs that (if step 5 passed).

  7. git receive-pack runs the update hook once for each branch or tag being updated. It gets the user name from GIT_DRUPAL_USER_NAME.

  8. The update hook builds $operation and $operation_items for each commit being added (using the steps described earlier) and sets the author of $operation to GIT_DRUPAL_USER_NAME.

  9. If has_write_access($operation, $operation_items) fails for any of them, the ref update is refused.

It is complicated, but it should be doable, and won't actually be all that much code (since ssh handles a lot of it).
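Putting steps 4 through 6 together, here is a minimal sketch of what such a wrapper might do (Python for illustration; the real wrapper is planned as git-serve.php, and ALLOWED and serve are names I made up for this sketch):

```python
import os
import shlex

# Only the two commands a git client invokes over ssh are allowed;
# anything else in SSH_ORIGINAL_COMMAND is refused outright.
ALLOWED = {"git-receive-pack", "git-upload-pack"}

def serve(drupal_user, ssh_original_command):
    """Illustrative stand-in for git-serve.php: validate the command
    the client asked ssh to run, and pass the Drupal username down to
    the hooks via the environment."""
    argv = shlex.split(ssh_original_command or "")
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError("only git push/fetch are permitted")
    env = dict(os.environ)
    env["GIT_DRUPAL_USER_NAME"] = drupal_user
    # A real implementation would now exec the command:
    #   os.execvpe(argv[0], argv, env)
    return argv, env
```

Because the forced command ignores whatever shell command the client requested and consults only SSH_ORIGINAL_COMMAND, a key holder cannot use the shared git account for anything except push and fetch.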

Friday, June 19, 2009

Researchers conclude piracy not stifling content creation

An interesting look by Ars Technica at the effect (or lack thereof) of piracy on music creation.

I have long said that the purpose of copyright law should be (and, per the Constitution, was) about "promoting the progress of science and the useful arts," and only incidentally about supporting artists and their families.

I don't care about the livelihood of artists. I really don't, and neither should you. I mean, people not starving to death is always cool, but it doesn't really affect me if some random musician gets paid or not. What I do care about is that new music that I like gets produced. If new music that I like is being produced, as long as it is not created by child slavery or prostitution or something, I really couldn't care less whether the artists are granted monopolies on their creations or the music is created on communes where everyone is forced to use the communal toilet to fertilize the glorious communist fields.

Congress only has the power to "promote the progress of science and the useful arts," and if the current copyright system fails to do this, as this and other articles argue, then that system is unconstitutional.

CNNfail and expectations of news networks

This started as a reply to my dad about this article, but got long enough that I decided to make it a blog post.

Interesting read, but I would frame it differently. I gave up on CNN a long time ago.

I don't think I ever gave them a serious chance, and having watched the Daily Show for more than a few episodes, I feel entirely justified in this conclusion.

It seemed that they had concluded, as a business decision, that there was no viewership for news.

Or at least, that it was more cost-effective to report on Paris Hilton than to report news.

Sure, sending reporters around the world is expensive, but it really is not necessary. You gain something from boots on the ground, but what you really need is the determination to discuss the news.

The interesting thing to me is just how much you can actually report without boots on the ground at all. As I have said to Evie a number of times when discussing the future of news, "international news is just local news in a different place." One thing that we are clearly seeing is that it is very possible for the "new media" to have detailed, up-to-date, and accurate accounts of events happening on the other side of the world.

The "old media" system relies on a small number of people dedicating all of their time on all of the news, but the new media allows a large group of people each to focus on one event or category and thoroughly report on just that one topic. For example, the guy who put together the Tatsuma site probably does not also cover the tax breaks for green roofs in New York. If he were a TV channel, that would be a problem, since you couldn't rely on him for your sole source of news. In a new media setting, you only go to him for info about the Iran elections, and go to someone who devotes their whole day/month/life to green roofs in New York. Everything is important to someone, and if it isn't, it likely isn't newsworthy.

My counterexample: my beloved CNBC. Now, it is only business and financial news. However, they really do B and F news: in depth, in detail, intelligently. No Paris Hilton, no jokes, no pointless chatter among the hosts.

Given the trends among other TV "News" networks, this is surprising and encouraging.

It looks like it should be very cheap to produce. They have several people in the studio, and they report the news. Much of the time is devoted to interviewing people who know what they are talking about (senior executives at corporations and securities analysts) and giving them time to answer thoughtful questions. I have seen them spend 20 minutes discussing interest rate policy and its effects on financial companies with the head of a major insurer.

I'm not that into the details of B&F news, but I really wish that this type of news network existed for other areas as well, and I'm finding that more and more of this kind of thing is done on blogs and social media.

If CNN were to have someone like that on at all, they would ask some idiotically simplistic question, then cut them off after the first 15 seconds of the answer.

And then have the host blather on about some generic topic that is only tangentially related, finishing off with a half-hour of soundbites.

CNN could do this. But they have decided that Britney Spears's haircut is more newsworthy.

I think the problem, and the most profound part of the article, is the implicit idea among people who complain about this kind of thing that there is some sort of "social contract" between the "news" networks and the people: that the news companies will report on what is relevant and important. The reality is that these companies are businesses, and so will do whatever they can to maximize profits. The problem arises when the networks advertise themselves as the former but deliver the latter. People don't get outraged when E! reports on Britney Spears's haircut because E! makes no claim to be anything but a gossip column about celebrities.

The technological and social frameworks for a complete replacement of these old media businesses are not quite yet in place, but they are close, and when they are, the entertainment companies posing as news are going to be in trouble.

Wednesday, June 10, 2009

Weekly Update 2: Completion of Version Control Integration

A little late this week, as I was moving into my summer apartment.

  • The biggest (and really only) news is that I finished the Subversion hooks and (slightly) improved the backend in general. It is now possible to deny commit access to users based on whether or not they have a Drupal account associated with that particular Subversion repository. Branches and tags, however, are not supported, as the Subversion backend does not yet support them. The framework for doing so exists, as the repository creation/edit form accepts path patterns for trunk, branches, and tags, so adding support should not be too difficult.

  • The post-commit hook allows Drupal to gather log information as each commit is made, rather than on a cron run. There isn't much else to it, but it provides a bit more immediacy and, in the future, could serve as an event for the Rules module (something I will work on later in the summer).

  • Both hooks come with a set of SimpleTests to ensure they function correctly. The tests take a little while to run, as creating, checking out, and committing to a Subversion repository take a while (and don't play that nicely with the OS-level file cache, since new objects are created each time), but they are helpful for verifying that everything is in working order. They test both positive and negative cases, making sure the hooks fail gracefully when passed invalid information. They currently all pass (as one would hope!) and can catch a decently wide range of errors in the operation of the hooks.

  • A minor aside, but it bears mentioning since I spent several hours testing it: Subversion allows any UTF-8 character in any string (comments, user names, and file names). However, certain tools restrict the username in particular to a subset of those characters, depending on that program's configuration. For example, the svnserve 'passwd' file format uses brackets and equals signs as part of its syntax, so those characters are prohibited in entries in that file, preventing Subversion users from having such names. If authentication is done a different way, such as with ssh keys, this restriction no longer applies.
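The access check described in the first bullet can be sketched like this (Python for illustration; the real hook is PHP, and run_svnlook and lookup_account are hypothetical stand-ins for invoking svnlook and querying Drupal's account tables):

```python
def pre_commit_allowed(repo, txn, run_svnlook, lookup_account):
    """Illustrative SVN pre-commit access check: the commit is allowed
    only if its author maps to a Drupal account registered for this
    repository. run_svnlook(args) stands in for running the svnlook
    binary; lookup_account(repo, name) for the Drupal-side lookup."""
    # Under SVN the commit author *is* the authenticated user, so this
    # one string is all the hook needs to decide.
    author = run_svnlook(["author", "-t", txn, repo]).strip()
    return lookup_account(repo, author) is not None
```

A real pre-commit hook would exit non-zero (with a message on stderr) when this returns False, which is what makes Subversion reject the transaction.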

Well, that's all for now, stay tuned next week as I complete the Git repository hooks.

Tuesday, June 2, 2009

Weekly Update 1: Completion of Version Control Integration

This week was not a terribly busy one, with most of my time spent getting acquainted with the APIs of Drupal, the versioncontrol module, and Subversion. I was able to complete the pre-commit hook for Subversion, as well as the basic configuration for the other Subversion hooks.

I also identified a number of areas in which the SVN backend needs more work, such as recognizing branches and tags. The framework is in place, as the repository creation form includes fields for the branch and tag patterns, but currently no code makes use of those fields. Subversion is a tricky case, as it has no native concept of branches or tags, so the branch of a commit depends on the path of the files it modifies in the repository.
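Concretely, that path-to-branch mapping can be sketched as follows (Python for illustration; the defaults here just follow the conventional trunk/branches/tags layout, and the real form fields may use different patterns):

```python
import re

def classify_path(path, trunk="trunk", branches="branches", tags="tags"):
    """Map an SVN path to the branch or tag it belongs to, using path
    patterns like those the repository creation form collects. The
    defaults are the conventional layout (illustrative only)."""
    if path == trunk or path.startswith(trunk + "/"):
        return ("trunk", None)
    # The first path component after the branches/ or tags/ prefix
    # names the branch or tag itself.
    for kind, prefix in (("branch", branches), ("tag", tags)):
        m = re.match(re.escape(prefix) + r"/([^/]+)(?:/|$)", path)
        if m:
            return (kind, m.group(1))
    return ("unknown", None)
```

For example, a commit touching branches/6.x-1.x/foo.module would be classified as belonging to the "6.x-1.x" branch, even though Subversion itself sees it as just another directory.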

With the prep work out of the way, I hope to push forward to complete the Subversion hooks and begin on the next step of the project, the hooks for Git.