Sunday, June 21, 2009

Weekly Update 4: Completion of Version Control Integration

Whew! This has been a busy week! I did not get the rules integration completed as I had hoped, as the Git hooks ended up taking a lot longer than expected (and are still not yet done). The problem is that access checking under Git is MUCH more difficult than under SVN for a few reasons:

  1. SVN calls the pre-commit hook once for each commit (duh), but Git only calls the update hook once per push (per brach or tag).
  2. Figuring out which commits to check is non-trivial, and can involve a number of different conditions.
  3. The author of a SVN commit is the same as the authenticated username of the person uploading the commit, so access checking is easy. In Git, the author, committer, and pusher can be totally different people, and Git (correctly) does not store any information about the pusher.
  4. Passing the appropriate authorization information to the hooks is non-trivial.

Definitions/Clarifications

There are a couple of complex ideas in there, so I'll take a moment to define what I'm talking about.

Once per push

With DVCSs, commits happen on the user's local computer, so Drupal (obviously) cannot check commits until they are pushed to the server repository. What this means is that Drupal sees a whole bunch of commits at a time.

Author, Committer, Pusher

This is a distinction that does not exist within centralized VCSs, as there is only one way for a commit to enter the repository. In Git, and I believe the other DVCSs as well, there is a difference between the author of a commit, the committer, and the person who is pushing to a repository. For the purposes of demonstration, I will call the author Alice, the committer Carol, and the pusher Pat.

  • Author: The person who originally wrote the commit. This gets set when you run "git commit", and does not change as a commit floats around between repositories.

  • Committer: The person who added the commit to the repository. By default, is the same as the author. It will be different if, for example, Alice emails her patch to Carol who commits it to her repository. In Carol's repository, Alice will be the author and Carol will be the committer. If Carol emails the patch to Charlie and he commits it, then Charlie would be the new committer.

  • Pusher: The person who pushes a commit to a remote repository. It is this person who needs to be authorized in order for a push to succeed. It doesn't much matter who a commit was written by, as long as the person adding it to the mainline repository is allowed to do so.

    In the original examples, Alice writes a commit, mails it as a patch to Carol, who then asks Pat to upload it to drupal.org. Pat has an account on drupal.org, but neither Alice nor Carol do. Pat pushes the patch to the main repository on drupal.org, and the push succeeds because Pat is allowed to push.

With the current workflow on drupal.org, Alice would post a patch as an attachment on a bug, Carol would mark the patch "reviewed and tested by the community," and Pat would commit the patch to CVS.

Authenticated username

Since no mention of Pat is included in the commit he is pushing, some method external to Git is needed to determine whether a push should be allowed.

Solutions

So far, I have only implemented solutions to 1 and 2, though the way to progress forward on 3 and 4 is now much more clear (after this week's IRC discussion).

Which commits to check

Figuring out which commits to check can be tricky, since an updated ref could be a fast-forward (nothing but a linear set of commits between the old and new location of the ref), a non-fast-forward push (such as a rebase), or the creation of a branch (so the "old commit" is 0000000000...). Additionally, if a ref is changed, but does not introduce any commits, then no commits need to be checked. This will occur if, for example there are three branches, "master", "next", and "test", where "test" and "next" point to the same commit. If "test" is changed to point at "master", then no commits are actually added to the repository, so the only check should be whether the user is authorized to modify branches. This adds complexity which is not present in the SVN backend.

I have a draft implementation of this logic, but it needs to be tested. I am working on the tests, which will include a set of sample repositories and push different sets of commits to ensure that the correct set of commits is tested.

Authentication and Authorization

The solution I came up with on IRC was to mimic the behavior of programs like InDefero and gitosis by using ssh forced commands to associate usernames with ssh keys. Here is the steps the control flow will take:

  1. user runs git push

  2. User connects to server via ssh. All users connect through the common user git.

  3. ssh server looks up user's key in .ssh/authorized_keys and sees that there is a command= property on that user's key

  4. The value of command= is run, which would be something like git-serve.php .

  5. git-serve.php checks whether there is a drupal user with (or who has a vcs account username) and if so, sets an env variable GIT_DRUPAL_USER_NAME.

  6. git-serve.php grabs the value of the env var (which was set by ssh) SSH_ORIGINAL_COMMAND, which will be git receive-pack and runs that (if step 5 passed).

  7. git receive-pack runs the update hook once for each branch or tag being updated. It gets the user name from GIT_DRUPAL_USER_NAME.

  8. The update hook builds $operation, $operation_items for each commit being added (using the steps described earlier) and sets the author of $operation to GIT_DRUPAL_USER_NAME.

  9. If any of has_write_access($operation, $operation_items) fails, then that ref update is refused.

It is complicated, but it should be doable, and won't actually be all that much code (since ssh handles a lot of it).