Monday, April 21, 2008

On Voting Systems and Hash Trees

So after reading this article, I started thinking a bit about the implementation side of some of our voting problems.

First, let's think about requirements. A good voting system should be:

  1. Private, in that nobody can force you to reveal who you voted for. This is important because it prevents someone from threatening a voter for voting for a certain candidate. Also, if someone publicly states that they support one candidate but choose to vote for another, it could be damaging to their reputation to have it exposed that they voted differently. Who someone votes for is their private business.

  2. Auditable. There must be some way to prove that the votes cast are the same ones counted for deciding who is president (or Governor, or senator, etc.).


  3. Verifiable for individuals. If I suspect that my vote was altered, I should be able to check that the vote that I made has not been modified before being counted.
The traditional way of using paper ballots works to a point, but it is possible to modify paper, create fake ballots, and of course, the infamous "dangling chad" fiasco.

Thus, I have spent some time brainstorming a cryptographically secure system using public keys and hash trees.

There would be a master key pair for the whole election, owned by the federal body in charge of handling elections. Each state would have its own key pair, as would each polling station and each machine. All of the state public keys would be signed by the federal key, each state would sign the keys of all of the polling stations, and the polling stations would sign the keys of each machine.

When a person then votes, a hash of their vote (optionally with some random value) is signed by the voting machine's key and by the polling station's key. Before the user votes at a station, they can verify that the state voting authority actually signed the key of the polling station, and then verify that the polling station signed the key of the voting machine. Assuming the federal key could be accurately transmitted to everyone, which could be done over a number of very public channels, the user could verify all of the keys down to that of the individual machine at which they are voting.

The purpose of this is so that if the user suspects that their vote was not counted or was manipulated, they can go to the polling station or state court and prove that a polling station authorized by the state registered their vote. This would prevent people from being lured to fake polling stations which simply threw away their vote.

Also, to ensure that the state received the vote, as part of the process the user's vote could be sent (securely) to a computer owned by the state voting group and a signature sent back to the user. This way, the user would have proof that the state received the vote that the user intended.

The next problem is allowing users to force the government to prove that the vote used for counting is the same as the vote that the user made. This could be done with a hash tree. Once each state had collected all of the votes, it would assemble a hash tree using the earlier hashes of votes as the leaves. The state would then sign and publish the root hash. If a user wanted to verify that the government actually used his or her vote in the final count, the user would ask for a validation of their hash, which would require the government transmit only a logarithmic number of hashes to the user. If these hashes matched the public root hash, then the user would know that his or her vote was accounted for.

When the user goes to a voting station, they would take with them a copy of the keys used by both the federal government and the state. The user would leave with a secret random number which, with their vote, could be used to reconstruct the user's vote hash as well as signatures from the voting machine, polling station, and the state. At home, the user could manually verify that their vote hash was correctly calculated, as well as verify the signatures of each of the authorities who signed the user's vote hash.

The only issue is the one of the random number. Since there are so few choices in an election, without any padding, most hashes would be the same. If the options were "I vote for McCain" and "I vote for Obama", then using the SHA1 algorithm, there would be only two different hashes:

"I vote for McCain"
2647414cd4c7769b05fcb68c9b6dde321fdaa49c

"I vote for Obama"
65d5d0d9855878e256a75c8c7fce688b4877c193


Someone could then simply look and see which hash was given and determine who the user voted for. The solution typically used for ensuring the authenticity of messages is to add some random data (that is stored for validation) to the message. A simple example is to include the current time:

"I vote for Obama at Mon Apr 21 17:52:07 EDT 2008"
e194810aa658f103e5359a985fd35e722eb3e857

"I vote for Obama at Mon Apr 21 17:52:49 EDT 2008"
bcf9b0934b562621b8a861ce8eb427c2fa99da16


This could be supplemented with additional data to reduce the likelihood that two people voting for the same candidate would wind up with the same hash, since it is easy to imagine that two people in a country of 300 million could vote for the same candidate in the same second.

There is still the problem of hiding the vote though. The user needs that data at the end of the day to be able to verify that the hash correctly identifies the user's vote, but if an adversary had the random data and the hash, they could figure out the user's vote as easily as before. I don't have a good solution for this yet, one possible option is giving the user a hash of the random data, and mailing the actual value to the user separately, but that is cumbersome.

Anyway, I doubt that this system will actually be used in a real voting system. It is fairly simple to implement and could be done using existing standards and free, open source software with a minimal amount of additional GUI code, but it would be fairly difficult to have a multi-million dollar business around this and therefore tough to bribe the required officials into mandating an insecure and fundamentally flawed system.

Maybe next week, I'll implement this whole thing in a under 1,000 lines of Ruby.