Saturday, September 15, 2007

I Want it that WAE

Whew! I just now finished my very first programming language, WAE. That's right, I wrote a parser and interpreter from (sort of) scratch. Unfortunately it has nothing to do with the Backstreet Boys.

The part I'm not telling you is that it's for CS 173: Programming Languages and that everyone in that class did the same, but it is still big.

Out of all of this I have realized that it is good to get started on work early, so that your Friday night can start before 12:30 and so that you can eat at some point between 4 PM and then. Honestly though, as CompSci projects go, this one was not that bad. It took only about 6 hours in total and came to only about 400 lines of code, which pasted into a typical word processor would be about 10 pages single-spaced or about 19 double-spaced. Obviously this is comparing apples to oranges a bit, since a good number of my lines are blank. On the other hand, code (especially Scheme) is much denser than most prose.

Anyway, this wonderful language is pretty simplistic compared with something that people would actually use, but seeing as I banged it out in one 6-hour session, it's not bad.

Here is a very brief overview of how it works:

The syntax is similar to Scheme/Lisp and uses prefix notation with curly braces {}. This is no accident. Scheme has the wonderful feature that it can tokenize a parenthesized expression merely by adding a single quote to the beginning. So this:

'(a b c)

would return a list of the three symbols a, b, and c. This is amazingly useful because it allows me to skip the entire step of scanning, which is a bitch. Seriously, I spent a good part of my summer fighting with Java over whether I could actually use it for scanning.
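To make the quoting trick concrete, here is a rough sketch in Python (not Scheme, and not the actual course code) of what the Scheme reader is doing for free: turning a braced string into nested lists. The function name read_braces and the choice to turn numeric tokens into ints are assumptions made for illustration.

```python
def read_braces(text):
    """Turn a braced string into nested Python lists of atoms."""
    # Pad every brace with spaces so that str.split() can do the tokenizing.
    tokens = text.replace("{", " { ").replace("}", " } ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "{":
            items = []
            # Keep reading sub-expressions until the matching close brace.
            while pos < len(tokens) and tokens[pos] != "}":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing closing brace")
            pos += 1  # consume the closing brace
            return items
        if token == "}":
            raise SyntaxError("unexpected closing brace")
        try:
            return int(token)   # numeric tokens become ints (an assumption)
        except ValueError:
            return token        # everything else stays a symbol name

    tree = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree


print(read_braces("{a b c}"))          # ['a', 'b', 'c']
print(read_braces("{+ 1 {+ 2 3}}"))    # ['+', 1, ['+', 2, 3]]
```

Even this small version is a couple dozen lines, which is roughly the work that a single quote character saves in Scheme.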

Normally, I would show a "Hello World," but since this language doesn't support strings, that won't happen. Like I said, simple language.

Instead, I will have it add 1 and 1. Behold:

{+ 1 1}

To most people this looks weird, but it has some nice advantages. First, it is very simple to parse; second, it makes nesting trivial, like so:

{+ 1 {+ 2 3}}

That means "add 1 to the sum of 2 and 3."
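Why prefix notation makes nesting so easy shows up as soon as you try to evaluate it. Below is a minimal sketch in Python, with nested lists standing in for the parsed program. The name interp_ae and the restriction to two-argument + and - are assumptions for illustration, not a description of the real interpreter.

```python
def interp_ae(expr):
    """Evaluate a nested-list arithmetic expression such as ['+', 1, ['+', 2, 3]]."""
    if isinstance(expr, int):
        return expr                      # a number evaluates to itself
    op, left, right = expr               # every compound form is {op left right}
    if op == "+":
        return interp_ae(left) + interp_ae(right)
    if op == "-":
        return interp_ae(left) - interp_ae(right)
    raise ValueError(f"unknown operator: {op!r}")


print(interp_ae(["+", 1, 1]))              # 2
print(interp_ae(["+", 1, ["+", 2, 3]]))    # 6
```

Because the operator always comes first, the evaluator never has to worry about precedence; nesting is just recursion.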

It even has scoped variables!

{with {{x 1}
       {z x}
       {x 2}
       {y 3}
       {w x}}
  {+ 3 {+ {- z w} {+ y x}}}}

The result of which is 7. See if you can figure out why!
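Spoiler for anyone who would rather not work it out by hand: here is a rough Python sketch of how a with form like this could be evaluated. It assumes the bindings are processed in order, each one seeing the bindings before it, which is consistent with the answer of 7. The name interp and the nested-list program representation are assumptions for illustration.

```python
def interp(expr, env=None):
    """Evaluate numbers, identifiers, + and -, and a sequential 'with' form."""
    env = {} if env is None else env
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        if expr not in env:
            raise NameError(f"unbound identifier: {expr}")
        return env[expr]
    if expr[0] == "with":
        _, bindings, body = expr
        scope = dict(env)  # copy, so the bindings vanish when the with ends
        for name, value_expr in bindings:
            # Assumption: each binding can see the ones before it,
            # and a later binding of the same name shadows the earlier one.
            scope[name] = interp(value_expr, scope)
        return interp(body, scope)
    op, left, right = expr
    if op == "+":
        return interp(left, env) + interp(right, env)
    if op == "-":
        return interp(left, env) - interp(right, env)
    raise ValueError(f"unknown operator: {op!r}")


program = ["with",
           [["x", 1], ["z", "x"], ["x", 2], ["y", 3], ["w", "x"]],
           ["+", 3, ["+", ["-", "z", "w"], ["+", "y", "x"]]]]
print(interp(program))  # 7
```

Under that assumption z picks up the first x (1) while w picks up the second (2), so the body works out to 3 + ((1 - 2) + (3 + 2)).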

If you are really interested in this and other languages, I highly recommend CSCI1730 by Shriram. You can take a look at his book (for free!) here.

Now off to Joe's!

Wednesday, September 12, 2007

Universal Serial Numbers

No, this is not a post about cracks or keygens for some software. It is an idea I have had kicking around in my head for a bit that needed to get out. The motivation for this idea is best captured in story form.

At the dawn of time, Man had a scant few possessions. The idea of assigning numbers to these possessions was absurd for a couple of reasons:

  1. He wouldn't have gotten very far past 1: Rock. 2: Wife. 3: Shank of meat.
  2. Numbers weren't invented yet.

As time went on, Man began to add things to his list of possessions: sticks, bigger stones, and eventually land! For most of human history, there were few enough things that you could just kind of remember all of them.

But what happens when you have thousands of items for sale in the same store? Should the clerk simply memorize the price of each and every different kind of spiced ham and make a tally of how many of each kind he has sold? If that's how things worked, the apocalypse would have occurred about 30 seconds after the opening of the first Wal-Mart.

What happened is that some guy, who I can only assume was named Joseph Bar, invented the bar code to shift the tedium of matching an item to its product description onto computers. Since then, a number of different bar code schemes have evolved.

The invention of bar codes has also led to such things as barcodepedia, which lets you look up any bar code of a common format.

But what of other products? Network interface devices have MAC addresses, cars have VINs, people have passport numbers, books have ISBNs and so on. Wouldn't it be nice if all of these different identifiers could coexist peacefully in the same database?

That's what I am proposing. A unique identifier for each and every thing on planet Earth.

In the words of Keanu Reeves: "Whoa."

This idea would need a lot of fleshing out, and it gets complicated quickly. Let's take the example of books for the sake of argument. Each individual book has a universal serial number. It does not need any more identifying information than that: no ISBN, no bar code, no Library of Congress catalog card number. If you had a graphical scheme for encoding it, like a bar code, that would be the end of the story as far as what's printed on it.

Things get messy when you try to do something useful with this. So you have your lovely SHA-1 (why not?) string and you want to find some stuff out about it. Well, it would be nice to know what the product is. So if you have a copy of The Bourne Ultimatum, and all that's on the book itself is its unique hash code, how do you figure out that it is actually Robert Ludlum's smash hit? Well, let's invent a relationship between USNs (Universal Serial Numbers). Your individual copy could be a "member of the product line $FOO." So we add that to our database, publish a website with all of that and run off fat and happy.

But wait! We now know that the USN 173a18cf0b9835a0a0c67808ca20bb82a4c20dc7 is an instance of the book The Bourne Ultimatum and we could certainly store some info about the authors and publishing company. But we wanted everything. So the book might be "authored by $BAR," "published by company $BAZ," "edited by $FOB," and so on. Now we have tons of kinds of relationships which all have to be recorded.
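One way to picture "tons of kinds of relationships" is as a pile of (subject, relation, object) facts. The following is only a sketch in Python under that assumption: the class RelationStore, its method names, and the helper authors_of_copy are hypothetical, and $FOO, $BAR, $BAZ and $FOB are the same placeholders used above, standing in for USNs that do not exist yet.

```python
from collections import defaultdict


class RelationStore:
    """Stores (subject, relation, object) facts keyed by subject USN."""

    def __init__(self):
        self._facts = defaultdict(list)

    def add(self, subject, relation, obj):
        self._facts[subject].append((relation, obj))

    def objects(self, subject, relation):
        """Return everything the subject is linked to by the given relation."""
        return [obj for rel, obj in self._facts.get(subject, []) if rel == relation]


COPY = "173a18cf0b9835a0a0c67808ca20bb82a4c20dc7"  # the individual book

store = RelationStore()
store.add(COPY, "member of the product line", "$FOO")
store.add("$FOO", "authored by", "$BAR")
store.add("$FOO", "published by company", "$BAZ")
store.add("$FOO", "edited by", "$FOB")


def authors_of_copy(store, copy_usn):
    """Hop from a physical copy to its product line, then to the authors."""
    authors = []
    for line in store.objects(copy_usn, "member of the product line"):
        authors.extend(store.objects(line, "authored by"))
    return authors


print(authors_of_copy(store, COPY))  # ['$BAR']
```

Note that the copy itself carries only one fact; everything else hangs off the product line, so a million copies do not mean a million author records.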

What if we wanted to take it further? Say I have my copy of the aforementioned book but it gets lost or (gasp!) stolen? If I record somewhere that the book 173a18cf0b9835a0a0c67808ca20bb82a4c20dc7 belongs to me, I could help someone return it or prove that it was mine in the first place.

Here we hit the problem of privacy. Others may be different, but I certainly would not like everyone on the Internet to know about all of my possessions. One possible solution is to have the database work in a distributed manner, similar to DNS. That way, if you looked up 173a18cf0b9835a0a0c67808ca20bb82a4c20dc7 in the global database, it would point you to my server, which would have its own rules for deciding how much to reveal. Because the keyspace is so large, there is a low probability of someone guessing keys at random and figuring out what a person owns. There is far more to consider here, but again, this is a very rough draft.
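The DNS-like split could look something like the Python sketch below: a global registry that only knows which server is responsible for a USN, and an owner's server that decides what to reveal. Every name here (GlobalRegistry, OwnerServer, the requester_is_owner flag, the record fields) is made up for illustration, and a real system would need actual authentication rather than a boolean.

```python
class OwnerServer:
    """Holds the real records and applies its own disclosure rules."""

    def __init__(self, records, public_fields):
        self._records = records
        self._public_fields = set(public_fields)

    def lookup(self, usn, requester_is_owner=False):
        record = self._records.get(usn)
        if record is None:
            return None
        if requester_is_owner:
            return dict(record)  # the owner sees everything
        # Everyone else sees only the fields marked public.
        return {k: v for k, v in record.items() if k in self._public_fields}


class GlobalRegistry:
    """Knows only which server answers for a USN, never the details."""

    def __init__(self):
        self._delegations = {}

    def delegate(self, usn, server):
        self._delegations[usn] = server

    def resolve(self, usn, requester_is_owner=False):
        server = self._delegations.get(usn)
        if server is None:
            return None
        return server.lookup(usn, requester_is_owner=requester_is_owner)


USN = "173a18cf0b9835a0a0c67808ca20bb82a4c20dc7"
my_server = OwnerServer(
    records={USN: {"product line": "$FOO", "owner": "me"}},
    public_fields=["product line"],
)
registry = GlobalRegistry()
registry.delegate(USN, my_server)

print(registry.resolve(USN))                           # {'product line': '$FOO'}
print(registry.resolve(USN, requester_is_owner=True))  # {'product line': '$FOO', 'owner': 'me'}
```

The point of the split is that the global database never stores who owns what; it only stores where to ask.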

So there it is. A rough idea which, given an enormous amount of work, could define and organize the relationships between all identifiable objects known to man.

Including his rock, his wife, and his shank of meat.