Monday, March 1, 2010

Comparing Emacs Version-parsing Libraries

Recently, I've become interested in advancing the (currently sad) state of Emacs package management. It is a well-documented problem (see here for more detail), so I won't discuss it generally.

One important aspect of a package-management system is the ability to deal with the different versions of a package. Because of the current state of packaging, there is no real standard format for version numbers. Any potential package manager must be able to figure out the version of a given package without being told it explicitly by a user (otherwise maintaining the packages would be way too hard). Currently, package.el has a simple-ish way of dealing with version numbers: it uses the lisp-mnt library (which comes with Emacs) to pull out the Version or Package-Version headers, strips RCS info (some version numbers look like $Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $, and we really only want 1.63), and then splits the result on periods and converts the pieces to integers. This means that it only works for versions like 1.2.3 and not 1.2.3alpha.
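To make those steps concrete, here is a minimal sketch of the strip-then-split idea. It is not package.el's actual code: the function names are made up for illustration, the header is assumed to have already been extracted as a string, and returning nil for anything that is not purely dotted-numeric is my own choice.

```elisp
;; Hypothetical helpers; the names are illustrative, not package.el's own.
(defun my-strip-rcs-version (str)
  "Return the revision number embedded in RCS keyword STR, else STR."
  (cond
   ;; "$Id: file.el,v 1.63 2007/05/19 ... $" -> "1.63"
   ((string-match "\\$Id: [^ ]+,v \\([0-9.]+\\) " str)
    (match-string 1 str))
   ;; "$Revision: 1.63 $" -> "1.63"
   ((string-match "\\$Revision: +\\([0-9.]+\\)" str)
    (match-string 1 str))
   (t str)))

(defun my-dotted-version-to-list (str)
  "Split dotted-numeric STR into a list of integers.
Return nil if STR is not purely dotted-numeric (an assumption of
this sketch, not necessarily what package.el does)."
  (when (string-match "\\`[0-9]+\\(\\.[0-9]+\\)*\\'" str)
    (mapcar #'string-to-number (split-string str "\\."))))

(my-dotted-version-to-list
 (my-strip-rcs-version
  "$Id: linkd.el,v 1.63 2007/05/19 00:16:17 dto Exp dto $"))
;; => (1 63)
(my-dotted-version-to-list "1.2.3")      ; => (1 2 3)
(my-dotted-version-to-list "1.2.3alpha") ; => nil
```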

It would be nice if we could get everyone to use only dotted-numeric version numbers, but that's not happening any time soon. Instead, a package manager must be able to make sense of more complex version numbers, such as 6.34a (which is what org-mode uses) or 1.0pre7 (cedet). I'm going to look at the following three version-parsing solutions:

  • version-to-list, included in GNU Emacs.
  • inversion, from CEDET, now included in GNU Emacs (not sure which version).
  • vcomp, written by Jonas Bernoulli, creator of the Emacs Mirror.

I'll take a bunch of examples and show the output that each one produces. If you have any suggestions for additional version formats, please let me know :)
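For anyone who wants to reproduce the results below, a harness along these lines should do it. This is only a sketch: my-compare-version-parsers is a made-up name, parsers whose library is not loaded are simply skipped, and a parser that signals an error is recorded as nil, which may or may not match how a particular nil below came about.

```elisp
;; Hypothetical helper: apply each function in PARSERS to VERSION and
;; return an alist of (PARSER . RESULT).
(defun my-compare-version-parsers (version parsers)
  "Run each of PARSERS on the string VERSION and collect the results.
Parsers that are not defined are skipped; an error counts as nil."
  (let (results)
    (dolist (parser parsers (nreverse results))
      (when (fboundp parser)
        (push (cons parser
                    (condition-case nil
                        (funcall parser version)
                      (error nil)))
              results)))))

;; inversion may not be available in every Emacs, so load it
;; optionally; vcomp is third-party and must be installed separately.
(require 'inversion nil t)

(my-compare-version-parsers
 "1.3.7"
 '(version-to-list inversion-decode-version vcomp--intern))
```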

  • "1.0pre7"
    version-to-list
    (1 0 -1 7)
    inversion-decode-version
    (prerelease 1 0 7)
    vcomp--intern
    nil

  • "1.0.7pre"
    version-to-list
    (1 0 7 -1)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "6.34a"
    version-to-list
    (6 34 -3)
    inversion-decode-version
    nil
    vcomp--intern
    ((6 34) (104 0 96 0))

  • "1.3.7"
    version-to-list
    (1 3 7)
    inversion-decode-version
    (point 1 3 7)
    vcomp--intern
    ((1 3 7) (104 0 96 0))

  • "1.0alpha"
    version-to-list
    (1 0 -3)
    inversion-decode-version
    (alpha 1 0 1)
    vcomp--intern
    nil

  • "1.0PRE2"
    version-to-list
    (1 0 -1 2)
    inversion-decode-version
    (prerelease 1 0 2)
    vcomp--intern
    nil

  • "0.9alpha"
    version-to-list
    (0 9 -3)
    inversion-decode-version
    (alpha 0 9 1)
    vcomp--intern
    nil

  • "2009.04.01"
    version-to-list
    (2009 4 1)
    inversion-decode-version
    (point 2009 4 1)
    vcomp--intern
    ((2009 4 1) (104 0 96 0))

  • "2009.10.5"
    version-to-list
    (2009 10 5)
    inversion-decode-version
    (point 2009 10 5)
    vcomp--intern
    ((2009 10 5) (104 0 96 0))

  • "20091005"
    version-to-list
    (20091005)
    inversion-decode-version
    nil
    vcomp--intern
    ((20091005) (104 0 96 0))

  • "20091005pre"
    version-to-list
    (20091005 -1)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "20091005alpha"
    version-to-list
    (20091005 -3)
    inversion-decode-version
    nil
    vcomp--intern
    nil

  • "20091005alpha2"
    version-to-list
    (20091005 -3 2)
    inversion-decode-version
    nil
    vcomp--intern
    nil


Looking at this, it seems like version-to-list is the way to go: it is the only one of the three that produced a usable result for every example above, while inversion-decode-version and vcomp--intern each returned nil for several of them.
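The lists that version-to-list returns are meant to be compared rather than read, and Emacs ships version< and version-list-< for that. A small sketch, assuming an Emacs recent enough to have these functions (my-sort-versions is a made-up name):

```elisp
;; Hypothetical helper: sort version strings from oldest to newest.
(defun my-sort-versions (versions)
  "Return a sorted copy of the list of version strings VERSIONS."
  (sort (copy-sequence versions) #'version<))

(version-to-list "1.0pre7")   ; => (1 0 -1 7)
(version< "1.0pre7" "1.0")    ; => t, a pre-release sorts before the release

;; Comparing already-parsed lists directly:
(version-list-< (version-to-list "1.0pre7")
                (version-to-list "1.0.7pre"))   ; => t

(my-sort-versions '("1.0" "1.3.7" "1.0pre7" "0.9alpha"))
;; => ("0.9alpha" "1.0pre7" "1.0" "1.3.7")
```

The negative numbers are what make this work: a pre-release marker compares lower than "nothing at all", so 1.0pre7 ends up before 1.0.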

7 comments:

  1. I think that all cases where vcomp--intern returns nil can be fixed by adjusting the regex it uses slightly:

    currently 0.1alpha1 returns nil but 0.1_alpha1 does not. vcomp allows version strings similar to those Gentoo uses to version their ebuilds (packages). Note that they do not require upstream to use the same scheme; it's just what they use for their sanitized version strings in ebuilds.

    vcomp does not support the Gentoo version string format completely; 1.0_alpha_beta1_p12-r1 would not be valid, though in Gentoo it would be. (I think; I can't find a link to their documentation right now.)

    Also note that vcomp--intern really is only for internal use, which explains the strange numbers it returns.

  2. Which doesn't mean it's the best solution, just that it can also be extended quite easily.

  3. Just noticed that I had this as a todo anyway:

    ;; TODO: Do not require "_" before "alpha". Good idea?

  4. Makes sense. This was mainly an experiment for myself to see which system made the most sense to use going forward. I think that for now, I'm going to see about retrofitting `version-to-list' into elx to cut down on the number of different version parsers out there.

  5. Yes, it would be nice to have just one implementation. Meanwhile, hoping my work was not all useless, I have added support for things like 2alpha1 (and even 2aalpha1, but I hope not to see that anywhere) and fixed an embarrassing bug.

  6. This comment has been removed by the author.

  7. Inversion uses an alist of regular expressions to parse the version numbers, as does version-to-list. It wouldn't be hard to make either support a wider range of expressions.

    While I find it tempting to use version-to-list to replace inversion internals now that I know it exists, this is a recent addition to Emacs that would reduce the compatibility of inversion across Emacsen. :(

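The alist mentioned in the last comment is the variable version-regexp-alist, and extending it for version-to-list can be sketched roughly as follows. The "dev" suffix, its priority of -4, and the function name are all assumptions made for illustration; the extra entry is let-bound so the global alist is left untouched.

```elisp
;; Hypothetical example: accept a "dev" suffix and give it priority -4,
;; which sorts below alpha (-3).
(defun my-version-to-list-with-dev (version)
  "Parse VERSION with `version-to-list', also accepting a \"dev\" suffix."
  (let ((version-regexp-alist
         (cons '("^[-_+ ]?dev$" . -4) version-regexp-alist)))
    (version-to-list version)))

(my-version-to-list-with-dev "1.0dev")   ; => (1 0 -4)
(my-version-to-list-with-dev "1.0pre7")  ; => (1 0 -1 7), existing entries still apply
```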