Crnchng th Brds

A Search for Universal Ornithological Identifiers

Philosophers of language and computer scientists have come to realize the importance of universal identifiers. For example, Saul Kripke argues in Naming and Necessity that names are not descriptions, but rather "rigid designators" that retain their naming ability even in the presence of massive epistemological upheavals: "gold", he argues, would designate gold even if it turned out that gold is not a metal, is not yellow, and does not have atomic number 79. Similarly, the use of unique identifiers to solve the problems of designating-by-description is widespread in modern computer systems.

I have often felt that there was a need for such unique identifiers for birding. To take a simple example, consider the problem of describing the birds that can be seen at a given site. A straightforward method would be to define some code for seasonal abundance like the ACUOR codes that are becoming standard, and then store one of those codes for every bird in some standard list of the birds of the world.

This straightforward approach suffers from two drawbacks. The first is that it is quite inefficient in terms of storage: the vast majority of the codes will be '----'. The more serious drawback is that it is tied to a particular bird list, and when that bird list is changed, as it invariably will be, the description of the site will no longer be valid: what used to be bird 6,738 on the list is now bird 6,748, and making that change is quite cumbersome.

The use of Universal Ornithological Identifiers (UOI's) solves both these problems. The site data can be compactly stored as a list of UOI's along with abundance codes.
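The contrast can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the list size is a round figure, "CCUR" is a placeholder abundance code, and "UOI-A" is a hypothetical identifier rather than a real UOI.

```python
# Sketch: dense per-list storage versus sparse UOI-keyed storage.
# All identifiers and abundance codes below are made-up placeholders.

LIST_SIZE = 10_000   # assumed size of a world bird list
ABSENT = "----"      # code for a bird never recorded at the site

# Dense form: one abundance code per position in one particular bird list.
dense = [ABSENT] * LIST_SIZE
dense[6737] = "CCUR"             # bird 6,738 on the list (index is 0-based)

# Sparse form: only the birds actually present, keyed by a stable identifier.
sparse = {"UOI-A": "CCUR"}       # "UOI-A" is a hypothetical identifier


def insert_species(dense_codes, position, count):
    """Simulate a list revision adding `count` birds before 1-based `position`."""
    i = position - 1
    return dense_codes[:i] + [ABSENT] * count + dense_codes[i:]


# Ten birds are added ahead of our bird, so it moves from 6,738 to 6,748.
revised = insert_species(dense, 6738, 10)
print(revised[6737], revised[6747])   # the old slot is now empty
print(sparse)                         # the sparse record did not change
```

The dense record has to be rebuilt whenever the list is revised, while the sparse record stays valid as long as the identifiers themselves do not change.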

Criteria for Universal Ornithological Identifiers.

I recently set off to design a system of UOI's. Unfortunately, several mutually conflicting design goals come into play. The perfect UOI would be short, unique, universal, independent of any particular taxonomy or checklist, mnemonic, and easily computed.

I am unaware of any existing UOI system that comes acceptably close to satisfying these criteria. The AOU system of codes is short and unique, but it is tied to the AOU checklist, is computed by arbitrary assignment, is not universal, and is not mnemonic. The system of taxonomic codes used by BirdBase is short, unique, and universal, but is tied to its taxonomic descriptions and is not mnemonic. The four-letter codes of H. Lee Jones are short, unique, taxonomy-independent, and mnemonic, but they are not easily computed and not universal.


My own UOI system, BrdBrev, takes a bird's binomen as its starting point. Binomina are the closest thing we have to rigid identifiers for birds: they are unique, universal, mnemonic, relatively robust in the face of high-level taxonomic revisions, and widely known. The only problem is their length. The goal, then, was to find a method for shortening the Latin names to an acceptable length. In computer science terms, what was needed was a minimal perfect hashing scheme for the Latin names. Now the art of finding minimal perfect hashing functions has advanced remarkably in the last few years, and it may actually be possible to find one, but the result would almost certainly violate the requirements for mnemonicity and easy computability. I chose instead to see how close I could come with relatively simple abbreviation schemes.

The two main abbreviation techniques are deletion of frequent letters and substring deletion. For example, Yllw-trtd might be derived from Yellow-throated by frequent-letter deletion, while substring deletion might yield Yell-thro. I tried a large number of combinations of these techniques.
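Both techniques are easy to sketch in Python. The letter set below is an assumption, chosen only because it reproduces the Yllw-trtd example; it is not necessarily the set used in BrdBrev. Keeping the first letter of each segment and keeping four letters per segment are likewise illustrative choices.

```python
# Sketch of the two abbreviation techniques on a hyphenated name.
# FREQUENT is an assumed letter set that happens to reproduce "Yllw-trtd".
FREQUENT = set("aeho")


def delete_frequent(name, letters=FREQUENT):
    """Drop frequent letters, always keeping the first letter of each segment."""
    out = []
    for seg in name.split("-"):
        kept = seg[:1] + "".join(c for c in seg[1:] if c.lower() not in letters)
        out.append(kept)
    return "-".join(out)


def delete_substring(name, keep=4):
    """Keep only the first `keep` letters of each segment."""
    return "-".join(seg[:keep] for seg in name.split("-"))


print(delete_frequent("Yellow-throated"))    # Yllw-trtd
print(delete_substring("Yellow-throated"))   # Yell-thro
```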

The one that seemed to work best was a 9-character code using frequent-letter deletion, with capitalization used to preserve segmentation. This scheme has a phenomenally low figure of only six collisions in the entire list of almost 10,000 birds. The six collisions are:

Using capitals to denote segmentation:

#  Code         List numbers of the two colliding birds
1  ChlbnMugu    3073    3078
2  PhylFlvvn    4877    4884
3  OluXnthnu    5935    5950
4  CduelPinu    8749    8750
5  BlutuGntu    8948    8962
6  AmbyHlecu    9605    9652
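
Counting collisions for any candidate scheme is mechanical: group the names by their codes and report every code shared by more than one name. The Python sketch below assumes a stand-in encoder (toy_encode) and invented placeholder names; neither is the BrdBrev scheme nor real data.

```python
from collections import defaultdict


def find_collisions(names, encode):
    """Group names by code; return only the codes shared by two or more names."""
    groups = defaultdict(list)
    for name in names:
        groups[encode(name)].append(name)
    return {code: members for code, members in groups.items() if len(members) > 1}


def toy_encode(binomen):
    """Hypothetical encoder: first three letters of each word, capitalized."""
    return "".join(word[:3].capitalize() for word in binomen.split())


# Invented placeholder names, not real binomina.
names = ["Aaa bbbx", "Aaa bbby", "Ccc ddd"]
print(find_collisions(names, toy_encode))
```

Swapping a different function in for toy_encode lets the same check be run against each combination of abbreviation techniques.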