The Mathematical Theory of Listing

Any birder seriously interested in increasing his or her lifelist is frequently confronted with the question: "How may I best allocate my limited resources so as to maximize the number of new species on my list?" This article presents a mathematical model which allows one to answer this question, and a computer program, OrniMax, which applies the model to concrete situations.

In what follows, the theory is explained, for simplicity's sake, in terms of life birds, but it should be obvious that the same procedure can be applied to trip lists, year lists, state lists, or any other subset of birds in which a birder is interested. The mathematical formulations have been supplied, but they should not blind the non-mathematical reader to the broad outlines of the theory, nor to its practical applications.

The mathematical theory of listing must address itself to three separate but interrelated questions. The ultimate question is: "Where should I go to get the greatest number of lifers?" This question can be answered only if, given a specific itinerary, one can answer the simpler question, "How many lifers may I expect to get if I make this trip?" And this in turn can be known only if one can answer, "What is the probability that I would see species j if I spent h hours birding at site i?" We shall take up these three questions in reverse order.

1. Species visibility. We must be careful to distinguish between the population density of a species, measured in individuals per square kilometer, and its visibility, which may be defined as the probability that a competent party would see the species in one hour of birding at a given site in a given season. We shall represent the visibility of species j at site i as vij. Empirically, visibility can be measured as the total number of party-hours in which the species has been observed divided by the total number of party-hours at the site. For example, if 100 parties were each to spend one hour at Chincoteague on February 14, and 37 of them were to see at least one Surf Scoter, then the visibility of that species at that time and place would be estimated as 0.37.
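The empirical definition amounts to a simple ratio. A minimal sketch in Python follows; the function name and its arguments are illustrative and are not taken from OrniMax.

```python
def estimate_visibility(party_hours_with_sighting, total_party_hours):
    """Fraction of party-hours in which the species was recorded."""
    if total_party_hours <= 0:
        raise ValueError("total_party_hours must be positive")
    return party_hours_with_sighting / total_party_hours


# The Surf Scoter example: 37 of 100 one-hour parties saw the bird.
surf_scoter = estimate_visibility(37, 100)
print(surf_scoter)  # 0.37
```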

Clearly, to start from scratch and empirically measure visibility for every species and every site at every season would be an undertaking as mammoth as it would be unnecessary. For the purpose at hand it is sufficient to estimate visibility from information already available. One possibility would be to use statistics on individuals per party-hour gleaned, for example, from the Christmas Counts. Indeed, if one could only assume a normal distribution of individuals/party-hour around the mean (see Figure 1-a) then it would be a simple matter to calculate visibility using standard deviations and z-scores. Unfortunately it seems likely that for many birds the distributions will in fact be more like those shown in Figures 1-b and 1-c, so that the assumption of normality is unwarranted.

A more satisfactory basis for estimating visibility is to be found in checklists which give the relative abundance of each species in each of the four seasons. (Obviously the same techniques may be applied with even greater success to bar-graphs, but we shall not discuss this special case here.) In many cases, such checklists already define relative abundance in probabilistic terms ("abundant = hard to miss," etc.) so that it is sufficient to assign specific quantitative values to the checklist terms. This is done through a limited series of field trials in which actual visibility measurements are made; the quantitative value for each abundance category will then be taken as the average value for the birds in that category. If for example the average visibility of birds listed as "abundant" turns out to be 0.45, then this would be taken as the quantitative probabilistic interpretation of "abundant."
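The calibration step can be sketched as a per-category average. In the Python below, the function name, the data layout, and the trial figures are all invented for illustration; only the averaging rule comes from the text.

```python
from collections import defaultdict


def calibrate_categories(trials):
    """Average measured visibility per checklist category.

    trials: iterable of (category, measured_visibility) pairs, one per
    species measured in the field trials.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, visibility in trials:
        totals[category] += visibility
        counts[category] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}


# Invented field-trial measurements, for illustration only.
trials = [
    ("abundant", 0.40), ("abundant", 0.50),
    ("common", 0.20), ("common", 0.30), ("common", 0.10),
]
category_visibility = calibrate_categories(trials)
```

Every species listed as "abundant" on a checklist would then be assigned the value stored under that key.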

The seasonal checklists, as we all know, may be quite misleading for those of us who travel at the beginnings or ends of migrations, so OrniMax introduces a refinement. For each site under consideration, approximate migration dates are read in; if a trip is made during one of the transition periods between seasons, then the visibility of each species is adjusted using the cosine of the date, producing the smoothed-out curve shown in Figure 2. That these migration dates correspond exactly to those of no individual species is of little importance if they are roughly descriptive of the migration as a whole, since errors will cancel each other out. It may ultimately be desirable to introduce a correction for known late or early migrants.
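The text does not give the exact form of the cosine adjustment. One plausible reading, offered here only as an assumption, is a half-cosine blend between the visibility of the outgoing season and that of the incoming season. The function and parameter names in this Python sketch are hypothetical.

```python
import math


def smoothed_visibility(day, start, end, v_before, v_after):
    """Blend two seasonal visibilities across a transition period.

    day, start and end are day-of-year numbers; start < end is assumed.
    This half-cosine blend is an assumed form, not OrniMax's own formula.
    """
    if day <= start:
        return v_before
    if day >= end:
        return v_after
    fraction = (day - start) / (end - start)          # 0 at start, 1 at end
    weight = (1 - math.cos(math.pi * fraction)) / 2   # eases from 0 to 1
    return v_before + (v_after - v_before) * weight
```

Under this reading, a trip made halfway through the transition sees a visibility midway between the two seasonal values, and the curve flattens out as it approaches either season.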

The knowledge of a species' visibility as we have defined it would be almost useless unless it were possible to adjust the time period involved, so as to know, for example, what the probability of seeing a given species in four hours of birding would be. If one could assume that whether a party sees a species in a given time period is independent of whether or not it has seen that bird in the preceding time period, then it would be a simple matter to apply the multiplication law of probability to calculate probabilities over varying lengths of time. Unfortunately, such statistical independence may not be assumed when dealing with periods of more than a few hours at most. The reason is that visibility is based on a long-term average, and may be thought of as the probability of seeing a species during the first hour at a given site. In any concrete situation, factors such as irregularities in bird distributions, inaccuracies in the checklists, geographic variation, weather, and the like modify the likelihood of seeing a species, so that the probability of seeing it in the 50th hour of birding after 49 unsuccessful hours is smaller than the probability of seeing it in the first hour. For this reason, OrniMax exponentially reduces visibilities as a function of time spent at a site. This yields the following formula for calculating the probability of not seeing a bird j in hi hours at site i:

    $$ q_j(h_i) = \prod_{t=1}^{h_i} \left( 1 - v_{ij}\, d^{\,t-1} \right) $$

where d is a constant reflecting the degree of dependence from one time period to the next. To illustrate with a dependence factor of 0.9, if the probability of seeing a bird is 0.3 in the first time period, it will be 0.27 in the second time period; the probability of missing it in both time periods is thus (1-0.3) (1-0.27) = (0.7) (0.73) = 0.51. The probability of seeing the bird during the extended period is simply the complement of the above expression: pj(hi) = 1 - qj(hi).
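The calculation just described can be sketched in a few lines of Python. The function names are illustrative; the rule of multiplying the visibility by d after each time period follows the worked example above.

```python
def miss_probability(visibility, periods, dependence):
    """Probability of missing a species in every one of `periods` periods.

    The visibility is multiplied by `dependence` after each period, so
    later periods contribute less than earlier ones.
    """
    q = 1.0
    v = visibility
    for _ in range(periods):
        q *= 1.0 - v
        v *= dependence
    return q


def see_probability(visibility, periods, dependence):
    """Complement of the miss probability."""
    return 1.0 - miss_probability(visibility, periods, dependence)


# The example above: visibility 0.3, two periods, dependence factor 0.9.
q = miss_probability(0.3, 2, 0.9)   # (0.7)(0.73), about 0.51
```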

2. Estimating lifers on a given field trip. The preceding results have put us in a position to express by a straightforward formula the number of lifers to be expected on a given field trip:

    $$ L = \sum_{j=1}^{n} \left[ 1 - \prod_{i=1}^{m} q_j(h_i) \right] $$

where n is the total number of possible lifers and m is the number of sites visited on the field trip. For illustrative purposes, suppose we visit only two sites, spending four hours at one and eight at the other. Let us further suppose that we have 100 possible lifers, and there is a 70% chance of seeing each of them in four hours at the first site, and a 50% chance of seeing each of them in eight hours at the second site. Each bird is then missed at both sites with probability (0.3) (0.5) = 0.15, so we may expect to see a total of 100 (1 - 0.15) = 85 lifers during our visit to the two sites. This formula assumes of course that seeing a bird at one site is an independent event from seeing it at another site; within limits, this assumption seems warranted.
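As a check on the arithmetic, the expected number of lifers can be computed directly from the per-site probabilities. This Python sketch assumes independence between sites, as the text does; the function name and the nested-list layout are illustrative.

```python
def expected_lifers(see_probs):
    """Expected number of species seen at least once on the trip.

    see_probs[j][i] is the probability of seeing species j during the
    time spent at site i; sites are treated as independent.
    """
    total = 0.0
    for per_site in see_probs:
        miss_all = 1.0
        for p in per_site:
            miss_all *= 1.0 - p       # missed at this site as well
        total += 1.0 - miss_all       # seen at one site or more
    return total


# The two-site example: 100 possible lifers, each with a 70% chance at
# the first site and a 50% chance at the second, so each species is
# seen with probability 1 - 0.3 * 0.5.
example = expected_lifers([[0.7, 0.5]] * 100)
```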

3. Choosing an itinerary. With the above formula in hand, we are now ready to return to the problem with which we began, that of maximizing the number of lifers on a field trip. In mathematical terms, we are seeking to find a series of values h1, h2, ... hm such that (a) the sum of these values equals the amount of time available for birding, and (b) the function L is at a maximum for this series of values.

There is no direct method for calculating this series of values, so OrniMax uses an iterative, indirect approximation which for practical purposes is quite as satisfactory as a direct solution would be. The algorithm is basically as follows. A trip list is set up containing the probabilities of seeing each species on the trip, and these probabilities are initialized to 0. The trip is divided into a number of segments. For each segment, the program calculates the total change in the expected number of trip lifers which would result from spending that segment at each of the m sites under consideration. On this basis, the site producing the greatest increase in trip lifers is chosen. The trip probabilities are adjusted to reflect the time spent at that site, and then the entire process is repeated until the time available for the trip is completely allocated. (Needless to say, the sites need not be visited in the order they are chosen; all that matters is the relative amount of time spent at each.)
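The segment-by-segment procedure can be sketched as a greedy loop. The Python below is a simplified illustration, not OrniMax itself: it treats one segment as one time period, tracks the probability that each species is still unseen (the complement of the trip probabilities in the text), and ignores weights and budgets. All names are hypothetical.

```python
def plan_trip(visibility, segments, dependence):
    """Greedily assign each trip segment to the most productive site.

    visibility[i][j] is the chance of seeing species j in the first
    segment spent at site i. Returns (segments per site, expected lifers).
    """
    n_sites = len(visibility)
    n_species = len(visibility[0])
    miss = [1.0] * n_species      # chance each species is still unseen
    spent = [0] * n_sites         # segments already assigned to each site

    for _ in range(segments):
        best_site, best_gain = 0, -1.0
        for i in range(n_sites):
            decay = dependence ** spent[i]
            # Expected new lifers if the next segment is spent at site i.
            gain = sum(miss[j] * visibility[i][j] * decay
                       for j in range(n_species))
            if gain > best_gain:
                best_site, best_gain = i, gain
        # Commit the segment to the best site and update the trip list.
        decay = dependence ** spent[best_site]
        for j in range(n_species):
            miss[j] *= 1.0 - visibility[best_site][j] * decay
        spent[best_site] += 1

    return spent, n_species - sum(miss)


# Two sites, two species, three segments, dependence factor 0.9.
allocation, lifers = plan_trip([[0.5, 0.0], [0.0, 0.4]], 3, 0.9)
```

In this toy run the first site receives two segments and the second receives one. Because each choice looks only one segment ahead, the result is an approximation rather than a guaranteed optimum.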


*****PROGRAM ORNIMAX 91/06/08  *****

Date of trip: 5/25
Number of hours available for birding: 32
Budget: 100.00
Trip planned for: David
Probability Definitions for 4 Hours:
------------------------------------
Absent        0        Uncommon        0.077632
Accidental    0.000312 Common          0.427102
Rare          0.003116 Abundant        0.683594
Occasional    0.011946

Dependence Factor: 0.92237

SITE NO.   SITE NAME           COST           FACTOR
1          Atascosa            2.00           -0.25
2          Santa Ana           2.00           -0.25
3          Anahuac             1.00           -0.25
4          Bentsen-Rio Grande  3.00           -0.25

ITERATION   SITE     POINTS       CHANGE
1           1        34.17        34.17
2           2        59.02        24.85
3           3        76.63        17.61
4           2        89.40        12.77
5           1        100.40       11.00
6           3        108.62       8.22
7           4        115.45       6.82
8           1        121.34        5.89

Total cost of trip: 8.00
Total expected lifers for trip: 67.77

OPTIMUM DISTRIBUTION OF BIRDING TIME
------------------------------------
SITE                COST PER LIFER-POINT
1        12 Hours       0.039168
2        8 Hours        0.053168
3        8 Hours        0.038708
4        4 Hours        0.439597
Figure 3. Sample print-out from OrniMax. On this run, the hypothetical trip to Texas was divided into eight segments of four hours each.

A sample print-out is shown in Figure 3. The birder need supply only the date of the trip, the total number of hours available for birding, and a list of the birds in which he is not interested - typically, his lifelist. Two additional features of OrniMax may be mentioned briefly. First, it is possible to assign weights to specific birds, so that OrniMax will maximize, not the number of lifers, but rather the total point-value of the trip: thus a seabird enthusiast might want to assign greater weights to seabirds than to other birds, or life birds may be given greater weights than year birds, or if a group of birders is traveling together birds can be weighted according to the number of birders for whom they would be lifers. Secondly, it is possible to specify a "budget" for the trip and a "cost" for each site, measured in dollars, kilometres of travel, or whatever, so that OrniMax stays within the bounds of the budget.



Internet Cataloguing-in-Publication Data
Mundie, David A.
    The Mathematical Theory of Birding / David A. Mundie
    Pittsburgh, PA : Polymath Systems  1995
    1. Ornithology I. Techniques II. Standards
    598.01 dc-20
                                        [MARC]

© 1995 by David A. Mundie