In order to safeguard our environment, we must first have tools that give us information about its health. Biodiversity monitoring is crucial for providing this information. One form of this is bird species richness surveys, which have traditionally been conducted by having an experienced bird-watcher travel to a location to make observations. This is a time-consuming and expensive task.
An emerging alternative approach is to record audio from an ecologically significant location and then listen to it at a later stage for species identification. Unlike on-site surveys, using audio recordings allows the capture of long periods of audio that can then be listened to non-sequentially. Samples can be chosen from throughout the long recording and presented to an ecologist in any order. By selecting this set of samples intelligently, the listener will be able to observe more species in less overall time, increasing the efficiency of their labour.
This thesis investigates methods to make this selection of samples from a long recording in order to maximise the number of bird species contained in a given number of selected samples. The task of species identification remains with the human listener, but the process of selecting the rich subset of the recording is performed by a machine.
We first lay out the framework whereby the problem is decomposed into two parts: how to estimate the internal variety of call types within each candidate sample, and how to estimate the inter-sample dissimilarity. The former allows us to find samples with a high overall number of species and the latter to avoid selecting samples whose species overlap heavily with previously selected samples.
The main contribution of this thesis is to achieve these objectives by clustering short segments of audio. The clusters contained within each sample are used to estimate its internal variety of call types, and clusters are compared between samples to estimate their dissimilarity. Two methods of segmentation are used and, for each, an appropriate set of preprocessing steps and feature representation is developed. This pipeline of computational steps also gives rise to a novel visualisation tool that allows a user to rapidly scan long-duration recordings by filtering out parts with low activity and parts that repeat the same call types.
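The cluster-based idea can be illustrated with a minimal sketch. It assumes each candidate sample has already been reduced to the set of cluster labels found among its audio segments, and it uses a simple greedy rule (take the sample that adds the most clusters not yet seen). The function name, sample names, and cluster labels are hypothetical, and the greedy rule is an illustrative stand-in rather than the exact procedure developed in the thesis.

```python
from typing import Dict, List, Set


def greedy_select(sample_clusters: Dict[str, Set[int]], n: int) -> List[str]:
    """Pick up to n samples, each time taking the one that adds the most unseen clusters.

    sample_clusters maps a (hypothetical) sample name to the set of cluster
    labels found in that sample. The size of a set stands in for internal
    variety; the set difference with already-seen clusters stands in for
    dissimilarity to previously selected samples.
    """
    seen: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(sample_clusters)
    while remaining and len(chosen) < n:
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - seen))
        chosen.append(best)
        seen |= remaining.pop(best)
    return chosen


# Illustrative data only: four one-minute samples and their cluster labels.
samples = {
    "min_0530": {1, 2, 3},
    "min_0531": {1, 2},
    "min_0710": {4, 5},
    "min_1200": {3},
}
print(greedy_select(samples, 2))
```

In this toy input the second pick skips `min_0531`, which has two clusters but none that are new, in favour of `min_0710`, whose clusters have not yet been covered.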
An algorithm was developed that scores samples by combining their cluster content with other estimates of the internal variety and inter-sample dissimilarity. These other estimates are based on prior knowledge about the daily patterns of vocal activity and the durations for which species call at a given time, as well as the overall acoustic activity of each sample. The best combination of methods outperforms existing methods, such as randomly sampling from the dawn chorus.
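One simple way to picture such a combination is a weighted sum of normalised terms. The sketch below is an assumption made for illustration only: the function name, the three inputs (count of new clusters, a time-of-day prior in [0, 1], and an acoustic activity level in [0, 1]), and the weights are all hypothetical and are not taken from the thesis.

```python
from typing import Tuple


def combined_score(
    new_clusters: int,
    time_prior: float,
    activity: float,
    max_clusters: int,
    weights: Tuple[float, float, float] = (0.5, 0.25, 0.25),
) -> float:
    """Weighted sum of three per-sample estimates (all assumed to lie in [0, 1]).

    new_clusters / max_clusters: share of clusters not seen in earlier picks.
    time_prior: assumed prior for vocal activity at the sample's time of day.
    activity: assumed overall acoustic activity of the sample.
    The weights are placeholders, not tuned values.
    """
    cluster_term = new_clusters / max_clusters if max_clusters else 0.0
    w_cluster, w_time, w_activity = weights
    return w_cluster * cluster_term + w_time * time_prior + w_activity * activity


# A dawn sample with many new clusters versus a quiet midday sample.
dawn = combined_score(new_clusters=4, time_prior=0.9, activity=0.8, max_clusters=5)
midday = combined_score(new_clusters=1, time_prior=0.2, activity=0.1, max_clusters=5)
print(dawn > midday)
```

A weighted sum keeps each estimate's influence easy to inspect and adjust; other combination rules would fit the same interface.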
Together, these contributions help to advance the field of ecoacoustics for bird species monitoring and offer a way forward towards developing valuable tools for the exploration of ecological audio recordings.