These days, if I’m 10 km NE of Suva on the road to Nausori, Northern Division, Fiji, and want to take a picture of the tropical countryside, after snapping away I can also pull out my little GPS machine and determine my position exactly as degrees of latitude and longitude – 18.075S, 178.525E, as it happens. And that’s much handier for sharing information about geographic locations, something that has become a lot easier – and popular – since the launch of Google Earth. What I’ve just done on the road to Nausori is called geotagging, or georeferencing. That just means adding information about locality – ideally latitude and longitude coordinates – to media like websites, RSS feeds and indeed images. Once your images are geotagged with coordinates and uploaded to Flickr, for example, you can display them in Google Earth – or a geographic information system (GIS) – to show where you took them. Pretty cool way to tell your family about your recent vacation. Soon, digital cameras will have a built-in GPS (many mobile phones already do), and the geotagging will be automatic.
Conservationists are also very keen on geotagging, but geotagging organisms is not as easy as geotagging photos. We have huge repositories of specimens of plants and animals at our disposal, both live and preserved, in things like herbaria, natural history museums, botanic gardens and genebanks. And associated with these specimens is usually a certain amount of data: things like the name of the species, the name of the collector, the date the specimen was collected and the place where it was found. These collections and their data are a very precious resource for taxonomy, ecology, conservation, agricultural development and other types of work, but they would be more valuable still if more of the data were available electronically. Many genebanks and herbaria have not yet entered the information on the labels stuck to their seed containers and specimen sheets into a database, for example, although to be fair some have, and have even made that information available online.
However, even when the label information is digitized, the locality information is very rarely in a form that you can plug directly into Google Earth. That’s because, typically, the locality information – which may have been collected long before the GPS receiver became so readily accessible – doesn’t include latitude and longitude coordinates. It’s much more likely to just have the kind of information I started this post with: “10 km NE of Suva, on road to Nausori, Northern Division, Fiji.” Armed with that kind of text description, a good map, and perhaps some guesswork, you can of course derive the coordinates. But imagine doing that for all the pressed plants or germplasm accessions in even a smallish herbarium or genebank. Doesn’t bear thinking about. And there’s no guarantee that someone else presented with the same locality description in another herbarium or genebank would get the same answer.
Enter the Biogeomancer project. A global consortium of natural history scientists and experts in geospatial data, its goal is to “maximize the quality and quantity of biodiversity data that can be mapped in support of scientific research, planning, conservation, and management.” One of the main ways it does this is by developing tools to automate the geotagging process.
These tools first break down – parse – the textual locality description into its components, and look up the key locality name (Suva, in our example) in electronic gazetteers, which are lists of locality names with their coordinates. They then apply the offset implied by the phrase “10 km NE” to the locality’s coordinates, according to specified methods and standards, even providing an estimate of accuracy. Finally, they validate the results, for example by checking that the final coordinates are on land (assuming the specimen is a terrestrial organism!), between Suva and Nausori, and in the Northern Division of the country called Fiji.
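To make those steps a bit more concrete, here is a minimal Python sketch of the parse, look-up, offset and validate sequence. It is only an illustration and not Biogeomancer's actual code: the one-entry gazetteer, the approximate coordinates for Suva, the bounding box, and every function name are my own assumptions, and the offset uses a simple flat-earth approximation rather than any particular georeferencing standard. It also skips the accuracy estimate entirely.

```python
import math
import re

# Hypothetical one-entry gazetteer: place name -> (lat, lon) in decimal degrees.
# The coordinates for Suva are approximate and for illustration only.
GAZETTEER = {"suva": (-18.14, 178.44)}

# Compass headings in degrees clockwise from north.
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}

KM_PER_DEGREE_LAT = 111.32  # rough average length of one degree of latitude

# Matches descriptions of the form "<distance> km <direction> of <place>".
LOCALITY = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<dir>NE|NW|SE|SW|N|E|S|W)\s+of\s+(?P<place>[^,]+)",
    re.IGNORECASE,
)


def parse_locality(text):
    """Split a locality description into (distance_km, direction, place)."""
    match = LOCALITY.search(text)
    if match is None:
        raise ValueError("no '<n> km <dir> of <place>' pattern found")
    return (float(match.group("dist")),
            match.group("dir").upper(),
            match.group("place").strip().lower())


def apply_offset(lat, lon, dist_km, bearing_deg):
    """Shift a point by a distance and heading (flat-earth approximation)."""
    north_km = dist_km * math.cos(math.radians(bearing_deg))
    east_km = dist_km * math.sin(math.radians(bearing_deg))
    new_lat = lat + north_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + east_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


def in_bbox(lat, lon, bbox):
    """Crude validation: is the point inside (min_lat, max_lat, min_lon, max_lon)?"""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def georeference(text):
    """Parse the text, look up the place, and apply the offset."""
    dist_km, direction, place = parse_locality(text)
    lat, lon = GAZETTEER[place]
    return apply_offset(lat, lon, dist_km, BEARINGS[direction])


if __name__ == "__main__":
    # Assumed rough bounding box around the main island, for the validation step.
    ROUGH_BBOX = (-18.3, -17.2, 177.2, 178.8)
    lat, lon = georeference(
        "10 km NE of Suva, on road to Nausori, Northern Division, Fiji")
    print(round(lat, 3), round(lon, 3), in_bbox(lat, lon, ROUGH_BBOX))
```

With the assumed starting point for Suva, this comes out at roughly 18.08S, 178.51E. A real tool would check against proper land and administrative boundaries rather than a box, and would report how uncertain the result is.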
Automated geotagging should cut down the time necessary to process a specimen from 5-10 minutes to fractions of a second, while adding to the repeatability and accuracy of the process. That means that data exchange will be easier, and that it will be possible to combine data coming from different institutions in a single analysis with more confidence that the quality of the data from different sources will be comparable.
Biogeomancer expects to have a “workbench” available to automate georeferencing by the end of 2006. I’m sure many botanists and zoologists will jump on it, but genebanks will probably be a bit behind. They don’t seem as wired into the latest bioinformatics developments as museums and herbaria. Maybe this post will help a bit.