The goal of taxastand is to standardize species names from different sources, a common task in biology.
Very often, different biologists use different synonyms to refer to the same species. If we want to join data from different sources, their taxonomic names must be standardized first. This is what taxastand seeks to do in a reproducible and efficient manner.
This package is in early development. There may be major, breaking changes to functionality in the near future. If you use this package, I highly recommend using a package manager like renv so that later updates won’t break your code.
taxastand is based on matching names to a single taxonomic standard, that is, a database of accepted names and synonyms. As long as a single taxonomic standard is used, we can confidently resolve names from disparate sources.
The taxonomic standard must conform to Darwin Core standards. The user must provide this database (as a dataframe). There are many sources of taxonomic data online, including GBIF, Catalogue of Life, and ITIS, to name a few. The taxadb package provides convenient functions for downloading various taxonomic databases that use Darwin Core.
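To make the expected shape of the reference database more concrete, here is a minimal sketch of a hand-built dataframe. It assumes that the four Darwin Core columns shown in the usage example further down (`taxonID`, `acceptedNameUsageID`, `taxonomicStatus`, `scientificName`) are the ones that matter; the object name `ref_taxonomy` is made up for illustration, and the two rows are copied from the `filmy_taxonomy` example.

```r
# Hypothetical miniature reference taxonomy, built by hand for illustration.
# Column names are Darwin Core terms; rows are copied from the
# filmy_taxonomy output shown in the usage example of this README.
ref_taxonomy <- data.frame(
  taxonID = c(54115097, 54133783),
  acceptedNameUsageID = c(NA, 54115097),
  taxonomicStatus = c("accepted name", "synonym"),
  scientificName = c(
    "Cephalomanes crassum (Copel.) M. G. Price",
    "Trichomanes crassum Copel."
  ),
  stringsAsFactors = FALSE
)

# Sanity check: every synonym should point to a taxonID present in the table
synonyms <- ref_taxonomy[ref_taxonomy$taxonomicStatus == "synonym", ]
all_linked <- all(synonyms$acceptedNameUsageID %in% ref_taxonomy$taxonID)
```

A check like `all_linked` is only a quick consistency test, not a full Darwin Core validation.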
taxastand is currently only available on GitHub:
# install.packages("remotes")
remotes::install_github("joelnitta/taxastand")
taxastand depends on taxon-tools for taxonomic name matching. The two programs included in taxon-tools (matchnames among them) must be installed and on the user’s PATH.
A docker image is available to run
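One way to check from within R whether the `matchnames` executable can be found is base R's `Sys.which()`. This is only a sketch of a manual check, not something taxastand is documented to require; the variable names are made up for illustration.

```r
# Look up the executable on the search path.
# Sys.which() typically returns "" when nothing is found.
matchnames_path <- unname(Sys.which("matchnames"))
matchnames_found <- nzchar(matchnames_path)

if (matchnames_found) {
  message("matchnames found at: ", matchnames_path)
} else {
  message("matchnames not found; install taxon-tools and add it to your PATH")
}
```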
taxize is the “granddaddy” of taxonomy packages in R. It can search around 20 different taxonomic databases for names and retrieve taxonomic information.
TNRS, the Taxonomic Name Resolution Service, is a web application that resolves taxonomic names of plants according to one of six databases.
taxizedb downloads taxonomic databases and provides tools to interface with them through SQL.
taxadb also downloads and searches taxonomic databases. It can interface with them either through SQL or in-memory in R.
taxonstand has a very similar goal to taxastand, but only uses The Plant List (TPL) as its taxonomic standard and does not allow the user to provide their own. Note that TPL has not been updated since 2013.
Although existing web-based solutions for taxonomic name resolution are very useful, they may not be ideal for all situations: the choice of reference database to use for standardization is limited, they may not be able to handle very large queries, and the user has no guarantee that the same input will yield the same output at a later date due to changes in the remote database.
Furthermore, matching of taxonomic names is not straightforward, since they are complex data structures that include multiple components (e.g., genus, specific epithet, basionym author, combination author, etc.). Of the tools mentioned above, only TNRS can fuzzily match taxonomic names based on their parsed components, but it does not allow for use of a local reference database.
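To illustrate what matching on parsed components means, here is a rough sketch in base R. It is not the algorithm used by taxon-tools or TNRS: the list fields, the use of `utils::adist()` (Levenshtein edit distance), and the one-edit threshold are all assumptions chosen for this example.

```r
# Hypothetical parsed representation of a query and a reference name.
# The field names are illustrative, not the format used by taxon-tools.
query <- list(genus = "Gonocormus", epithet = "minutum", author = NA)
reference <- list(
  genus = "Gonocormus",
  epithet = "minutus",
  author = "(Bl.) Bosch"
)

# Compare each component separately using edit distance
genus_dist <- drop(utils::adist(query$genus, reference$genus))
epithet_dist <- drop(utils::adist(query$epithet, reference$epithet))

# Accept a fuzzy match if the genus is identical and the epithet differs
# by at most one edit (an arbitrary threshold chosen for this sketch)
is_fuzzy_match <- genus_dist == 0 && epithet_dist <= 1
```

Comparing components one at a time lets a small spelling difference in the epithet be tolerated without also accepting a different genus, which a single whole-string comparison cannot distinguish.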
The motivation for taxastand is to provide greater flexibility and reproducibility by allowing for complete version control of the code and database used for name resolution, while implementing fuzzy matching of parsed taxonomic names.
Here is an example of fuzzy matching followed by resolution of synonyms using the dataset included with the package.
library(taxastand)

# Load example reference taxonomy in Darwin Core format
data(filmy_taxonomy)

# Take a look at the columns used by taxastand
head(filmy_taxonomy[c("taxonID", "acceptedNameUsageID", "taxonomicStatus", "scientificName")])
#>    taxonID acceptedNameUsageID taxonomicStatus
#> 1 54115096                  NA   accepted name
#> 2 54133783            54115097         synonym
#> 3 54115097                  NA   accepted name
#> 4 54133784            54115098         synonym
#> 5 54115098                  NA   accepted name
#> 6 54133785            54115099         synonym
#>                              scientificName
#> 1             Cephalomanes atrovirens Presl
#> 2                Trichomanes crassum Copel.
#> 3 Cephalomanes crassum (Copel.) M. G. Price
#> 4           Trichomanes densinervium Copel.
#> 5 Cephalomanes densinervium (Copel.) Copel.
#> 6         Trichomanes infundibulare Alderw.

# As a test, resolve a misspelled name
ts_resolve_names("Gonocormus minutum", filmy_taxonomy)
#>                query                        resolved_name
#> 1 Gonocormus minutum Crepidomanes minutum (Bl.) K. Iwats.
#>                     matched_name resolved_status matched_status match_type
#> 1 Gonocormus minutus (Bl.) Bosch   accepted name        synonym auto_fuzzy

# We can now use the `resolved_name` column of this result for downstream
# analyses joining on other datasets that have been resolved to the same
# reference taxonomy.
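As a rough sketch of that downstream step, the following joins two datasets on `resolved_name` using base R's `merge()`. Both dataframes and all of their values (`traits`, `occurrences`, `leaf_length_cm`, `site`) are invented for illustration; in practice the `resolved_name` column would come from running `ts_resolve_names()` on each dataset's original names against the same reference taxonomy.

```r
# Hypothetical datasets whose names have already been resolved
traits <- data.frame(
  original_name = "Gonocormus minutum",
  resolved_name = "Crepidomanes minutum (Bl.) K. Iwats.",
  leaf_length_cm = 1.2,
  stringsAsFactors = FALSE
)
occurrences <- data.frame(
  original_name = c("Crepidomanes minutum", "Gonocormus minutus"),
  resolved_name = rep("Crepidomanes minutum (Bl.) K. Iwats.", 2),
  site = c("A", "B"),
  stringsAsFactors = FALSE
)

# Join on the standardized name rather than the original spellings
joined <- merge(
  occurrences, traits,
  by = "resolved_name",
  suffixes = c("_occ", "_trait")
)
```

Joining on the original names would have matched none of these rows, since each source spelled the species differently.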
If you use this package, please cite it! Here is an example:
The example DOI above is for the overall package.
Here is the latest DOI, which you should use if you are using the latest version of the package:
You can find DOIs for older versions by viewing the “Releases” menu on the right.
You should also cite the software that taxastand relies on,