Resolving species names rapidly and accurately with taxastand

Joel Nitta, Wataru Iwasaki
The University of Tokyo

Botany 2022
https://joelnitta.github.io/botany_2022_taxastand

Species names are the “glue” that connects datasets

Page (2013)

Synonyms break linkages

In the age of big data, software is needed to resolve taxonomy

Shortcomings of current approaches

  • Many tools only available via an online interface (API)
    • Difficult to reproduce
  • Limited number of reference databases to choose from
    • May not be able to implement taxonomy of choice
  • Existing tools do not recognize the rules of taxonomic nomenclature
    • May not be able to accurately match names

Features of taxastand

  • Run locally in R
  • Allows usage of a custom reference database
  • Supports fuzzy matching
  • Understands taxonomic rules

Available at https://github.com/joelnitta/taxastand

Usage

Installation

In R:

# install remotes first
install.packages("remotes")
remotes::install_github("joelnitta/taxastand")
library(taxastand)
library(dplyr) # provides glimpse(), used to print results below

taxastand also requires either taxon-tools or Docker to be installed
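
One way to check which of the two back ends is present before running the examples is sketched below. It assumes that taxon-tools puts executables named `parsenames` and `matchnames` on the PATH and that Docker provides a `docker` command; the helper `has_cmd()` and the variable names are hypothetical and not part of taxastand.

```r
# Sketch only: detect whether taxon-tools or Docker is on the PATH.
# Assumption: taxon-tools installs `parsenames` and `matchnames`.
has_cmd <- function(cmd) unname(nzchar(Sys.which(cmd)))

has_taxon_tools <- all(vapply(c("parsenames", "matchnames"), has_cmd, logical(1)))
has_docker <- has_cmd("docker")

# Fall back to Docker only when a local taxon-tools install is missing
use_docker <- !has_taxon_tools && has_docker

if (!has_taxon_tools && !has_docker) {
  message("Neither taxon-tools nor Docker was found on the PATH.")
}
```

The resulting `use_docker` flag could then be supplied as the `docker` argument used in the examples that follow.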

Basic matching: fuzzy matching

res <- ts_match_names(
    query = "Crepidomanes minutus",
    reference = c(
      "Crepidomanes minutum",
      "Hymenophyllum polyanthos"),
    simple = TRUE,
    docker = TRUE
    )
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutus"
$ reference  <chr> "Crepidomanes minutum"
$ match_type <chr> "auto_fuzzy"
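
To illustrate the idea behind fuzzy matching, the sketch below scores the same query against the same reference names by edit distance in base R. It is not how taxastand or taxon-tools computes matches; the threshold `max_dist` is a hypothetical value chosen for the example.

```r
# Sketch only: edit-distance matching with base R (utils::adist),
# shown as an illustration of fuzzy matching in general.
query <- "Crepidomanes minutus"
reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# Number of single-character edits needed to turn the query into each name
distances <- drop(utils::adist(query, reference))
names(distances) <- reference

# Accept the closest name only if it is within a small distance
max_dist <- 2 # hypothetical threshold
best <- reference[which.min(distances)]
fuzzy_hit <- if (min(distances) <= max_dist) best else NA_character_

# "minutus" and "minutum" differ by one substitution, so the distance is 1
print(distances)
print(fuzzy_hit)
```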

Basic matching: taxonomic rules

res <- ts_match_names(
    query = "Crepidomanes minutum K. Iwats.",
    reference = c(
      "Crepidomanes minutum (Bl.) K. Iwats.",
      "Hymenophyllum polyanthos (Sw.) Sw."),
    simple = TRUE,
    docker = TRUE
    )
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutum K. Iwats."
$ reference  <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ match_type <chr> "auto_basio-"
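
In this example the query lacks the parenthetical (basionym) author that the reference name carries. A much-simplified sketch of that kind of rule is shown below: drop any parenthetical author from the reference names, then compare. The helper `strip_basionym_author()` is hypothetical and is not the parser used by taxon-tools, which handles many more cases.

```r
# Sketch only: treat names as equal when they differ solely by a
# parenthetical (basionym) author. Not the taxon-tools implementation.
strip_basionym_author <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)", "", x) # remove " (Author)" parts
  trimws(gsub("\\s+", " ", x))        # normalize whitespace
}

query <- "Crepidomanes minutum K. Iwats."
reference <- c(
  "Crepidomanes minutum (Bl.) K. Iwats.",
  "Hymenophyllum polyanthos (Sw.) Sw."
)

# Reference names that equal the query once the basionym author is dropped
hit <- reference[strip_basionym_author(reference) == query]
print(hit)
```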

Name resolution requires a reference database

data(filmy_taxonomy)
head(filmy_taxonomy[c("taxonID", "acceptedNameUsageID",
  "taxonomicStatus", "scientificName")])
   taxonID acceptedNameUsageID taxonomicStatus                            scientificName
1 54115096                  NA   accepted name             Cephalomanes atrovirens Presl
2 54133783            54115097         synonym                Trichomanes crassum Copel.
3 54115097                  NA   accepted name Cephalomanes crassum (Copel.) M. G. Price
4 54133784            54115098         synonym           Trichomanes densinervium Copel.
5 54115098                  NA   accepted name Cephalomanes densinervium (Copel.) Copel.
6 54133785            54115099         synonym         Trichomanes infundibulare Alderw.
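
The columns above are what link a synonym to its accepted name: a synonym's `acceptedNameUsageID` points at the `taxonID` of the accepted name. A minimal base R sketch of that lookup follows, using two of the rows shown above; the helper `resolve_id()` is hypothetical and not a taxastand function.

```r
# Sketch only: follow acceptedNameUsageID from a synonym to its accepted name.
# The two rows are copied from the filmy_taxonomy excerpt above.
ref <- data.frame(
  taxonID = c(54115097, 54133783),
  acceptedNameUsageID = c(NA, 54115097),
  taxonomicStatus = c("accepted name", "synonym"),
  scientificName = c(
    "Cephalomanes crassum (Copel.) M. G. Price",
    "Trichomanes crassum Copel."
  ),
  stringsAsFactors = FALSE
)

resolve_id <- function(id, ref) {
  row <- ref[ref$taxonID == id, ]
  # Accepted names have no acceptedNameUsageID and resolve to themselves
  accepted_id <- if (is.na(row$acceptedNameUsageID)) {
    row$taxonID
  } else {
    row$acceptedNameUsageID
  }
  ref$scientificName[ref$taxonID == accepted_id]
}

resolve_id(54133783, ref) # the synonym Trichomanes crassum Copel.
```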

Where to get taxonomic data?

Name resolution

res <- ts_resolve_names(
  query = "Gonocormus minutum",
  ref_taxonomy = filmy_taxonomy,
  docker = TRUE)
glimpse(res)
Rows: 1
Columns: 6
$ query           <chr> "Gonocormus minutum"
$ resolved_name   <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ matched_name    <chr> "Gonocormus minutus (Bl.) Bosch"
$ resolved_status <chr> "accepted name"
$ matched_status  <chr> "synonym"
$ match_type      <chr> "auto_fuzzy"

Example: ferns of Japan

https://github.com/joelnitta/ja_ferns_names

How can we make a map of endangered fern species in Japan?

GreenList and GBIF do not use the same taxonomy

Solution: match names of both to pteridocat

  1. Match GBIF to pteridocat
  2. Match GreenList to pteridocat
  3. Merge GreenList and GBIF
  4. Compare to Ebihara and Nitta (2019) (non-GBIF data)
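
Steps 1 to 3 can be sketched in base R as below, assuming both name lists have already been resolved to the same reference taxonomy. All names, statuses, and column names here are made-up placeholders for illustration, not data from GBIF, the GreenList, or pteridocat.

```r
# Sketch only: merge two datasets on the name each was resolved to.
# Every value below is a placeholder, not real data.
gbif_resolved <- data.frame(
  gbif_name = c("Genus alpha L.", "Genus betus Sw.", "Genus gamma Sw."),
  resolved_name = c("Genus alpha L.", "Genus beta Sw.", "Genus gamma Sw."),
  stringsAsFactors = FALSE
)
greenlist_resolved <- data.frame(
  greenlist_name = c("Genus alpha", "Genus beta"),
  resolved_name = c("Genus alpha L.", "Genus beta Sw."),
  conservation_status = c("EN", "LC"),
  stringsAsFactors = FALSE
)

# Step 3: join on the shared resolved name rather than the raw names
merged <- merge(gbif_resolved, greenlist_resolved, by = "resolved_name")

# Names present in one dataset but absent from the other deserve a manual check
unmatched <- setdiff(gbif_resolved$resolved_name, greenlist_resolved$resolved_name)

print(merged)
print(unmatched)
```

Joining on the resolved name means spelling variants and synonyms in either source no longer prevent a match, provided both were resolved against the same taxonomy.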

Results

Most names successfully resolved

Of 1,092 species in the GBIF data,
765 were merged with names in the GreenList
(the GreenList has 727 native, non-hybrid taxa)

Match type                     n
Exact                        438
Difference in punctuation    192
Missing author                36
Taxonomic rule                22
Fuzzy                         13

Map is close to ground truth

Summary

taxastand allows for reliable, customizable taxonomic resolution

  • Main feature: use of custom taxonomy
    • Advantage: can be adapted to different projects
    • Disadvantage: not simple to prepare/maintain reference db

Please choose the tool that works best for you!
(see Grenié et al. 2022)

Acknowledgements

  • Japan Society for the Promotion of Science

  • Members of the Iwasaki lab, The University of Tokyo

  • C. Webb

  • M. Hassler

References

Ebihara, A., and J. H. Nitta. 2019. An update and reassessment of fern and lycophyte diversity data in the Japanese Archipelago. Journal of Plant Research 132:723–738.
Grenié, M., E. Berti, J. Carvajal‐Quintero, G. M. L. Dädlow, A. Sagouis, and M. Winter. 2022. Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution. doi:10.1111/2041-210X.13802.
Page, R. D. M. 2013. BioNames: linking taxonomy, texts, and trees. PeerJ 1:e190.