taxastand and dwctaxon

A pair of R packages for standardizing species names in Darwin Core format

Joel Nitta, Wataru Iwasaki
The University of Tokyo

BioDigiCon 2022

Species names are the “glue” that connects datasets

Page (2013)

Synonyms break linkages

In the age of big data, software is needed to resolve taxonomy

Shortcomings of current approaches

  • Many tools are available only through an online interface (API)
    • Difficult to reproduce
  • Limited number of reference databases to choose from
    • May not be able to implement taxonomy of choice
  • Existing tools do not recognize the rules of taxonomic nomenclature
    • May not be able to accurately match names

Features of taxastand

  • Run locally in R
  • Allows usage of a custom reference database
  • Supports fuzzy matching
  • Understands taxonomic rules

Available at https://github.com/joelnitta/taxastand

Usage

Installation

In R:

# install remotes first
install.packages("remotes")
remotes::install_github("joelnitta/taxastand")
library(taxastand)

taxastand also requires either taxon-tools or Docker to be installed
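
The choice between a local taxon-tools install and Docker can be checked from R before calling taxastand. The sketch below uses only base R. It assumes that taxon-tools puts an executable named `matchnames` on the PATH and that Docker is reachable as `docker`; both names are assumptions for illustration.

```r
# Sketch: decide whether to pass docker = TRUE to taxastand functions.
# Assumption: taxon-tools provides an executable called "matchnames".
has_taxon_tools <- unname(nzchar(Sys.which("matchnames")))
has_docker <- unname(nzchar(Sys.which("docker")))

# Prefer a local taxon-tools install; fall back to Docker if it is missing
use_docker <- !has_taxon_tools && has_docker

if (!has_taxon_tools && !has_docker) {
  message("Neither taxon-tools nor Docker was found on the PATH")
}
```

The resulting `use_docker` value could then be supplied as the `docker` argument in the examples that follow.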

Name matching

res <- ts_match_names(
    query = "Crepidomanes minutus",
    reference = c(
      "Crepidomanes minutum",
      "Hymenophyllum polyanthos"),
    simple = TRUE,
    docker = TRUE
    )
library(dplyr) # provides glimpse()
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutus"
$ reference  <chr> "Crepidomanes minutum"
$ match_type <chr> "auto_fuzzy"
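
To illustrate the idea behind a fuzzy match, here is a minimal base-R sketch using Levenshtein edit distance via `utils::adist()`. This is not necessarily the algorithm taxastand or taxon-tools uses internally, and the cutoff `max_dist` is a hypothetical value chosen for illustration.

```r
# Sketch only: edit-distance matching in base R
query <- "Crepidomanes minutus"
reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# adist() returns a matrix of edit distances (rows: query, columns: reference)
dists <- drop(utils::adist(query, reference))
names(dists) <- reference

# Accept the closest reference name only if it is within the cutoff
max_dist <- 2 # hypothetical threshold
best <- which.min(dists)
fuzzy_match <- if (dists[[best]] <= max_dist) reference[[best]] else NA_character_
fuzzy_match
```

The query differs from the first reference name by a single substitution (final "s" versus "m"), so it falls within the cutoff, while the second reference name does not.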

Matching based on taxonomic rules

res <- ts_match_names(
    query = "Crepidomanes minutum K. Iwats.",
    reference = c(
      "Crepidomanes minutum (Bl.) K. Iwats.",
      "Hymenophyllum polyanthos (Sw.) Sw."),
    simple = TRUE,
    docker = TRUE
    )
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutum K. Iwats."
$ reference  <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ match_type <chr> "auto_basio-"
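
In this example the two names differ only in that the query lacks the parenthetical basionym author "(Bl.)". A rule-aware comparison can be sketched in base R by splitting each name into taxon, basionym author, and combination author. The helper `parse_name()` is hypothetical and handles only simple binomials; it is not the parser used by taxon-tools.

```r
# Hypothetical helper: split "Genus species (Basionym) Author" into parts
parse_name <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  taxon <- paste(words[1:2], collapse = " ")
  authors <- trimws(sub(taxon, "", x, fixed = TRUE))
  # Basionym author: text inside leading parentheses, if present
  has_basio <- grepl("^\\(", authors)
  basionym_author <- if (has_basio) {
    sub("^\\(([^)]*)\\).*$", "\\1", authors)
  } else {
    NA_character_
  }
  # Combination author: whatever follows the parenthetical part
  combination_author <- trimws(sub("^\\([^)]*\\)", "", authors))
  list(
    taxon = taxon,
    basionym_author = basionym_author,
    combination_author = combination_author
  )
}

q <- parse_name("Crepidomanes minutum K. Iwats.")
r <- parse_name("Crepidomanes minutum (Bl.) K. Iwats.")

# Match if taxon and combination author agree and only the basionym
# author is absent from the query
rule_match <- q$taxon == r$taxon &&
  q$combination_author == r$combination_author &&
  is.na(q$basionym_author) && !is.na(r$basionym_author)
rule_match
```

A plain string comparison would treat these as different names; comparing the parsed parts lets the match succeed while still recording what differed.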

Name resolution requires a reference database

data(filmy_taxonomy)
head(filmy_taxonomy[c("taxonID", "acceptedNameUsageID",
  "taxonomicStatus", "scientificName")])
# A tibble: 6 × 4
   taxonID acceptedNameUsageID taxonomicStatus scientificName                   
     <dbl>               <dbl> <chr>           <chr>                            
1 54115096                  NA accepted name   Cephalomanes atrovirens Presl    
2 54133783            54115097 synonym         Trichomanes crassum Copel.       
3 54115097                  NA accepted name   Cephalomanes crassum (Copel.) M.…
4 54133784            54115098 synonym         Trichomanes densinervium Copel.  
5 54115098                  NA accepted name   Cephalomanes densinervium (Copel…
6 54133785            54115099 synonym         Trichomanes infundibulare Alderw.

Name resolution

res <- ts_resolve_names(
  query = "Gonocormus minutum",
  ref_taxonomy = filmy_taxonomy,
  docker = TRUE)
glimpse(res)
Rows: 1
Columns: 6
$ query           <chr> "Gonocormus minutum"
$ resolved_name   <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ matched_name    <chr> "Gonocormus minutus (Bl.) Bosch"
$ resolved_status <chr> "accepted name"
$ matched_status  <chr> "synonym"
$ match_type      <chr> "auto_fuzzy"
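
Once a query has been matched to a name in the reference, resolution amounts to following `acceptedNameUsageID` to the accepted name. The sketch below shows that lookup in base R on four rows taken from the table above; `resolve_name()` is a hypothetical helper, not part of taxastand.

```r
# Four rows from the reference taxonomy shown earlier (IDs as character)
ref <- data.frame(
  taxonID = c("54133783", "54115097", "54133784", "54115098"),
  acceptedNameUsageID = c("54115097", NA, "54115098", NA),
  taxonomicStatus = c("synonym", "accepted name", "synonym", "accepted name"),
  scientificName = c(
    "Trichomanes crassum Copel.",
    "Cephalomanes crassum (Copel.) M. G. Price",
    "Trichomanes densinervium Copel.",
    "Cephalomanes densinervium (Copel.) Copel."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a matched name to its accepted name
resolve_name <- function(matched_name, ref) {
  hit <- ref[ref$scientificName == matched_name, ]
  if (nrow(hit) != 1) return(NA_character_)
  # Accepted names have no acceptedNameUsageID and resolve to themselves
  if (is.na(hit$acceptedNameUsageID)) return(hit$scientificName)
  ref$scientificName[match(hit$acceptedNameUsageID, ref$taxonID)]
}

resolve_name("Trichomanes crassum Copel.", ref)
```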

dwctaxon

Example: filmy ferns

library(dwctaxon)
library(dplyr)   # filter()
library(stringr) # str_detect()

filmies <- head(dct_filmies) |>
  filter(str_detect(scientificName, "crassum|densinervium"))

filmies
# A tibble: 4 × 4
  taxonID  acceptedNameUsageID taxonomicStatus scientificName                   
  <chr>    <chr>               <chr>           <chr>                            
1 54133783 54115097            synonym         Trichomanes crassum Copel.       
2 54115097 <NA>                accepted name   Cephalomanes crassum (Copel.) M.…
3 54133784 54115098            synonym         Trichomanes densinervium Copel.  
4 54115098 <NA>                accepted name   Cephalomanes densinervium (Copel…

Changing taxonomy is complicated

Old version:

  • Accepted species 1: Cephalomanes crassum
    • Synonym: Trichomanes crassum
  • Accepted species 2: Cephalomanes densinervium
    • Synonym: Trichomanes densinervium

New version (C. crassum → synonym of C. densinervium):

  • Accepted species: Cephalomanes densinervium
    • Synonym 1: Cephalomanes crassum
    • Synonym 2: Trichomanes crassum
    • Synonym 3: Trichomanes densinervium

Need to account for all synonyms

dct_change_status() handles synonym mapping

# C. crassum → synonym of C. densinervium
dct_change_status(
  tax_dat = filmies,
  sci_name = "Cephalomanes crassum (Copel.) M. G. Price",
  new_status = "synonym",
  usage_name = "Cephalomanes densinervium (Copel.) Copel."
)
# A tibble: 4 × 4
  taxonID  acceptedNameUsageID taxonomicStatus scientificName                   
  <chr>    <chr>               <chr>           <chr>                            
1 54133783 54115098            synonym         Trichomanes crassum Copel.       
2 54115097 54115098            synonym         Cephalomanes crassum (Copel.) M.…
3 54133784 54115098            synonym         Trichomanes densinervium Copel.  
4 54115098 <NA>                accepted name   Cephalomanes densinervium (Copel…
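
The remapping in the output above can be sketched in base R to show why it takes two steps: the demoted name must point to the new accepted name, and so must every synonym that used to point to it. `make_synonym()` is a hypothetical helper written for illustration and is not the implementation of `dct_change_status()`.

```r
# Same four rows as the filmies example
filmies_df <- data.frame(
  taxonID = c("54133783", "54115097", "54133784", "54115098"),
  acceptedNameUsageID = c("54115097", NA, "54115098", NA),
  taxonomicStatus = c("synonym", "accepted name", "synonym", "accepted name"),
  scientificName = c(
    "Trichomanes crassum Copel.",
    "Cephalomanes crassum (Copel.) M. G. Price",
    "Trichomanes densinervium Copel.",
    "Cephalomanes densinervium (Copel.) Copel."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: demote one accepted name to a synonym of another
make_synonym <- function(tax, sci_name, usage_name) {
  old_id <- tax$taxonID[tax$scientificName == sci_name]
  new_id <- tax$taxonID[tax$scientificName == usage_name]
  stopifnot(length(old_id) == 1, length(new_id) == 1)
  # Step 1: demote the name itself
  is_target <- tax$taxonID == old_id
  tax$taxonomicStatus[is_target] <- "synonym"
  tax$acceptedNameUsageID[is_target] <- new_id
  # Step 2: re-point synonyms that mapped to the demoted name
  was_child <- !is.na(tax$acceptedNameUsageID) &
    tax$acceptedNameUsageID == old_id
  tax$acceptedNameUsageID[was_child] <- new_id
  tax
}

updated <- make_synonym(
  filmies_df,
  sci_name = "Cephalomanes crassum (Copel.) M. G. Price",
  usage_name = "Cephalomanes densinervium (Copel.) Copel."
)
updated[c("taxonID", "acceptedNameUsageID", "taxonomicStatus")]
```

Skipping step 2 would leave Trichomanes crassum pointing at a name that is no longer accepted.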

dct_validate() checks taxonomic data

dct_change_status(
  tax_dat = filmies,
  sci_name = "Trichomanes crassum Copel.",
  new_status = "synonym",
  usage_name = "Trichomanes densinervium Copel."
) |>
dct_validate()
Error: `check_mapping` failed.
`taxonID`(s) detected whose `acceptedNameUsageID` value does not map to
`taxonID` of an existing name.
Bad `taxonID`: 54133783
Bad `scientificName`: Trichomanes crassum Copel.
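
Two kinds of mapping checks can be sketched in base R: whether every `acceptedNameUsageID` refers to an existing `taxonID`, and whether every synonym points to an accepted name rather than to another synonym. Both helpers are hypothetical, and which checks `dct_validate()` actually runs is an assumption here.

```r
# Hypothetical check 1: acceptedNameUsageID values with no matching taxonID
find_unmapped_ids <- function(tax) {
  has_usage <- !is.na(tax$acceptedNameUsageID)
  unmapped <- has_usage & !(tax$acceptedNameUsageID %in% tax$taxonID)
  tax$taxonID[unmapped]
}

# Hypothetical check 2: synonyms whose target is not an accepted name
find_synonyms_of_synonyms <- function(tax) {
  target_status <- tax$taxonomicStatus[
    match(tax$acceptedNameUsageID, tax$taxonID)
  ]
  bad <- tax$taxonomicStatus == "synonym" &
    !is.na(target_status) &
    target_status != "accepted name"
  tax$taxonID[bad]
}

# Trichomanes crassum mapped to another synonym, as in the example above
tax <- data.frame(
  taxonID = c("54133783", "54115097", "54133784", "54115098"),
  acceptedNameUsageID = c("54133784", NA, "54115098", NA),
  taxonomicStatus = c("synonym", "accepted name", "synonym", "accepted name"),
  stringsAsFactors = FALSE
)

unmapped_ids <- find_unmapped_ids(tax)
chained_ids <- find_synonyms_of_synonyms(tax)
chained_ids
```

Returning the offending IDs, rather than only TRUE or FALSE, makes it possible to report the bad rows in the way the error message above does.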

Putting it all together (with |>)

ferns_tax_raw |>
  # Add entry for Dryopteris simasakii var. simasakii autonym
  dct_add_row(
    sci_name = "Dryopteris simasakii var. simasakii",
    taxonomicStatus = "accepted",
    taxonRank = "variety",
    parentNameUsageID = "37XPH"
  ) |>
  # Change status of Parahemionitis arifolia as indicated by plastome data
  dct_change_status(
    sci_name = "Parahemionitis arifolia (Burm. fil.) Panigrahi",
    new_status = "accepted"
  ) |>
  dct_change_status(
    sci_name = "Hemionitis arifolia (Burm. fil.) T. Moore",
    new_status = "synonym",
    usage_name = "Parahemionitis arifolia (Burm. fil.) Panigrahi"
  ) |>
  # ... (other changes)
  dct_validate()

Example: https://github.com/fernphy/pteridocat

Summary

taxastand + dwctaxon: flexible taxonomic standardization

  • taxastand: accurate, customizable taxonomic resolution

  • dwctaxon: maintenance of a Darwin Core (DwC)-compliant taxonomic database

Not all researchers need this (standard databases may be fine)

Acknowledgements

  • Japan Society for the Promotion of Science

  • Members of the Iwasaki lab, The University of Tokyo

  • C. Webb

  • M. Hassler

References

Grenié, M., E. Berti, J. Carvajal‐Quintero, G. M. L. Dädlow, A. Sagouis, and M. Winter. 2022. Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution:2041–210X.13802.
Page, R. D. M. 2013. BioNames: linking taxonomy, texts, and trees. PeerJ 1:e190.