Allows for orthographic differences between query and reference by using fuzzy matching on parsed taxonomic names. Requires taxon-tools to be installed.

ts_match_names(
  query,
  reference,
  max_dist = 10,
  match_no_auth = FALSE,
  match_canon = FALSE,
  collapse_infra = FALSE,
  collapse_infra_exclude = NULL,
  simple = FALSE,
  docker = getOption("ts_docker", default = FALSE),
  tbl_out = getOption("ts_tbl_out", default = FALSE)
)

Arguments

query

Character vector or dataframe; taxonomic names to be queried. If a character vector, missing values are not allowed and all values must be unique. If a dataframe, it should contain taxonomic names parsed with ts_parse_names().

reference

Character vector or dataframe; taxonomic names to use as reference. If a character vector, missing values are not allowed and all values must be unique. If a dataframe, it should contain taxonomic names parsed with ts_parse_names().

max_dist

Maximum Levenshtein distance to allow during fuzzy matching (the total number of insertions, deletions, and substitutions). Default: 10.
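To get a feel for how permissive a given max_dist is, the edit distance between two strings can be checked with base R's adist(), which computes the generalized Levenshtein distance. This is only an illustration on whole name strings; taxon-tools does its own matching on the parsed name components, so the distances it works with may differ.

```r
# Levenshtein distance between a misspelled query and candidate reference names.
# adist() comes from the utils package, which is attached by default in R.
query <- "Crepidomanes minutus"
reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# adist() returns a 1 x 2 matrix here; drop() turns it into a plain vector
dists <- drop(adist(query, reference))
names(dists) <- reference
dists
```

The first distance is 1 (a single substitution, "s" to "m"), well inside the default limit; the second name is much further away.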

match_no_auth

Logical; if no author is given in the query and the name (without author) occurs only once in the reference, accept the name in the reference as a match. Default: FALSE (do not allow such a match).

match_canon

Logical; allow a "canonical name" match if only the genus, species epithet, and infraspecific epithet (if present) match exactly. Default: FALSE (do not allow such a match).

collapse_infra

Logical; if the specific epithet and infraspecific epithet are the same, drop the infraspecific rank and epithet from the query. Default: FALSE.

collapse_infra_exclude

Character vector; taxonomic names to exclude from collapsing with collapse_infra. Any names used must match those in query exactly, or they will not be excluded. Default: NULL (no names are excluded).

simple

Logical; return the output in a simplified format with only the query name, matched reference name, and match type. Default: FALSE.

docker

Logical; if TRUE, docker will be used to run taxon-tools (so that taxon-tools need not be installed). Default: the value of the option ts_docker, or FALSE if that option is not set.
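Because the default in the usage above is read with getOption("ts_docker", default = FALSE), the setting can be switched on once per session instead of passing docker = TRUE to every call. A minimal sketch using only base R; it assumes docker itself is installed and running, which this snippet does not check.

```r
# Read the setting the same way the function signature does
# (FALSE unless the option has already been set)
getOption("ts_docker", default = FALSE)

# Use docker for all later calls in this session
options(ts_docker = TRUE)
getOption("ts_docker", default = FALSE)
```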

tbl_out

Logical vector of length 1; should a tibble be returned? If FALSE (default), output will be a data.frame. This argument can be controlled via the option ts_tbl_out; see Examples.

Value

Dataframe with the following columns (if simple is FALSE):

  • query: Query name

  • reference: Matched reference name

  • match_type: Type of match (for a summary of match types, see taxon-tools manual)

  • id_query: Unique ID of query

  • id_ref: Unique ID of reference

  • genus_hybrid_sign_query: Genus hybrid sign in query

  • genus_name_query: Genus name of query

  • species_hybrid_sign_query: Species hybrid sign in query

  • specific_epithet_query: Specific epithet of query

  • infraspecific_rank_query: Infraspecific rank of query

  • infraspecific_epithet_query: Infraspecific epithet of query

  • author_query: Taxonomic author of query

  • genus_hybrid_sign_ref: Genus hybrid sign in reference

  • genus_name_ref: Genus name of reference

  • species_hybrid_sign_ref: Species hybrid sign in reference

  • specific_epithet_ref: Specific epithet of reference

  • infraspecific_rank_ref: Infraspecific rank of reference

  • infraspecific_epithet_ref: Infraspecific epithet of reference

  • author_ref: Taxonomic author of reference

If simple is TRUE, only return the first three columns above.

Details

taxon-tools matches names in two steps:

  1. Scientific names are parsed into their component parts (genus, species, variety, author, etc).

  2. Names are fuzzily matched following taxonomic rules using the component parts.

For more information on rules used for matching, see taxon-tools manual.

Parsing is fairly fast (much faster than matching) but can take some time if the number of names is very large. If multiple queries will be made (e.g., to the same large reference database), it is recommended to first parse the names using ts_parse_names(), and use the results as input to query and/or reference.
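The parse-once pattern described above might look like the following sketch. It assumes the functions are provided by a package named taxastand and that taxon-tools is available; the guard skips the calls otherwise, and the names are the same toy data used in the Examples.

```r
reference_names <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# Assumption: the package is called taxastand; skip quietly if it or
# taxon-tools is not available
if (requireNamespace("taxastand", quietly = TRUE) &&
    taxastand::ts_tt_installed()) {
  # Parse the reference names once ...
  reference_parsed <- taxastand::ts_parse_names(reference_names)

  # ... then reuse the parsed dataframe for each query
  res_1 <- taxastand::ts_match_names(
    "Crepidomanes minutus", reference_parsed, simple = TRUE
  )
  res_2 <- taxastand::ts_match_names(
    "Hymenophyllum polyanthos", reference_parsed, simple = TRUE
  )
  print(res_1)
  print(res_2)
}
```

With only two reference names the saving is negligible; the pattern pays off when the reference is large and queried repeatedly.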

collapse_infra is useful in situations where the reference database does not use names that have the same specific epithet and infraspecific epithet. For example, the reference may contain "Blechnum lunare" while the query is "Blechnum lunare var. lunare". In this case, if collapse_infra is TRUE, "Blechnum lunare" will be queried instead of "Blechnum lunare var. lunare". Note that the match_type will be "exact" even though the literal query and the matched name are different (see the collapse_infra example in Examples).
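The collapsing rule itself can be sketched in a few lines of base R. The helper collapse_autonym() below is hypothetical and is not how the package does it (the package works on the parsed name components); it only handles plain four-word names without authors or hybrid signs.

```r
# Hypothetical helper: drop the infraspecific rank and epithet when the
# infraspecific epithet repeats the specific epithet
collapse_autonym <- function(name) {
  parts <- strsplit(name, " ", fixed = TRUE)[[1]]
  # Expected layout: genus, specific epithet, rank, infraspecific epithet
  if (length(parts) == 4 && parts[2] == parts[4]) {
    paste(parts[1:2], collapse = " ")
  } else {
    name
  }
}

queries <- c(
  "Blechnum lunare var. lunare", # collapsed
  "Blechnum lunare",             # already a species name, unchanged
  "Bar foo var. bar"             # epithets differ, unchanged
)
collapsed <- vapply(queries, collapse_autonym, character(1), USE.NAMES = FALSE)
collapsed
```

Only the first name changes, becoming "Blechnum lunare"; the other two are returned as given.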

Examples

if (ts_tt_installed()) {
  # Fuzzy match of a misspelled query, with simplified output
  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos"),
    simple = TRUE
  )

  # If you always want tibble output without specifying `tbl_out = TRUE`
  # every time, set the option:
  options(ts_tbl_out = TRUE)
  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos")
  )

  # Example using collapse_infra argument.
  # Only the result of this last call is printed below.
  ts_match_names(
    c("Crepidomanes minutus", "Blechnum lunare var. lunare",
      "Blechnum lunare", "Bar foo var. foo", "Bar foo"),
    c("Crepidomanes minutum", "Hymenophyllum polyanthos", "Blechnum lunare",
      "Bar foo"),
    collapse_infra = TRUE,
    collapse_infra_exclude = "Bar foo var. foo",
    simple = TRUE
  )
}
#> # A tibble: 5 × 3
#>   query                       reference            match_type
#>   <chr>                       <chr>                <chr>     
#> 1 Crepidomanes minutus        Crepidomanes minutum auto_fuzzy
#> 2 Blechnum lunare var. lunare Blechnum lunare      exact     
#> 3 Blechnum lunare             Blechnum lunare      exact     
#> 4 Bar foo var. foo            Bar foo              auto_fuzzy
#> 5 Bar foo                     Bar foo              exact