ts_match_names.Rd
Allows for orthographic differences between query and reference by using fuzzy matching on parsed taxonomic names. Requires taxon-tools to be installed.
Character vector or dataframe; taxonomic names to be queried.
If a character vector, missing values not allowed and all values must be unique.
If a dataframe, should be taxonomic names parsed with ts_parse_names().
Character vector or dataframe; taxonomic names to use as reference.
If a character vector, missing values not allowed and all values must be unique.
If a dataframe, should be taxonomic names parsed with ts_parse_names().
Max Levenshtein distance to allow during fuzzy matching (total insertions, deletions and substitutions). Default: 10.
Logical; if no author is given in the query and the name (without author)
occurs only once in the reference, accept the name in the reference as a match.
Default: to not allow such a match (FALSE).
Logical; allow a "canonical name" match if only the genus, species epithet,
and infraspecific epithet (if present) match exactly. Default: to not allow such a match (FALSE).
Logical; if the specific epithet and infraspecific epithet are the same, drop the infraspecific rank and epithet from the query.
Character vector; taxonomic names to exclude from collapsing with collapse_infra.
Any names used must match those in query exactly, or they won't be excluded.
Logical; return the output in a simplified format with only the query
name, matched reference name, and match type. Default: FALSE.
Logical; if TRUE, Docker will be used to run taxon-tools (so that taxon-tools need not be installed).
Logical vector of length 1; should a tibble be returned?
If FALSE (default), output will be a data.frame. This argument can
be controlled via the option ts_tbl_out; see Examples.
Dataframe with the following columns (if simple is FALSE):
query: Query name
reference: Matched reference name
match_type: Type of match (for a summary of match types, see taxon-tools manual)
id_query: Unique ID of query
id_ref: Unique ID of reference
genus_hybrid_sign_query: Genus hybrid sign in query
genus_name_query: Genus name of query
species_hybrid_sign_query: Species hybrid sign in query
specific_epithet_query: Specific epithet of query
infraspecific_rank_query: Infraspecific rank of query
infraspecific_epithet_query: Infraspecific epithet of query
author_query: Taxonomic author of query
genus_hybrid_sign_ref: Genus hybrid sign in reference
genus_name_ref: Genus name of reference
species_hybrid_sign_ref: Species hybrid sign in reference
specific_epithet_ref: Specific epithet of reference
infraspecific_rank_ref: Infraspecific rank of reference
infraspecific_epithet_ref: Infraspecific epithet of reference
author_ref: Taxonomic author of reference
If simple is TRUE, only return the first three columns above.
taxon-tools matches names in two steps:
Scientific names are parsed into their component parts (genus, species, variety, author, etc).
Names are fuzzily matched following taxonomic rules using the component parts.
For more information on rules used for matching, see taxon-tools manual.
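To make the idea of an edit-distance threshold concrete, the sketch below computes Levenshtein distances with base R's adist(). It is an illustration only: it compares whole name strings, whereas taxon-tools matches parsed components following taxonomic rules, so it should not be read as the tool's implementation. The variable names and the threshold value are chosen for the example.

```r
# Illustration only: whole-string Levenshtein distance with base R.
# This is NOT how taxon-tools matches names internally.
query <- "Crepidomanes minutus"
reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")

# adist() returns a matrix (rows = query, columns = reference);
# drop() turns the single row into a plain vector
dists <- drop(utils::adist(query, reference))
names(dists) <- reference
dists

# The first reference name differs by a single substitution (s -> m)
# Names within the chosen maximum distance are fuzzy-match candidates
max_dist <- 10
candidates <- reference[dists <= max_dist]
candidates
```

A smaller threshold makes matching stricter; a larger one tolerates more spelling variation at the risk of spurious matches.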
Parsing is fairly fast (much faster than matching) but can take some time if
the number of names is very large. If multiple queries will be made (e.g., to
the same large reference database), it is recommended to first parse the
names using ts_parse_names(), and use the results as input to
query and/or reference.
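A minimal sketch of this parse-once workflow follows. It assumes the functions documented here are exported by the taxastand package (the package name is not stated on this page) and that taxon-tools is installed; the object names ref_parsed, res_1, and res_2 are made up for the example. The code is skipped when either requirement is missing.

```r
# Assumption: ts_parse_names() and ts_match_names() come from taxastand
if (requireNamespace("taxastand", quietly = TRUE) &&
    taxastand::ts_tt_installed()) {
  # Parse the reference names once ...
  ref_parsed <- taxastand::ts_parse_names(
    c("Crepidomanes minutum", "Hymenophyllum polyanthos")
  )
  # ... then reuse the parsed dataframe for several queries
  res_1 <- taxastand::ts_match_names(
    "Crepidomanes minutus", ref_parsed, simple = TRUE
  )
  res_2 <- taxastand::ts_match_names(
    "Hymenophyllum polyanthos", ref_parsed, simple = TRUE
  )
}
```

With a reference of only two names the saving is negligible; the pattern is meant for large reference databases that are queried repeatedly.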
collapse_infra is useful in situations where the reference database does
not use names that have the same specific epithet and infraspecific epithet.
For example, reference name "Blechnum lunare" and query "Blechnum lunare var.
lunare". In this case, if collapse_infra is TRUE, "Blechnum lunare" will
be queried instead of "Blechnum lunare var. lunare". Note that the
match_type will be "exact" even though the literal query and the matched
name are different (see example below).
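The collapsing rule itself can be pictured with a few lines of base R. The helper collapse_autonym() below is hypothetical and written only for illustration; it is not part of the package and handles only simple names of the form "Genus species rank epithet" without authors or hybrid signs.

```r
# Hypothetical helper, for illustration only: drop the infraspecific rank and
# epithet when the infraspecific epithet repeats the specific epithet.
collapse_autonym <- function(x, exclude = character()) {
  parts <- strsplit(x, " ", fixed = TRUE)
  collapsed <- vapply(parts, function(p) {
    if (length(p) == 4 && p[2] == p[4]) {
      # e.g. "Blechnum lunare var. lunare" -> "Blechnum lunare"
      paste(p[1:2], collapse = " ")
    } else {
      paste(p, collapse = " ")
    }
  }, character(1))
  # Names listed in `exclude` are left untouched (exact match required)
  keep <- x %in% exclude
  collapsed[keep] <- x[keep]
  collapsed
}

out <- collapse_autonym(
  c("Blechnum lunare var. lunare", "Bar foo var. foo", "Bar foo"),
  exclude = "Bar foo var. foo"
)
out
```

Here "Blechnum lunare var. lunare" is collapsed, while "Bar foo var. foo" is kept as written because it is listed in exclude, mirroring the roles of collapse_infra and collapse_infra_exclude.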
if (ts_tt_installed()) {

  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos"),
    simple = TRUE
  )

  # If you always want tibble output without specifying `tbl_out = TRUE`
  # every time, set the option:
  options(ts_tbl_out = TRUE)
  ts_match_names(
    "Crepidomanes minutus",
    c("Crepidomanes minutum", "Hymenophyllum polyanthos")
  )

  # Example using collapse_infra argument
  ts_match_names(
    c("Crepidomanes minutus", "Blechnum lunare var. lunare",
      "Blechnum lunare", "Bar foo var. foo", "Bar foo"),
    c("Crepidomanes minutum", "Hymenophyllum polyanthos", "Blechnum lunare",
      "Bar foo"),
    collapse_infra = TRUE,
    collapse_infra_exclude = "Bar foo var. foo",
    simple = TRUE
  )
}
#> # A tibble: 5 × 3
#>   query                       reference            match_type
#>   <chr>                       <chr>                <chr>
#> 1 Crepidomanes minutus        Crepidomanes minutum auto_fuzzy
#> 2 Blechnum lunare var. lunare Blechnum lunare      exact
#> 3 Blechnum lunare             Blechnum lunare      exact
#> 4 Bar foo var. foo            Bar foo              auto_fuzzy
#> 5 Bar foo                     Bar foo              exact