The goal of canaper is to enable categorical analysis of neo- and paleo-endemism (CANAPE; Mishler et al. 2014) in R. This is the first R implementation of CANAPE, which was previously available only in Biodiverse.

Important note

This package is in early development. There may be major, breaking changes to functionality in the near future. If you use this package, I highly recommend using a package manager like renv so that later updates won’t break your code.

Installation

You can install canaper from GitHub with:

# install.packages("remotes")
remotes::install_github("joelnitta/canaper")

Example usage

These examples use the example dataset from Phylocom, which canaper provides as phylocom. The dataset includes a community (site x species) matrix and a phylogenetic tree.

library(canaper)

data(phylocom)

# Example community matrix including 4 "clumped" communities,
# one "even" community, and one "random" community
phylocom$comm
#>         sp1 sp10 sp11 sp12 sp13 sp14 sp15 sp17 sp18 sp19 sp2 sp20 sp21 sp22
#> clump1    1    0    0    0    0    0    0    0    0    0   1    0    0    0
#> clump2a   1    2    2    2    0    0    0    0    0    0   1    0    0    0
#> clump2b   1    0    0    0    0    0    0    2    2    2   1    2    0    0
#> clump4    1    1    0    0    0    0    0    2    2    0   1    0    0    0
#> even      1    0    0    0    1    0    0    1    0    0   0    0    1    0
#> random    0    0    0    1    0    4    2    3    0    0   1    0    0    1
#>         sp24 sp25 sp26 sp29 sp3 sp4 sp5 sp6 sp7 sp8 sp9
#> clump1     0    0    0    0   1   1   1   1   1   1   0
#> clump2a    0    0    0    0   1   1   0   0   0   0   2
#> clump2b    0    0    0    0   1   1   0   0   0   0   0
#> clump4     0    2    2    0   0   0   0   0   0   0   1
#> even       0    1    0    1   0   0   1   0   0   0   1
#> random     2    0    0    0   0   0   2   0   0   0   0

# Example phylogeny
phylocom$phy
#> 
#> Phylogenetic tree with 32 tips and 31 internal nodes.
#> 
#> Tip labels:
#>   sp1, sp2, sp3, sp4, sp5, sp6, ...
#> Node labels:
#>   A, B, C, D, E, F, ...
#> 
#> Rooted; includes branch lengths.

The main “workhorse” function of canaper is cpr_rand_test(), which conducts a randomization test to determine if observed values of phylogenetic diversity (PD) and phylogenetic endemism (PE) are significantly different from random. It also calculates the same values on an alternative phylogeny where all branch lengths have been set equal (alternative PD, alternative PE) as well as the ratio of the original value to the alternative value (relative PD, relative PE).
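To make these metrics concrete, here is a minimal base-R sketch on a hypothetical three-species tree, ((a:1,b:1):2,c:3). It uses the standard definitions: PD is the total length of the branches connecting a site's species to the root, and PE is the same sum with each branch length divided by the number of sites where that branch occurs. To avoid depending on a tree package, the tree is encoded as a branch-length vector plus a species-by-branch incidence matrix. Rescaling both trees to a total length of 1 is an assumption of this sketch, the helper names (branches_present(), calc_pd(), calc_pe()) are hypothetical, and this is not canaper's internal code.

```r
# Hypothetical toy tree ((a:1,b:1):2,c:3): a branch-length vector plus a
# species-by-branch incidence matrix (1 = branch lies on the root-to-tip path).
branch_len <- c(ab = 2, a = 1, b = 1, c = 3)
paths <- rbind(
  a = c(ab = 1, a = 1, b = 0, c = 0),
  b = c(ab = 1, a = 0, b = 1, c = 0),
  c = c(ab = 0, a = 0, b = 0, c = 1)
)

# Hypothetical community (site x species) presence/absence matrix;
# its columns are in the same order as the rows of `paths`
comm <- rbind(
  site1 = c(a = 1, b = 1, c = 0),
  site2 = c(a = 0, b = 1, c = 1),
  site3 = c(a = 0, b = 0, c = 1)
)

# Which branches occur in each site (sites x branches, logical)
branches_present <- function(comm, paths) (comm %*% paths) > 0

# PD: sum of the lengths of all branches present in a site
calc_pd <- function(comm, paths, len) drop(branches_present(comm, paths) %*% len)

# PE: like PD, but each branch length is divided by its range (number of sites)
calc_pe <- function(comm, paths, len) {
  present <- branches_present(comm, paths)
  range_size <- colSums(present)
  weights <- ifelse(range_size > 0, len / range_size, 0)
  drop(present %*% weights)
}

# Original tree and the "alternative" tree with all branch lengths equal,
# both rescaled to a total length of 1 (assumption of this sketch)
len_orig <- branch_len / sum(branch_len)
len_alt <- rep(1, length(branch_len)) / length(branch_len)

pd <- calc_pd(comm, paths, len_orig)
pd_alt <- calc_pd(comm, paths, len_alt)
pe <- calc_pe(comm, paths, len_orig)
pe_alt <- calc_pe(comm, paths, len_alt)

# Relative values: ratio of the original value to the alternative value
rpd <- pd / pd_alt
rpe <- pe / pe_alt
round(cbind(pd, pd_alt, rpd, pe, pe_alt, rpe), 3)
```

In this toy example, site3 contains only species c, which sits on a single long branch, so its RPD and RPE are greater than 1; site1 holds the two short-branched species a and b, so its values are less than 1. Under CANAPE, significantly high RPE points toward paleo-endemism and significantly low RPE toward neo-endemism.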

set.seed(071421)
rand_test_results <- cpr_rand_test(phylocom$comm, phylocom$phy, null_model = "swap")
#> Warning in match_phylo_comm(phy = phy, comm = comm): Dropping tips from the tree because they are not present in the community data: 
#>  sp16, sp23, sp27, sp28, sp30, sp31, sp32
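The null_model = "swap" argument selects how the random communities are generated. As a rough illustration of the general idea behind swap-type null models (not canaper's implementation), the sketch below performs repeated "checkerboard swaps" on a presence/absence matrix: it picks two sites and two species that form a 2 x 2 checkerboard ([1 0; 0 1] or [0 1; 1 0]) and flips it. This shuffles species among sites while keeping every site's richness and every species' number of occurrences unchanged. The helpers swap_once() and randomize_swap() and the toy matrix are hypothetical.

```r
# Hypothetical helper: one checkerboard swap on a 0/1 site x species matrix.
# Gives up after `max_tries` random draws if no checkerboard is found.
swap_once <- function(m, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    rows <- sample(nrow(m), 2)
    cols <- sample(ncol(m), 2)
    sub <- m[rows, cols]
    # A checkerboard is [1 0; 0 1] or [0 1; 1 0]
    is_checkerboard <- sub[1, 1] == sub[2, 2] &&
      sub[1, 2] == sub[2, 1] &&
      sub[1, 1] != sub[1, 2]
    if (is_checkerboard) {
      m[rows, cols] <- 1 - sub # flip the checkerboard
      return(m)
    }
  }
  m # no checkerboard found; return the matrix unchanged
}

# Hypothetical helper: apply many swaps to produce one random community
randomize_swap <- function(m, n_swaps = 1000) {
  for (i in seq_len(n_swaps)) m <- swap_once(m)
  m
}

set.seed(42)
obs <- rbind(
  site1 = c(sp1 = 1, sp2 = 1, sp3 = 0, sp4 = 0),
  site2 = c(sp1 = 0, sp2 = 1, sp3 = 1, sp4 = 0),
  site3 = c(sp1 = 1, sp2 = 0, sp3 = 0, sp4 = 1)
)
rand <- randomize_swap(obs)

# Row (site) and column (species) totals are preserved
rowSums(rand) == rowSums(obs)
colSums(rand) == colSums(obs)
```

Computing PD, PE, and the other metrics on many such randomized communities gives the null distributions that the observed values are compared against.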

cpr_rand_test() produces many columns (nine per metric), so let's look at only the first nine, which are the columns for PD:

rand_test_results[, 1:9]
#>            pd_obs pd_rand_mean pd_rand_sd  pd_obs_z pd_obs_c_upper
#> clump1  0.3018868    0.4675472 0.03623666 -4.571624              0
#> clump2a 0.3207547    0.4684906 0.03116570 -4.740335              0
#> clump2b 0.3396226    0.4684906 0.03150994 -4.089754              0
#> clump4  0.4150943    0.4664151 0.03307178 -1.551799              3
#> even    0.5660377    0.4641509 0.03517108  2.896891            100
#> random  0.5094340    0.4713208 0.03295196  1.156629             80
#>         pd_obs_c_lower pd_obs_q pd_obs_p_upper pd_obs_p_lower
#> clump1             100      100           0.00           1.00
#> clump2a            100      100           0.00           1.00
#> clump2b            100      100           0.00           1.00
#> clump4              90      100           0.03           0.90
#> even                 0      100           1.00           0.00
#> random               7      100           0.80           0.07

Here is a summary of the columns, where * stands for the metric name (for example, pd):

  • *_obs: Observed value
  • *_obs_c_lower: Number of times the observed value was lower than the random values
  • *_obs_c_upper: Number of times the observed value was higher than the random values
  • *_obs_p_lower: Proportion of times the observed value was lower than the random values
  • *_obs_p_upper: Proportion of times the observed value was higher than the random values
  • *_obs_q: Number of non-NA random values used for comparison
  • *_obs_z: Standardized effect size (z-score)
  • *_rand_mean: Mean of the random values
  • *_rand_sd: Standard deviation of the random values
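The columns above can be reproduced from an observed value and a vector of random values. The sketch below is a minimal illustration with made-up numbers and a hypothetical helper, summarize_rand(). It computes each proportion as the count divided by the number of non-NA random values, which matches the example output (for clump4, pd_obs_c_upper = 3 out of pd_obs_q = 100 gives pd_obs_p_upper = 0.03). How canaper handles ties is not covered here; in this sketch a tie counts toward neither c_upper nor c_lower.

```r
# Hypothetical helper: summary columns for one metric at one site.
# `obs` is the observed value; `rand` holds the values from random communities.
summarize_rand <- function(obs, rand) {
  rand_mean <- mean(rand, na.rm = TRUE)
  rand_sd <- sd(rand, na.rm = TRUE)
  q <- sum(!is.na(rand))                   # number of non-NA random values
  c_upper <- sum(obs > rand, na.rm = TRUE) # times obs was higher than random
  c_lower <- sum(obs < rand, na.rm = TRUE) # times obs was lower than random
  c(
    obs = obs,
    rand_mean = rand_mean,
    rand_sd = rand_sd,
    obs_z = (obs - rand_mean) / rand_sd, # standardized effect size
    obs_c_upper = c_upper,
    obs_c_lower = c_lower,
    obs_q = q,
    obs_p_upper = c_upper / q,
    obs_p_lower = c_lower / q
  )
}

res <- summarize_rand(obs = 5, rand = c(1, 2, 3, 4, 5, 6, NA))
round(res, 3)
```

The same z-score formula agrees with the example output: for clump1, (0.3018868 - 0.4675472) / 0.03623666 ≈ -4.57, which is the value of pd_obs_z.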

The next step in CANAPE is to classify endemism types according to the significance of PE, alternative PE, and relative PE. This is done with cpr_classify_endem(), which adds a column called endem_type to the results.

canape_results <- cpr_classify_endem(rand_test_results)

canape_results[, "endem_type", drop = FALSE]
#>              endem_type
#> clump1  not significant
#> clump2a not significant
#> clump2b not significant
#> clump4  not significant
#> even              super
#> random            mixed

This dataset is very small, so it doesn't include all possible endemism types. The full set of types is:

  • paleo: paleoendemic
  • neo: neoendemic
  • not significant: neither PE nor alternative PE is significantly higher than random
  • mixed: mixture of both paleo and neo
  • super: mixed and highly significant (p < 0.01)
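The decision procedure behind these types can be sketched in a few lines of base R. This is a sketch of the CANAPE procedure of Mishler et al. (2014), not canaper's code: a site first needs PE or alternative PE significantly higher than random (one-tailed, p_upper > 0.95); significantly high RPE (two-tailed, p_upper > 0.975) then indicates paleo-endemism, and significantly low RPE (p_lower > 0.975) indicates neo-endemism; otherwise the site is mixed, or super if both PE and alternative PE have p_upper > 0.99. The exact thresholds, the column names (pe_obs_p_upper, pe_alt_obs_p_upper, rpe_obs_p_upper, rpe_obs_p_lower, extrapolated from the *_obs_p_* pattern above), and the helper classify_endem_sketch() are assumptions for illustration.

```r
# Hypothetical helper illustrating the CANAPE decision tree.
# Thresholds and column names are assumptions of this sketch.
classify_endem_sketch <- function(df) {
  # Step 1: PE or alternative PE significantly high (one-tailed)
  pe_sig <- df$pe_obs_p_upper > 0.95 | df$pe_alt_obs_p_upper > 0.95
  # Both highly significant (used for "super")
  pe_highly_sig <- df$pe_obs_p_upper > 0.99 & df$pe_alt_obs_p_upper > 0.99

  endem <- rep("not significant", nrow(df))
  # Later assignments take precedence: paleo/neo over super over mixed.
  # which() ignores NA comparisons instead of erroring.
  endem[which(pe_sig)] <- "mixed"
  endem[which(pe_sig & pe_highly_sig)] <- "super"
  endem[which(pe_sig & df$rpe_obs_p_lower > 0.975)] <- "neo"   # RPE significantly low
  endem[which(pe_sig & df$rpe_obs_p_upper > 0.975)] <- "paleo" # RPE significantly high
  endem
}

# Made-up p-values for five hypothetical sites
example_df <- data.frame(
  pe_obs_p_upper     = c(0.50, 0.96, 0.96, 0.995, 0.97),
  pe_alt_obs_p_upper = c(0.50, 0.50, 0.50, 0.995, 0.50),
  rpe_obs_p_upper    = c(0.50, 0.99, 0.01, 0.50, 0.50),
  rpe_obs_p_lower    = c(0.50, 0.01, 0.99, 0.50, 0.50)
)
# Expected: "not significant", "paleo", "neo", "super", "mixed"
classify_endem_sketch(example_df)
```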

For a more complete example, please see the vignette.

Other information

Poster at Botany 2021

Citing this package

If you use this package, please cite it! Here is an example:

Nitta JH, Laffan SW, Mishler BD, Iwasaki W. (2021) canaper: Categorical analysis of neo- and paleo-endemism in R. doi: 10.5281/zenodo.5094032

The example DOI above is for the overall package.

Here is the latest DOI, which you should use if you are using the latest version of the package:

DOI

You can find DOIs for older versions by viewing the “Releases” menu on the right.

Papers citing canaper

Licenses

References

Mishler, B., Knerr, N., González-Orozco, C. et al. Phylogenetic measures of biodiversity and neo- and paleo-endemism in Australian Acacia. Nat Commun 5, 4473 (2014). https://doi.org/10.1038/ncomms5473