@joelnitta@fosstodon.org
Associate Professor @ Chiba University
Research interests: Ecology and evolution of ferns
Photo: J-Y Meyer
… which are linked by taxonomic names (OTUs)
etc…
Can take many forms:
We will focus on point data (the data available from GBIF) during the coding session
GBIF is not one database; it is a portal to many databases
You should try the web interface first to familiarize yourself with it
We can’t use the occurrence records in GBIF as-is. They may include many errors (typos, etc.) and need to be checked carefully. More on this during the coding session.
The settings for data cleaning depend on your analysis
Defaults are a good start, but make sure they make sense!
For example, if your grid-cell size is 1 degree x 1 degree, you don’t need coordinates to be more precise than that
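As a rough illustration of the kind of checks involved, here is a minimal sketch in plain Python. It is not the CoordinateCleaner package used in the coding session; the `Record` type, the `clean_records` function, and the placeholder species names are all hypothetical. The `decimals` setting reflects the point above: coordinates only need to be as precise as your analysis requires.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """A hypothetical occurrence record (not the GBIF schema)."""
    species: str
    lon: Optional[float]
    lat: Optional[float]


def clean_records(records, decimals=2):
    """Drop records with unusable coordinates, then drop duplicates.

    Two records count as duplicates if they share a species name and
    their coordinates agree after rounding to `decimals` places.
    """
    seen = set()
    kept = []
    for rec in records:
        # Missing coordinates
        if rec.lon is None or rec.lat is None:
            continue
        # Coordinates outside the valid range
        if not (-180 <= rec.lon <= 180 and -90 <= rec.lat <= 90):
            continue
        # Exactly (0, 0) is usually a data-entry placeholder
        if rec.lon == 0 and rec.lat == 0:
            continue
        key = (rec.species, round(rec.lon, decimals), round(rec.lat, decimals))
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Placeholder data for illustration only
raw = [
    Record("Species A", 135.0004, 35.0001),
    Record("Species A", 135.0001, 35.0003),  # duplicate after rounding
    Record("Species B", 0.0, 0.0),           # placeholder coordinates
    Record("Species B", None, 34.5),         # missing longitude
    Record("Species C", 200.0, 10.0),        # longitude out of range
    Record("Species C", 139.7, 35.7),
]
cleaned = clean_records(raw)
```

Which checks to apply, and how strict to make them, depends on the analysis; the thresholds here are assumptions, not recommendations.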
Geographic Coordinate System (GCS)
Projected Coordinate System (PCS)
Latitude and longitude alone are not enough
The earth is not a perfect sphere
GCS defines how to model the earth (e.g., WGS84)
Hiker’s coordinates at 134.577°E, 24.006°S. But where is she (A or B)?
https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/
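To see why the earth model matters, the sketch below converts the hiker's longitude and latitude to 3-D (earth-centered) coordinates twice: once on the WGS84 ellipsoid and once on a perfect sphere. This is a simplified illustration, assuming zero elevation and a sphere radius of 6,371 km; the function name `to_ecef` is made up for this example. Real datums also differ in origin and orientation, which this sketch ignores.

```python
import math

# WGS84 ellipsoid parameters (semi-major axis in metres, flattening)
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563


def to_ecef(lon_deg, lat_deg, a, f):
    """Convert longitude/latitude (degrees, height 0) to earth-centered
    x, y, z coordinates in metres for an ellipsoid with semi-major
    axis `a` and flattening `f` (f = 0 gives a sphere)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    e2 = f * (2 - f)  # squared eccentricity
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1 - e2) * math.sin(lat)
    return (x, y, z)


# Same numbers, two different models of the earth
on_wgs84 = to_ecef(134.577, -24.006, WGS84_A, WGS84_F)
on_sphere = to_ecef(134.577, -24.006, 6371000.0, 0.0)

# Straight-line distance between the two resulting points, in metres
offset_m = math.dist(on_wgs84, on_sphere)
print(f"Offset between models: {offset_m / 1000:.1f} km")
```

The sphere-versus-ellipsoid comparison exaggerates the effect, but the point stands: the same pair of numbers refers to different places under different models.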
The earth is round, but we project it onto flat maps
The decision of how to do this is not trivial
There will always be some amount of distortion in area, distance, or direction
https://geoawesomeness.com/wp-content/uploads/2022/03/projections.jpg
You need to choose an appropriate CRS for your study (there are thousands)
If you assume that your sampling units have equal area, make sure to use an equal-area projection (e.g., Mollweide)
The Mollweide projection. The orange dots have the same area, but their shape is distorted as you move away from the equator.
https://en.wikipedia.org/wiki/Mollweide_projection
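The need for an equal-area projection can be checked with a little arithmetic. The sketch below computes the surface area of a grid-cell bounded by lines of latitude and longitude, assuming a spherical earth with a mean radius of 6371.0088 km (an approximation). The function name `cell_area_km2` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; the sphere is an approximation


def cell_area_km2(lat_south, lat_north, width_deg=1.0):
    """Area (km^2) of a cell between two latitudes (degrees) that spans
    `width_deg` degrees of longitude, on a spherical earth."""
    dlon = math.radians(width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band


# A 1 x 1 degree cell at the equator vs. one at 60 degrees north
equator_cell = cell_area_km2(0, 1)
northern_cell = cell_area_km2(60, 61)
print(f"Equator: {equator_cell:.0f} km^2, 60N: {northern_cell:.0f} km^2")
print(f"Ratio: {northern_cell / equator_cell:.2f}")
```

A cell at 60°N covers roughly half the area of one at the equator, so counts per 1 degree x 1 degree cell are not directly comparable across latitudes.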
Raw data are often provided on a per-species basis
But we are interested in assemblages (grid-cells) of species → need to group species together
https://phys.org/news/2018-12-local-conditions.html
For point data, a typical method is to divide the study area into equal-sized grid-cells, then count the species occurring in each grid-cell
For shape data, you would overlay the shapes
For checklist data, the areas may not be equal-sized. You could simply use the sampling units in the checklist (e.g., counties, countries, etc.)
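For point data, the gridding step can be sketched as follows. This is plain Python for illustration, not the phyloregion package used in the coding session; the function names and placeholder species names are hypothetical. It assumes the coordinates are already in a suitable coordinate system, so that square cells really have equal area.

```python
import math
from collections import defaultdict


def grid_cell(x, y, cell_size=1.0):
    """Return the (column, row) index of the grid-cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def species_by_cell(points, cell_size=1.0):
    """Group (species, x, y) points into grid-cells.

    Returns a dict mapping each cell index to the set of species
    recorded in that cell.
    """
    cells = defaultdict(set)
    for species, x, y in points:
        cells[grid_cell(x, y, cell_size)].add(species)
    return dict(cells)


def richness(cells):
    """Number of species per grid-cell."""
    return {cell: len(spp) for cell, spp in cells.items()}


# Placeholder points for illustration only
points = [
    ("sp_a", 135.2, 35.1),
    ("sp_b", 135.9, 35.8),
    ("sp_a", 135.5, 35.5),  # same species, same cell: counted once
    ("sp_a", 140.1, 36.0),
]
cells = species_by_cell(points)
counts = richness(cells)
```

Using sets means repeated records of a species within a cell count once, which gives presence/absence per grid-cell rather than abundance.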
https://images.unsplash.com/
rentrez package (restez package for larger datasets)
https://a-little-book-of-r-for-bioinformatics.readthedocs.io/
We don’t have time to cover this today - that is a whole topic of study unto itself!
https://mediacdn.nhbs.com/jackets/jackets_resizer_xlarge/17/170234.jpg
ftolr (for ferns)
rotl R package (caution!)
https://docs.ropensci.org/rotl/logo.svg
https://www.irasutoya.com/
In other words, to place species on the tree based on their taxonomy
We need to resolve names to a standard taxonomic database
Live coding session demonstrating how to use rgbif, CoordinateCleaner, and phyloregion to obtain data
Code is available here: https://github.com/joelnitta/spatial-phy-workshop/blob/main/tutorials/occ_phy.md
A typical workflow involves the following steps:
(rgbif)
(CoordinateCleaner)
(rgbif)
(phyloregion)
(canaper)
We don’t have time to demonstrate how to download sequences, assemble them, and conduct phylogenetic analysis (phylogenetic pipelines).
We will be using a pre-built tree
The distribution of endemicity
Environmental drivers of biodiversity
Structure of biodiversity
Nitta et al. AJB 2022 https://doi.org/10.1002/ajb2.1848
Photos A. Ebihara
Variation in climate from N (subarctic) to S (subtropical)
Variation in elevation
Main islands continental, southern islands oceanic
Nitta et al. AJB 2022 https://doi.org/10.1002/ajb2.1848
Phylogenetic diversity is predicted by % of apogamous (asexual) species
When testing spatial hypotheses (e.g., richness is determined by temperature), we must use spatial methods
Because of spatial autocorrelation
Moran’s I: compare the observed amount of autocorrelation to the value expected when there is no spatial pattern
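A minimal sketch of Moran's I in plain Python, using the standard formula I = (n / W) * sum_ij(w_ij * d_i * d_j) / sum_i(d_i^2), where d_i is the deviation of site i from the mean and W is the sum of all weights. The helper `chain_weights`, which treats sites as neighbors along a line, is a made-up toy weighting; real analyses build weights from the spatial arrangement of grid-cells and test significance as well.

```python
def morans_i(values, weights):
    """Moran's I for `values` given a square matrix of spatial `weights`
    (weights[i][j] > 0 when sites i and j are neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    sum_sq = sum(d * d for d in dev)
    return (n / w_total) * (cross / sum_sq)


def expected_i(n):
    """Expected value of Moran's I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)


def chain_weights(n):
    """Toy weights: n sites in a row, each a neighbor of the next."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(4)
gradient = morans_i([1, 2, 3, 4], w)     # similar values sit together
alternating = morans_i([1, 0, 1, 0], w)  # neighbors always differ
null_value = expected_i(4)
```

Values above the expected value indicate that neighbors are more similar than expected (positive autocorrelation); values below it indicate that neighbors are more different.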
Workflow:
Understanding the distribution of bioregions
High rates of endemism on remote islands cause differences between taxonomic and phylogenetic bioregions
canaper
canaper R package
Live coding session demonstrating how to use canaper to conduct spatial phylogenetic analysis
Code is available here: https://github.com/joelnitta/spatial-phy-workshop/blob/main/tutorials/canaper.md
A typical workflow involves the following steps:
Please fill out the post-workshop survey: