Joel Nitta¹, Eric Schuettpelz², Santiago Ramírez-Barahona³, Wataru Iwasaki¹
¹ The University of Tokyo, ² Smithsonian Institution, ³ Universidad Nacional Autónoma de México
Botany 2022
https://joelnitta.github.io/botany_2022_ftol
“Only with a phylogeny can we begin to understand diversification, regularities in patterns of evolution, or simply suggest individual evolutionary changes within a clade”
- APG
https://www.digitalatlasofancientlife.org/learn/embryophytes/angiosperms/angiosperm-phylogeny/
Antonelli et al. (2016)
Any automated pipeline must make shortcuts and assumptions
Manual inspection of all sequences would lead to high-quality results, but does not scale
Goal: construct a pipeline to generate a maximally sampled, high taxonomic quality phylogeny of ferns
A large, diverse, ecologically important group of plants
Much more tractable than seed plants (angiosperms):
Ferns: ca. 12,000 species, 40-50% sequenced
Seed plants: ca. 350,000 species, 20% sequenced
Download data to local database using restez* R package
Use superCRUNCH (Portik and Wiens 2020) to extract sequences without relying on annotations
Use World Ferns (Hassler 2022) as basis for new, fern-specific taxonomic database, pteridocat
Resolve GenBank species names to pteridocat using taxastand* R package
query | matched_name | resolved_name |
---|---|---|
Anemia collina Sm. | Anemia collina Sm. | Anemia collina Raddi |
Pteris flava Merr. | Pteris flava Merr. | Pteris linearis Poir. |
… (6,475 total)
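The name-resolution step can be illustrated with a minimal Python sketch. It is not the taxastand implementation: it assumes a hypothetical lookup table (`REFERENCE`) built from the two example rows above and uses exact string matching only, which is a simplification.

```python
# Hypothetical reference table mapping a name as found in GenBank to the
# name it resolves to. Only the two example rows from the table are included.
REFERENCE = {
    "Anemia collina Sm.": "Anemia collina Raddi",
    "Pteris flava Merr.": "Pteris linearis Poir.",
}


def resolve_name(query, reference=REFERENCE):
    """Return (query, matched_name, resolved_name).

    matched_name and resolved_name are None when the query has no exact
    match in the reference table (assumption: no fuzzy matching here).
    """
    if query in reference:
        return (query, query, reference[query])
    return (query, None, None)


for name in ["Anemia collina Sm.", "Pteris flava Merr."]:
    print(" | ".join(resolve_name(name)))
```

Names without a match come back with `None` in the last two positions, so they can be collected for manual review rather than silently dropped.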
Run all-by-all BLAST (Camacho et al. 2009)
Any query matching the wrong family is excluded as mis-ID
species | accession | locus | query family | match family |
---|---|---|---|---|
Abacopteris_gymnopteridifrons | JF303974 | rbcL | Thelypteridaceae | Athyriaceae |
Angiopteris_evecta | AY344778 | trnL-trnF | Marattiaceae | Ophioglossaceae |
… (70 total)
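The family-mismatch filter can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes each BLAST result has already been reduced to one record per query with the family of the query and the family of its match, and the field names (`query_family`, `match_family`) are hypothetical. The third record is an invented placeholder added only to show a sequence that passes.

```python
def flag_misidentified(hits):
    """Return the records whose query family differs from the match family."""
    return [h for h in hits if h["query_family"] != h["match_family"]]


hits = [
    # The two example rows from the table above.
    {"species": "Abacopteris_gymnopteridifrons", "accession": "JF303974",
     "locus": "rbcL", "query_family": "Thelypteridaceae",
     "match_family": "Athyriaceae"},
    {"species": "Angiopteris_evecta", "accession": "AY344778",
     "locus": "trnL-trnF", "query_family": "Marattiaceae",
     "match_family": "Ophioglossaceae"},
    # Hypothetical placeholder record (not real data): families agree.
    {"species": "Hypothetical_species", "accession": "XX000000",
     "locus": "rbcL", "query_family": "Pteridaceae",
     "match_family": "Pteridaceae"},
]

flagged = flag_misidentified(hits)
excluded_accessions = {h["accession"] for h in flagged}
kept = [h for h in hits if h["accession"] not in excluded_accessions]
```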
Align plastome sequences with MAFFT (Katoh et al. 2002) (544 species x 74,883 bp, 12.1% missing)
Infer tree using ML in IQ-TREE (Nguyen et al. 2015) (concatenated matrix, no partitioning)
Align Sanger sequences with MAFFT (5,582 species x 12,716 bp, 77% missing)
Infer tree in IQ-TREE (concatenated matrix, no partitioning) with plastome tree as constraint
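The "% missing" figures quoted for the two matrices can be understood with a small Python sketch. How missing data was counted in the original analysis is not stated here, so this assumes that gaps (`-`), `?`, and `N` all count as missing; the toy alignment is invented for illustration.

```python
def missing_fraction(alignment, missing_chars="-?Nn"):
    """Fraction of cells in an alignment that are missing.

    alignment: dict mapping taxon name to an aligned sequence string.
    missing_chars: characters treated as missing (an assumption).
    """
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(
        seq.count(char) for seq in alignment.values() for char in missing_chars
    )
    return missing / total


# Toy alignment: 3 taxa x 8 sites = 24 cells, 12 of them missing.
toy = {
    "sp_A": "ACGT--AC",
    "sp_B": "ACGTNNAC",
    "sp_C": "--------",
}
print(f"{missing_fraction(toy):.1%} missing")
```

A sparse matrix such as the Sanger one is expected when most species are represented by only a few loci, which is one reason to constrain it with the better-sampled plastome tree.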
51 fossils (2x more than previous)
Pushes back stem ages for most families ca. 10-30 my
Suggests ferns did not diversify “in the shadow” of angiosperms
Data downloads
Shiny app for exploring data
https://github.com/fernphy/ftolr
Read tree and data (alignments) directly into R
Options for outgroups, rooting, locus selection, etc.
Phylogenetic tree with 5582 tips and 5581 internal nodes.
Tip labels:
Acrostichum_danaeifolium, Acrostichum_speciosum, Acrostichum_aureum, Ceratopteris_richardii, Ceratopteris_cornuta, Ceratopteris_shingii, ...
Node labels:
100/100, 100/100, 100, 100/100, 100, 100/100, ...
Rooted; includes branch lengths.
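The summary above is what one would expect from a rooted, fully bifurcating tree, which has one fewer internal node than tips (5582 - 1 = 5581). A minimal Python sketch of that check follows; it is not part of ftolr, and it assumes a plain Newick string whose labels contain no commas or parentheses.

```python
def count_nodes(newick):
    """Return (number of tips, number of internal nodes) for a Newick string.

    Assumes labels contain no commas or parentheses: each "(" opens one
    internal node, and the commas separate the tips.
    """
    tips = newick.count(",") + 1
    internal = newick.count("(")
    return tips, internal


# Toy rooted, fully bifurcating tree with four tips.
tips, internal = count_nodes("((A,B),(C,D));")
print(tips, internal)
```

If the internal node count falls below tips minus one, the tree contains polytomies.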
Consulted with a taxonomic expert on family Thelypteridaceae (S. Fawcett) between v1.0.0 and v1.1.0
Automated, versioned mining of GenBank data
Custom taxonomy tailored for ferns
Input from taxonomic experts and broader community
Model for other plant groups at similar scale?
Completion of FTOL
Integration with Pteridophyte Phylogeny Group II
Transition to phylogenomics for all species
Japan Society for the Promotion of Science
Smithsonian National Museum of Natural History Peter Buck Fellowship
Members of the Iwasaki lab, The University of Tokyo
A.E. White
S. Fawcett
M. Hassler