Given a folder containing phylogenetic trees, split the trees into multiple subtrees for nodes that bifurcate deeper than internal_branch_length_cutoff. tree_folder and outdir should be different folders so that the input trees are not overwritten. This function will overwrite any files with the same name in outdir.

cut_long_internal_branches(path_to_ys = pkgconfig::get_config("baitfindR::path_to_ys"),
  tree_folder, tree_file_ending, internal_branch_length_cutoff,
  minimal_taxa = 4, outdir, overwrite = FALSE, get_hash = TRUE,
  echo = pkgconfig::get_config("baitfindR::echo", fallback = FALSE), ...)

Arguments

path_to_ys

Character vector of length one; the path to the folder containing Y&S python scripts, e.g., "/Users/me/apps/phylogenomic_dataset_construction/"

tree_folder

Character vector of length one; the path to the folder containing the trees to cut.

tree_file_ending

Character vector of length one; only tree files with this file ending will be used.

internal_branch_length_cutoff

Numeric vector of length one; the depth at which cuts should be made (smaller numbers indicate greater depth).

minimal_taxa

Numeric; minimum number of taxa required for a tree to be cut. Default 4, the minimum number of taxa needed for an unrooted tree.

outdir

Character vector of length one; the path to the folder where the subtrees should be written.

overwrite

Logical; should previous output of this command be erased so new output can be written? Once erased it cannot be restored, so use with caution!

get_hash

Logical; should the MD5 hash (32 hexadecimal characters) be computed for all output subtree files concatenated together? Used by drake_plan for tracking during workflows. If TRUE, this function will return the hash.
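To illustrate what a hash of "all output subtree files concatenated together" could look like, here is a minimal sketch in base R. The helper hash_concatenated_files() is hypothetical and is not part of baitfindR; the package may compute its hash differently (for example, with a different file order or a different hashing package). It only assumes tools::md5sum() from base R.

```r
# Hypothetical helper, NOT part of baitfindR: compute one MD5 hash for
# several files as if they had been concatenated into a single file.
hash_concatenated_files <- function(files) {
  files <- sort(files)            # fixed order so the hash is reproducible
  combined <- tempfile()
  on.exit(unlink(combined))       # remove the temporary file when done
  con <- file(combined, open = "wb")
  for (f in files) {
    # Append the raw bytes of each file to the combined file
    writeBin(readBin(f, what = "raw", n = file.size(f)), con)
  }
  close(con)
  unname(tools::md5sum(combined)) # character string of hexadecimal digits
}

# Two small temporary files stand in for real .subtree output
demo_dir <- tempfile()
dir.create(demo_dir)
writeLines("(A,B,(C,D));", file.path(demo_dir, "a.subtree"))
writeLines("(E,F,(G,H));", file.path(demo_dir, "b.subtree"))

subtree_files <- list.files(demo_dir, pattern = "\\.subtree$", full.names = TRUE)
hash <- hash_concatenated_files(subtree_files)
hash_again <- hash_concatenated_files(subtree_files)
```

Because the hash depends only on file contents and order, it changes whenever any subtree changes, which is what makes it useful as a tracking value in a workflow.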

echo

Logical; should the standard output and error be printed to the screen?

...

Other arguments. Not used by this function, but meant to be used by drake_plan for tracking during workflows.

Value

For each input tree in tree_folder with a file ending in tree_file_ending, one or more subtrees with a file ending in .subtree will be written to outdir. If get_hash is TRUE, the MD5 hash (32 hexadecimal characters) computed for all subtree files concatenated together will be returned.

Details

Wrapper for Yang and Smith (2014) cut_long_internal_branches.py

References

Yang, Y. and S.A. Smith. 2014. Orthology inference in non-model organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics. Molecular Biology and Evolution 31:3081-3092. https://bitbucket.org/yangya/phylogenomic_dataset_construction/overview

Examples

# NOT RUN {
cut_long_internal_branches(
  tree_folder = "some/folder/containing/tree/files",
  tree_file_ending = ".mm",
  internal_branch_length_cutoff = 0.3,
  outdir = "some/other/folder/")
# }
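Since tree_folder and outdir should differ, a short check before calling the function, followed by a listing of the .subtree files afterwards, may be useful. The following is a sketch using base R only; the folder names are placeholders created under tempdir() for illustration, and the call to cut_long_internal_branches() itself is left out because it requires the Y&S python scripts.

```r
# Placeholder folders for illustration (assumed names, not from the package)
tree_folder <- file.path(tempdir(), "trees")
outdir <- file.path(tempdir(), "subtrees")
dir.create(tree_folder, showWarnings = FALSE)
dir.create(outdir, showWarnings = FALSE)

# Guard: stop early if input and output folders resolve to the same path
same_folder <- normalizePath(tree_folder) == normalizePath(outdir)
if (same_folder) {
  stop("tree_folder and outdir must be different folders")
}

# ... cut_long_internal_branches() would be run here ...

# Afterwards, collect the subtrees written to outdir
subtrees <- list.files(outdir, pattern = "\\.subtree$", full.names = TRUE)
```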