Given the output of the MCL clustering algorithm and a concatenated FASTA file containing all sequences used for clustering, writes one FASTA file per cluster containing the sequences in that cluster.
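The core idea can be sketched in base R. This is a hypothetical illustration, not the code of baitfindR or of the Yang and Smith script. It assumes the standard `mcl --abc` output layout (one cluster per line, member sequence names separated by tabs) and a named character vector of sequences that was already parsed from `all_fasta`. The helper names `read_mcl_clusters()` and `write_cluster_fastas()` are made up for this example, and the sketch skips the `minimal_taxa` filter that the real function applies.

```r
# Hypothetical sketch: split MCL output into clusters and write one FASTA
# file per cluster. Assumes mcl --abc output: one cluster per line,
# member sequence names separated by tabs.
read_mcl_clusters <- function(mcl_outfile) {
  lines <- readLines(mcl_outfile)
  lines <- lines[nzchar(lines)]          # drop blank lines
  strsplit(lines, "\t", fixed = TRUE)    # list of character vectors
}

# `seqs` is a named character vector: names are sequence IDs, values are
# the sequences. No minimal_taxa filtering is done here.
write_cluster_fastas <- function(clusters, seqs, outdir) {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (i in seq_along(clusters)) {
    ids <- clusters[[i]]
    path <- file.path(outdir, paste0("cluster", i, ".fa"))
    # Each element becomes ">id\nSEQUENCE", and writeLines adds the final newline
    writeLines(paste0(">", ids, "\n", seqs[ids]), path)
    paths <- c(paths, path)
  }
  paths
}

# Usage with toy data
mcl_tmp <- tempfile()
writeLines(c("sp1@a\tsp2@b", "sp1@c"), mcl_tmp)
clusters <- read_mcl_clusters(mcl_tmp)
seqs <- c("sp1@a" = "ACGT", "sp2@b" = "GGCC", "sp1@c" = "TTAA")
out <- write_cluster_fastas(clusters, seqs, file.path(tempdir(), "clusters"))
```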

write_fasta_files_from_mcl(path_to_ys = pkgconfig::get_config("baitfindR::path_to_ys"),
  all_fasta, mcl_outfile, minimal_taxa = 4, outdir, overwrite = FALSE,
  get_hash = TRUE, echo = pkgconfig::get_config("baitfindR::echo",
  fallback = FALSE), ...)

Arguments

path_to_ys

Character vector of length one; the path to the folder containing the Yang and Smith (Y&S) Python scripts, e.g., "/Users/me/apps/phylogenomic_dataset_construction/"

all_fasta

Character vector of length one; the path to the FASTA file containing all query sequences concatenated together, i.e., the FASTA file used to create the "all-by-all" BLAST database.

mcl_outfile

Character vector of length one; the path to the output file produced by running MCL on the BLAST distances.

minimal_taxa

Numeric; the minimum number of taxa that must be present for a cluster to be written. Default 4, the minimum number of taxa for an unrooted tree to have more than one possible topology.
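A minimal sketch of how a `minimal_taxa` filter might work. It assumes the Yang and Smith naming convention "taxonID@sequenceID", so that the taxon is the text before the first "@". Under that convention, a cluster's taxon count is the number of distinct prefixes, not the number of sequences. `count_taxa()` and `filter_clusters()` are hypothetical helpers, not part of baitfindR.

```r
# Hypothetical helpers illustrating minimal_taxa filtering.
# Assumption: sequence names follow "taxonID@sequenceID".
count_taxa <- function(ids) {
  # Strip everything from the first "@" onward to get the taxon ID
  length(unique(sub("@.*$", "", ids)))
}

filter_clusters <- function(clusters, minimal_taxa = 4) {
  keep <- vapply(clusters, count_taxa, integer(1)) >= minimal_taxa
  clusters[keep]
}

clusters <- list(
  c("Sp1@a", "Sp2@b", "Sp3@c", "Sp4@d"),  # 4 distinct taxa: kept
  c("Sp1@e", "Sp1@f", "Sp2@g", "Sp3@h")   # 3 distinct taxa (Sp1 twice): dropped
)
kept <- filter_clusters(clusters, minimal_taxa = 4)
```

The second cluster has four sequences but only three taxa, so it falls below the default threshold.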

outdir

Character vector of length one; the path to the folder where the clusters should be written.

overwrite

Logical; should previous output of this command be erased so new output can be written? Once erased it cannot be restored, so use with caution!

get_hash

Logical; should the MD5 hash (a 32-character hexadecimal string) be computed for all clusters concatenated together? Used by drake_plan for tracking during workflows. If TRUE, this function will return the hash.
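A base-R sketch of hashing all cluster files as one stream. It concatenates every `.fa` file in a folder into a temporary file and passes that file to `tools::md5sum()`. This is an assumption about the general approach, not baitfindR's implementation. In particular, the file order (R's `sort()` on full paths, which depends on the locale) is an assumption, and a different order would give a different hash. `hash_cluster_files()` is a hypothetical name.

```r
# Hypothetical sketch: MD5 of all .fa files in `outdir` concatenated together.
# Assumption: files are concatenated in sort() order of their paths.
hash_cluster_files <- function(outdir) {
  fa_files <- sort(list.files(outdir, pattern = "\\.fa$", full.names = TRUE))
  combined <- tempfile(fileext = ".fa")
  on.exit(unlink(combined))
  file.create(combined)                 # start from an empty file
  for (f in fa_files) file.append(combined, f)
  unname(tools::md5sum(combined))       # 32-character hex string
}

# Usage: hash_cluster_files("some/folder")
```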

echo

Logical; should the standard output and error be printed to the screen?

...

Other arguments. Not used by this function, but meant to be used by drake_plan for tracking during workflows.

Value

One FASTA file per cluster (cluster1.fa, cluster2.fa, etc.) will be written to outdir. If get_hash is TRUE, the MD5 hash computed for all .fa files concatenated together will be returned.

Details

Wrapper for the write_fasta_files_from_mcl.py script of Yang and Smith (2014).
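A hypothetical sketch of how an R wrapper could assemble the call to the Python script. The positional argument order (all_fasta, mcl_outfile, minimal_taxa, outdir) is an assumption based on the script's usage message. Check it against your copy of the Y&S repository before relying on it. `build_ys_command()` is a made-up name. The sketch only builds the argument vector, and the actual `system2()` call is left commented out because it needs Python and the Y&S scripts installed.

```r
# Hypothetical sketch: build the argument vector for write_fasta_files_from_mcl.py.
# Assumed positional order: all_fasta mcl_outfile minimal_taxa outdir
build_ys_command <- function(path_to_ys, all_fasta, mcl_outfile,
                             minimal_taxa = 4, outdir) {
  script <- file.path(path_to_ys, "write_fasta_files_from_mcl.py")
  c(script, all_fasta, mcl_outfile, as.character(minimal_taxa), outdir)
}

args <- build_ys_command("/path/to/ys", "all.fasta",
                         "hit-frac0.4_I1.4_e5", 4, "clusters")
# To actually run it (requires Python and the Y&S scripts):
# system2("python", args)
```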

References

Yang, Y. and S.A. Smith. 2014. Orthology inference in non-model organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics. Molecular Biology and Evolution 31:3081-3092. https://bitbucket.org/yangya/phylogenomic_dataset_construction/overview

Examples

# NOT RUN {
write_fasta_files_from_mcl(all_fasta = "some/folder/all.fasta", mcl_outfile = "some/folder/hit-frac0.4_I1.4_e5", minimal_taxa = 5, outdir = "some/folder")
# }