The alignment must be produced by aligning sequences to a reference in which introns are masked by 'n's; this leaves gaps at the intron positions in all the other sequences. This function replaces those gaps with 'n's and removes the reference sequence.
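The idea can be sketched in base R on a plain character matrix. This is a minimal illustration and not the package's implementation: the helper name `fill_introns_sketch` and the toy alignment are hypothetical, and the sketch assumes that intron columns are exactly those where the first matching reference row is 'n'.

```r
# Toy alignment as a character matrix: rows are sequences, columns are sites.
# The "ref" row carries a masked intron ('n') at columns 4-6.
aln <- rbind(
  seq1 = c("a", "c", "g", "-", "-", "-", "t", "a"),
  seq2 = c("a", "c", "-", "-", "-", "-", "t", "a"),
  ref  = c("a", "c", "g", "n", "n", "n", "t", "a")
)

# Hypothetical helper, for illustration only.
fill_introns_sketch <- function(aln, ref_pattern) {
  # Identify reference rows by matching the pattern against row names
  ref_rows <- grep(ref_pattern, rownames(aln))
  stopifnot(length(ref_rows) >= 1)
  # Intron columns: sites where the first reference sequence is 'n'
  intron_cols <- which(aln[ref_rows[1], ] == "n")
  # Drop the reference, then turn gaps into 'n' inside intron columns only
  out <- aln[-ref_rows, , drop = FALSE]
  block <- out[, intron_cols, drop = FALSE]
  block[block == "-"] <- "n"
  out[, intron_cols] <- block
  out
}

filled <- fill_introns_sketch(aln, "ref")
print(filled)
```

In this toy case the gap at column 3 of `seq2` lies outside the intron and is left as '-', while columns 4 to 6 become 'n' in both remaining sequences.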

fill_introns(alignment, ref_pattern, outgroup = NULL,
  trim_outgroup = FALSE)

Arguments

alignment

Matrix of class DNAbin; an alignment that includes the reference sequence.

ref_pattern

Pattern used for matching with grep to identify reference sequences.

outgroup

Character vector; names of outgroup sequences.

trim_outgroup

Logical; should the outgroups be trimmed from the alignment?
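Because ref_pattern is matched with grep, it behaves as a regular expression rather than an exact name. The snippet below is a small sketch of what that implies, assuming the pattern is matched against sequence names; the names used here are made up for illustration.

```r
# Hypothetical sequence names in an alignment
seq_names <- c("No305", "No304", "ref", "outgroup_1")

# grep() returns the indices of names that match the pattern
ref_idx <- grep("ref", seq_names)
seq_names[ref_idx]

# An unanchored pattern also matches names that merely contain it
loose <- grep("ref", c("ref", "prefix_sample"))

# Anchoring with ^ and $ restricts the match to the exact name
strict <- grep("^ref$", c("ref", "prefix_sample"))
```

If sample names might contain the reference label as a substring, an anchored pattern such as "^ref$" is the safer choice.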

Value

Matrix of class DNAbin

Examples

library(ape)
data(woodmouse)

# Make a reference sequence with a 50 bp intron (a string of 'n's)
# in the middle.
woodmouse_ref <- as.character(woodmouse[1, ])
woodmouse_ref <- c(
  woodmouse_ref[1:400],
  rep("n", 50),
  woodmouse_ref[401:length(woodmouse_ref)]
)
woodmouse_ref <- as.DNAbin(woodmouse_ref)

# Align with the other sequences, which don't include the intron
# (need to convert to list first).
woodmouse_ref <- as.list(woodmouse_ref)
names(woodmouse_ref) <- "ref"
woodmouse <- as.list(woodmouse)
woodmouse_with_introns <- ips::mafft(
  c(woodmouse, woodmouse_ref),
  path = "/usr/bin/mafft"  # location of the MAFFT executable on this system
)

# Image of the alignment shows that 'ref' has 'n's at positions 401-450,
# while the other sequences have gaps ('-').
image(woodmouse_with_introns)

# Fill in introns
woodmouse_masked <- fill_introns(
  woodmouse_with_introns,
  ref_pattern = "ref"
)
#> 0 rows deleted from alignment
#> 0 columns deleted from alignment

# After filling in, the reference sequence is gone and
# all introns are 'n's.
image(woodmouse_masked)