The alignment must be produced by aligning the sequences to a reference in which introns are masked with 'n's; this leaves gaps ('-') at the intron positions in all other sequences. This function replaces those gaps with 'n's and removes the reference sequence.
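To make the described operation concrete, here is a minimal base-R sketch on a plain character matrix (rows are sequences, columns are sites). It is an illustration under stated assumptions, not the package's implementation: the helper `fill_introns_sketch()` is hypothetical, it ignores `outgroup` and `trim_outgroup`, and it assumes that intron positions are the columns where a reference row holds 'n'. A `DNAbin` alignment could be converted with `as.character()` beforehand and back with `ape::as.DNAbin()` afterwards.

```r
# Hypothetical sketch of the gap-filling idea; NOT the package's own code.
# Works on a plain character matrix (rows = sequences, columns = sites).
fill_introns_sketch <- function(aln, ref_pattern) {
  # Identify reference rows by matching their names against ref_pattern
  is_ref <- grepl(ref_pattern, rownames(aln))
  # Assumption: intron columns are those where any reference row has 'n'
  intron_cols <- apply(aln[is_ref, , drop = FALSE] == "n", 2, any)
  # Within those columns, turn gaps in the non-reference rows into 'n'
  sub <- aln[!is_ref, intron_cols, drop = FALSE]
  sub[sub == "-"] <- "n"
  aln[!is_ref, intron_cols] <- sub
  # Drop the reference rows from the result
  aln[!is_ref, , drop = FALSE]
}

aln <- matrix(
  c("a", "c", "n", "n", "g",
    "a", "c", "-", "-", "g",
    "a", "-", "-", "-", "g"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ref", "seq1", "seq2"), NULL)
)
filled <- fill_introns_sketch(aln, ref_pattern = "ref")
filled
```

Here `seq2` keeps its gap in column 2, because the reference has a base there, while columns 3 and 4 become 'n' in both remaining sequences and the `ref` row is dropped.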
```r
fill_introns(alignment, ref_pattern, outgroup = NULL, trim_outgroup = FALSE)
```
| Argument | Description |
|---|---|
| `alignment` | Matrix of class `DNAbin`. |
| `ref_pattern` | Pattern used for matching with `grep()` to identify the reference sequence(s). |
| `outgroup` | Character vector; names of the outgroup sequences. |
| `trim_outgroup` | Logical; should the outgroup sequences be trimmed from the alignment? |
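Because `ref_pattern` is matched with `grep()`, a short pattern can match more names than intended. The snippet below illustrates this with made-up sequence names, assuming the pattern is used as a regular expression with `grep()`'s default settings; anchoring the pattern is one way to restrict the match.

```r
# Hypothetical sequence names; assumes ref_pattern is treated as a regex.
seq_names <- c("ref", "ref_2", "preferred", "seq1")

# A bare pattern matches any name that merely contains "ref"
unanchored <- grep("ref", seq_names)
# Anchoring at the start restricts matches to names beginning with "ref"
prefix_only <- grep("^ref", seq_names)
# Anchoring at both ends matches the exact name only
exact_only <- grep("^ref$", seq_names)

unanchored   # 1 2 3
prefix_only  # 1 2
exact_only   # 1
```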
Returns a matrix of class `DNAbin`.
```r
library(ape)
data(woodmouse)

# Make a reference sequence with a 50 bp intron (a string of 'n's)
# in the middle.
woodmouse_ref <- as.character(woodmouse[1, ])
woodmouse_ref <- c(
  woodmouse_ref[1:400],
  rep("n", 50),
  woodmouse_ref[401:length(woodmouse_ref)]
)
woodmouse_ref <- as.DNAbin(woodmouse_ref)

# Align with the other sequences, which don't include the intron.
# (Both objects need to be converted to lists first.)
woodmouse_ref <- as.list(woodmouse_ref)
names(woodmouse_ref) <- "ref"
woodmouse <- as.list(woodmouse)
woodmouse_with_introns <- ips::mafft(
  c(woodmouse, woodmouse_ref),
  path = "/usr/bin/mafft" # location of the MAFFT executable on this system
)

# An image of the alignment shows that 'ref' has 'n's at roughly
# positions 401-450, while the other sequences have gaps ('-') there.
image(woodmouse_with_introns)

# Fill in the introns
woodmouse_masked <- fill_introns(
  woodmouse_with_introns,
  ref_pattern = "ref"
)
#>
#> 0 rows deleted from alignment
#> 0 columns deleted from alignment

# After filling in, the reference sequence is gone and
# all intron positions are 'n's.
image(woodmouse_masked)
```