Fill-in introns in an alignment

The alignment must be produced by aligning sequences to a reference sequence whose introns are masked with 'n's. The other sequences then have gaps ('-') at the intron positions. This function replaces those gaps with 'n's and removes the reference sequence.
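The idea can be illustrated on a toy alignment stored as a plain character matrix. This is a minimal base R sketch, not the package's implementation: the object names (`aln`, `ref_rows`, `intron_cols`, `filled`) are hypothetical, and it assumes that only gaps in columns where the reference has an 'n' are filled.

```r
# Toy alignment: rows are sequences, columns are sites.
aln <- rbind(
  ref  = c("a", "c", "n", "n", "g", "t"),
  seq1 = c("a", "c", "-", "-", "g", "t"),
  seq2 = c("a", "-", "-", "-", "g", "a")
)

# Rows whose names match the reference pattern.
ref_rows <- grep("ref", rownames(aln))

# Columns where a reference sequence has an 'n' (the masked intron).
intron_cols <- colSums(aln[ref_rows, , drop = FALSE] == "n") > 0

# Turn gaps into 'n's, but only in intron columns (assumption).
filled <- aln
is_gap <- filled == "-"
is_gap[, !intron_cols] <- FALSE  # leave gaps outside the intron untouched
filled[is_gap] <- "n"

# Drop the reference sequence(s).
filled <- filled[-ref_rows, , drop = FALSE]
print(filled)
```

In this sketch the gap in `seq2` at site 2 is kept, because the reference has a regular base there rather than an 'n'.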

fill_introns(alignment, ref_pattern, outgroup = NULL,
  trim_outgroup = FALSE)

Arguments

alignment	Matrix of class DNAbin; the alignment to fill in.
ref_pattern	Pattern used for matching with grep to identify reference sequences.
outgroup	Character vector; names of outgroup sequences.
trim_outgroup	Logical; should the outgroups be trimmed from the alignment?
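Because ref_pattern is matched with grep, it behaves as a regular expression and may match more sequences than intended. The sketch below uses made-up sequence names (`seq_names` is hypothetical) and assumes the pattern is matched against sequence names; it shows how anchoring the pattern narrows the match.

```r
# Hypothetical sequence names in an alignment.
seq_names <- c("ref", "No305", "ref_2", "prefab")

# An unanchored pattern matches any name containing "ref".
loose <- grep("ref", seq_names, value = TRUE)

# Anchoring at the start excludes names that merely contain "ref".
prefix <- grep("^ref", seq_names, value = TRUE)

# Anchoring at both ends matches the exact name only.
exact <- grep("^ref$", seq_names, value = TRUE)

print(list(loose = loose, prefix = prefix, exact = exact))
```

Inspecting the matches this way before calling fill_introns helps avoid removing a non-reference sequence by accident.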

Value

Matrix of class DNAbin

Examples

library(ape)
data(woodmouse)

# Make reference sequence with a 50bp intron (a string of 'n's)
# in the middle.
woodmouse_ref <- as.character(woodmouse[1,])
woodmouse_ref <- c(woodmouse_ref[1:400],
  rep("n", 50),
  woodmouse_ref[401:length(woodmouse_ref)])
woodmouse_ref <- as.DNAbin(woodmouse_ref)

# Align with other sequences that don't include the intron.
# (need to convert to list first)
woodmouse_ref <- as.list(woodmouse_ref)
names(woodmouse_ref) <- "ref"
woodmouse <- as.list(woodmouse)
woodmouse_with_introns <- ips::mafft(
  c(woodmouse, woodmouse_ref),
  path = "/usr/bin/mafft")

# Image of the alignment shows that 'ref' has 'n's at positions 401-450,
# while the other sequences have gaps ('-') there.
image(woodmouse_with_introns)

# Fill-in introns
woodmouse_masked <- fill_introns(
  woodmouse_with_introns,
  ref_pattern = "ref"
)
#> 
#> 	 0 rows deleted from alignment
#> 	 0 columns deleted from alignment

# After filling-in, the reference sequence is gone and
# all introns are 'n's.
image(woodmouse_masked)
