Speaker: Dr. Dean Starrett, Dept. of Computer Science, University of Arizona
Time: 1:00-2:00pm, Monday, February 2nd, 2009
Place: 321 Stanley Hall
Abstract:
Constructing high-quality multiple alignments is computationally hard, both in
theory and in practice, and is typically done using heuristic methods. The
majority of state-of-the-art multiple alignment programs employ a form and
polish strategy, where in the construction phase an initial multiple alignment
is formed by progressively merging smaller alignments, starting with single
sequences. Then in a local-search phase, the resulting alignment is polished
by repeatedly splitting it into smaller alignments and re-merging.
This merging of two multiple alignments is called the Aligning Alignments Problem. The merge is done by treating alignment columns as units and aligning the columns of one alignment with the columns of the other. The problem may seem a simple extension of standard two-sequence alignment since it consists of aligning a pair of sequences (of columns). However, under the commonly used sum-of-pairs objective for scoring multiple alignments, we prove the problem is NP-complete when affine gap costs are used and the gaps are scored precisely (counted exactly). Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment.
Unlike general multiple alignment, we are able to show that Aligning Alignments is tractable in practice, even with exact affine gap costs. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments show instances derived from two standard biological benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well with instance size.
To receive announcements of future seminars in this series, please subscribe to the bioinformatics mailing list at https://calmail.berkeley.edu.