Bioinformatics Seminar

An occasional seminar series on bioinformatics, at Berkeley and beyond!

Title: Theoretical and Practical Results on the Aligning Alignments Problem

Speaker: Dr. Dean Starrett, Dept. of Computer Science, University of Arizona

Time: 1:00-2:00pm, Monday, February 2nd, 2009

Place: 321 Stanley Hall

Abstract:
Constructing high-quality multiple alignments is computationally hard, both in theory and in practice, and is typically done using heuristic methods. The majority of state-of-the-art multiple alignment programs employ a form and polish strategy, where in the construction phase an initial multiple alignment is formed by progressively merging smaller alignments, starting with single sequences. Then in a local-search phase, the resulting alignment is polished by repeatedly splitting it into smaller alignments and re-merging.

This merging of two multiple alignments is called the Aligning Alignments Problem. The merge is done by treating alignment columns as units and aligning the columns of one alignment with the columns of the other. The problem may seem a simple extension of standard two-sequence alignment since it consists of aligning a pair of sequences (of columns). However, under the commonly used sum-of-pairs objective for scoring multiple alignments, we prove the problem is NP-complete when affine gap costs are used and the gaps are scored precisely (counted exactly). Interestingly, this form of multiple alignment is polynomial-time solvable when we relax the exact count, showing that exact gap counts themselves are inherently hard in multiple sequence alignment.

Unlike general multiple alignment, we are able to show that Aligning Alignments is tractable in practice, even with exact affine gap costs. Our software AlignAlign is both time- and space-efficient on biological data. Computational experiments show instances derived from two standard biological benchmark suites can be optimally aligned with surprising efficiency, and experiments on simulated data show the time and space both scale well with instance size.



To receive announcements of future seminars in this series, please subscribe to the bioinformatics mailing list at https://calmail.berkeley.edu.

Return to the Berkeley Phylogenomics Group homepage