|
Accurate multiple sequence alignments are a prerequisite for tree building, hidden Markov model (HMM) construction, conservation analysis, homology model construction, and many other probabilistic approaches to evolutionary analysis.
The construction of PhyloFacts protein families begins with the clustering and alignment of homologous proteins. That alignment is then used for tree construction, SCI-PHY subfamily prediction, key residue prediction, and other downstream analyses that appear in the completed PhyloFacts protein family "book". Because each of these downstream analyses is sensitive to alignment errors, ensuring alignment accuracy has been an active area of research.
Alignment accuracy in the PhyloFacts protein families is addressed in a two-step process. First, a cluster of homologs is assembled in a way that carefully screens out any spurious hits that might disrupt the alignment. This is most often done by aligning global homologs to subfamily HMMs using FlowerPower. Once a reliable cluster of sequences has been selected, the sequences are realigned with the MUSCLE algorithm.
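
The realignment step can be sketched in Python as follows. This is only an illustration, not the PhyloFacts pipeline itself: the function names and file names are hypothetical, and it assumes a `muscle` executable is on the PATH. The clustering step is not shown because the FlowerPower interface is not described here. MUSCLE 3.x takes `-in`/`-out`, while MUSCLE 5 takes `-align`/`-output`, so the sketch lets the caller say which major version is installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_muscle_command(unaligned_fasta, aligned_fasta, muscle_major_version=3):
    """Return the argument list for realigning a cluster with MUSCLE.

    Hypothetical helper: the caller states which MUSCLE major version
    is installed, because the command-line flags differ between them.
    """
    if muscle_major_version >= 5:
        # MUSCLE 5 syntax
        return ["muscle", "-align", str(unaligned_fasta),
                "-output", str(aligned_fasta)]
    # MUSCLE 3.x syntax
    return ["muscle", "-in", str(unaligned_fasta),
            "-out", str(aligned_fasta)]


def realign_cluster(unaligned_fasta, aligned_fasta, muscle_major_version=3):
    """Realign an already screened cluster of homologs (FASTA) with MUSCLE."""
    if shutil.which("muscle") is None:
        raise FileNotFoundError("muscle executable not found on PATH")
    command = build_muscle_command(unaligned_fasta, aligned_fasta,
                                   muscle_major_version)
    # check=True raises CalledProcessError if MUSCLE exits with an error
    subprocess.run(command, check=True)
    return Path(aligned_fasta)
```

With a screened cluster saved as, say, `cluster.fa`, calling `realign_cluster("cluster.fa", "cluster.afa")` would write the MUSCLE alignment to `cluster.afa`, which could then be passed on to tree construction and the other downstream analyses.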
|