PhyloFacts::MSA
Berkeley Phylogenomics multiple sequence alignment resources

PhyloBuilder | Gather homologs -> Align sequences -> Construct trees -> Find subfamilies -> Structure prediction

Introduction

Accurate multiple sequence alignments are a prerequisite to tree building, Hidden Markov Model construction, conservation analysis, homology model construction, and many other probabilistic approaches to evolutionary analysis.

The construction of PhyloFacts protein families begins with the clustering and alignment of homologous proteins. That alignment is then used for tree construction, SCI-PHY subfamily prediction, key residue prediction, and other downstream forms of analysis that appear in the completed PhyloFacts protein family "book". Because each of these downstream analyses is sensitive to alignment errors, ensuring alignment accuracy has been an active area of research.

Alignment accuracy in the PhyloFacts protein families is ensured by a two-step process. First, a cluster of homologs is assembled in a way that carefully screens out any spurious hits that might disrupt the alignment. This is most often done by aligning global homologs to subfamily HMMs using FlowerPower. Once a reliable cluster of sequences has been gathered, the sequences are realigned with the MUSCLE algorithm.
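The second step (realigning a finished cluster with MUSCLE) can be sketched in a few lines of Python. This is a minimal illustration, not the PhyloFacts pipeline itself. It assumes a `muscle` executable is on the PATH and that the cluster from the first step has already been written to a FASTA file. The function names and file names are hypothetical. The flags are those of the public MUSCLE command line: `-in`/`-out` for the 3.x releases and `-align`/`-output` for the 5.x releases.

```python
import shutil
import subprocess


def read_fasta(path):
    """Parse a FASTA file into a dict mapping sequence ID to sequence."""
    records = {}
    seq_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                records[seq_id] = []
            elif seq_id is not None:
                records[seq_id].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def build_muscle_command(cluster_fasta, aligned_fasta, muscle_major_version=3):
    """Return the MUSCLE command line for the given major version.

    Assumption: MUSCLE 3.x takes -in/-out, MUSCLE 5.x takes -align/-output.
    """
    if muscle_major_version >= 5:
        return ["muscle", "-align", str(cluster_fasta), "-output", str(aligned_fasta)]
    return ["muscle", "-in", str(cluster_fasta), "-out", str(aligned_fasta)]


def realign_cluster(cluster_fasta, aligned_fasta, muscle_major_version=3):
    """Realign a screened cluster of homologs with MUSCLE and return the alignment."""
    if shutil.which("muscle") is None:
        raise FileNotFoundError("muscle executable not found on PATH")
    command = build_muscle_command(cluster_fasta, aligned_fasta, muscle_major_version)
    subprocess.run(command, check=True)

    alignment = read_fasta(aligned_fasta)
    # Every row of a valid multiple sequence alignment has the same length.
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("aligned sequences differ in length")
    return alignment
```

A call such as `realign_cluster("cluster.fasta", "cluster.aligned.fasta")` would then return the realigned sequences keyed by ID. The length check at the end is a cheap sanity test to run before the alignment is handed to tree construction or subfamily prediction.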

PhyloFacts: Alignment Webservers

FLOWERPOWER: clustering by iterative subfamily HMM construction.
MUSCLE: MUltiple Sequence Comparison by Log Expectation.
SATCHMO: Simultaneous Alignment and Tree Construction using Hidden Markov mOdels.

Comments, questions? Email <phylo@phylogenomics.berkeley.edu>.

This page was inspired by the excellent PFAM resource at Washington University, St. Louis.