BPG Code Repository (GitHub repository)
INTREPID (GitHub repository)
INTREPID is a method for functional site identification that exploits the information in large diverse multiple sequence alignments.
Sankararaman, S. and Sjölander, K., "INTREPID - INformation-theoretic TREe traversal for Protein functional site IDentification", Bioinformatics 2008; doi: 10.1093/bioinformatics/btn474
Discern (.tar.gz download)
Discern is a method for catalytic residue prediction derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting.
Sankararaman, S., Sha, F., Kirsch, J.F., Jordan, M.I., and Sjölander, K., "Active Site Prediction using Evolutionary and Structural Information", Bioinformatics 2010; doi: 10.1093/bioinformatics/btq008
SATCHMO (.tar.gz download)
SATCHMO (Simultaneous Alignment and Tree Construction using Hidden Markov Models) uses HMM-HMM scoring and alignment to simultaneously estimate a phylogenetic tree and a multiple sequence alignment (MSA).
Edgar, R., and Sjölander, K., "SATCHMO: Sequence Alignment and Tree Construction using Hidden Markov models", Bioinformatics. 2003 Jul 22; 19(11):1404-11.
SATCHMO-JS (.tar.gz download)
SATCHMO-JS is a variant of the SATCHMO algorithm designed for scalability to large datasets. It uses a jump-start (JS) protocol to reduce computational complexity: computationally efficient MSA methods are applied to subgroups of closely related sequences, and the computationally expensive HMM-HMM scoring and alignment is reserved for estimating the tree and MSA between more distantly related subgroups. SATCHMO-JS requires the following to be installed: SATCHMO, MAFFT, RAxML, BioPython.
Hagopian, R., Davidson, J., Datta, R., Samad, B., Jarvis, G., and Sjölander, K., "SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction", Nucleic Acids Research 2010, doi:10.1093/nar/gkq298
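Before running SATCHMO-JS, it can help to confirm that the four prerequisites listed above are present. The following is a minimal sketch, not part of the SATCHMO-JS distribution. The executable names in `EXECUTABLES` are assumptions (RAxML in particular is commonly built under several binary names), so adjust them to match the local installation.

```python
import importlib.util
import shutil

# Candidate executable names are assumptions, not taken from SATCHMO-JS itself.
EXECUTABLES = {
    "SATCHMO": ["satchmo"],
    "MAFFT": ["mafft"],
    "RAxML": ["raxmlHPC", "raxmlHPC-PTHREADS", "raxmlHPC-SSE3"],
}


def find_missing(executables=EXECUTABLES, module="Bio",
                 which=shutil.which, find_spec=importlib.util.find_spec):
    """Return the names of prerequisites that cannot be located.

    A tool counts as present if any of its candidate executables is on PATH.
    BioPython counts as present if its top-level package `Bio` is importable.
    """
    missing = [
        name
        for name, candidates in executables.items()
        if not any(which(candidate) for candidate in candidates)
    ]
    if find_spec(module) is None:
        missing.append("BioPython")
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All SATCHMO-JS prerequisites were found.")
```

The `which` and `find_spec` parameters exist only so the check can be exercised without the tools installed.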
SCI-PHY version 1 (.tar.gz download)
The original implementation of the SCI-PHY algorithm, which builds subfamily HMMs. Note that the subfamily HMMs are in UCSC SAM format.
SCI-PHY version 3 (.tar.gz download)
The current version of the SCI-PHY code, which returns a subfam file showing the subfamily multiple sequence alignments. Subfamily HMM files can be created using HMMER.
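As a rough illustration of the HMMER step, the sketch below runs `hmmbuild` once per subfamily alignment. It assumes each subfamily alignment has already been extracted from the subfam file into its own alignment file in a format HMMER accepts; that extraction is not shown, and the file names and directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def hmmbuild_command(msa_path, hmm_dir):
    """Build the argument list `hmmbuild <hmmfile_out> <msafile>` for one alignment."""
    msa_path = Path(msa_path)
    hmm_out = Path(hmm_dir) / (msa_path.stem + ".hmm")
    return ["hmmbuild", str(hmm_out), str(msa_path)]


def build_subfamily_hmms(msa_paths, hmm_dir, run=subprocess.run):
    """Run hmmbuild on each per-subfamily alignment file.

    `msa_paths` is assumed to hold one alignment file per subfamily
    (hypothetical layout; splitting the subfam file is not covered here).
    """
    Path(hmm_dir).mkdir(parents=True, exist_ok=True)
    for msa in msa_paths:
        # check=True raises CalledProcessError if hmmbuild exits with an error.
        run(hmmbuild_command(msa, hmm_dir), check=True)
```

For example, `build_subfamily_hmms(["subfams/sf1.afa", "subfams/sf2.afa"], "hmms")` would write `hmms/sf1.hmm` and `hmms/sf2.hmm`, provided `hmmbuild` is on the PATH.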
FlowerPower (.tar.gz download)
FlowerPower is a clustering algorithm designed for the identification of global homologs. It employs an iterative approach to clustering sequences. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discrimination between proteins in the same domain architecture class and those having different overall domain structures.
Krishnamurthy, N., Brown, D. and Sjölander, K., "FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function", BMC Evolutionary Biology 2007, 7 Suppl 1:S12 doi:10.1186/1471-2148-7-S1-S12
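The iterative, subfamily-based expansion described for FlowerPower can be pictured with a toy loop. This is only a schematic of the control flow, not the FlowerPower implementation: `split_into_subfamilies` stands in for SCI-PHY, `score` stands in for subfamily HMM scoring, and both are hypothetical callables supplied by the caller.

```python
def iterative_recruit(seed, database, split_into_subfamilies, score,
                      threshold, max_iterations=10):
    """Grow a cluster from `seed` by repeatedly scoring candidates against subfamilies.

    Each round re-partitions the current cluster into subfamilies, then recruits
    every database entry whose best subfamily score reaches `threshold`.
    The loop stops when a round recruits nothing new.
    """
    cluster = {seed}
    for _ in range(max_iterations):
        subfamilies = split_into_subfamilies(cluster)
        recruits = {
            candidate
            for candidate in database
            if candidate not in cluster
            and max(score(sub, candidate) for sub in subfamilies) >= threshold
        }
        if not recruits:
            break
        cluster |= recruits
    return cluster


# Toy stand-ins: every sequence is its own "subfamily", and the score is the
# best fractional identity between the candidate and any subfamily member.
def singleton_subfamilies(cluster):
    return [frozenset([sequence]) for sequence in cluster]


def best_identity(subfamily, candidate):
    return max(
        sum(a == b for a, b in zip(member, candidate)) / len(candidate)
        for member in subfamily
    )


if __name__ == "__main__":
    database = ["AAAT", "AATT", "ATTT", "GGGG"]
    print(sorted(iterative_recruit("AAAA", database, singleton_subfamilies,
                                   best_identity, threshold=0.75)))
```

In the toy run, sequences too distant from the seed to be recruited in the first round are picked up in later rounds through intermediate members, which is the general motivation for iterating rather than searching once.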