PhyloFacts::Structure Prediction

PhyloFacts v. 2.0 (24 July 2008): 12,134 domain families, 921,491 HMMs

The PhyloFacts resource provides multiple sequence alignments, phylogenetic trees, and hidden Markov models for structural domain families.

This structure prediction HMM library is included in our Universal Proteome Phylogenomic Explorer, which covers more protein families and which you may want to use instead.

This work is funded by grant number R01 HG002769 from the National Human Genome Research Institute of the NIH.


Protein Search

Submit sequences for classification against the HMM library. This library is designed to help biologists do the following:

  • Predict molecular function by phylogenomic analysis, using the phylogenetic tree for the family.
  • Classify novel sequences into functional subtypes, using the subfamily HMMs for the family.
  • Predict specificity positions, using the alignment analysis plots for each family.

Browse Books in our library

Each "book" in the HMM library corresponds to a structural domain and contains the following data (generally downloadable in several formats):

  • A cluster of homologs, typically from many species.
  • One or more phylogenetic trees.
  • A decomposition of the tree into subtrees, to identify functional subfamilies.
  • A multiple sequence alignment for the family, as well as for individual subfamilies.
  • GO (Gene Ontology) annotations and evidence codes.
  • Other annotations and experimental data.
  • Hyperlinks to papers and online resources.
  • An analysis of the family's multiple sequence alignment, using the subfamily decomposition to predict the specificity positions that define the individual subtypes.
  • One or more solved structures, overlaid with subfamily-specific and family-wide conservation patterns for prediction of active-site and other specificity positions.
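
The idea of using a subfamily decomposition to predict specificity positions can be illustrated with a minimal Python sketch. This is not the method PhyloFacts uses. It assumes a toy gapped alignment, and every name in it (`specificity_positions`, `alignment`, `subfamily_of`, `min_within`) is hypothetical. It flags alignment columns that are conserved within every subfamily yet differ between at least two subfamilies.

```python
from collections import Counter


def specificity_positions(alignment, subfamily_of, min_within=1.0):
    """Return 0-based columns conserved within each subfamily
    but not shared across all subfamilies.

    alignment    : dict of sequence name -> aligned sequence (equal lengths)
    subfamily_of : dict of sequence name -> subfamily label
    min_within   : minimum fraction of a subfamily that must share the
                   consensus residue (an illustrative threshold)
    """
    # Group the aligned sequences by their subfamily label.
    groups = {}
    for name, seq in alignment.items():
        groups.setdefault(subfamily_of[name], []).append(seq)

    length = len(next(iter(alignment.values())))
    positions = []
    for col in range(length):
        consensus = []
        for seqs in groups.values():
            residues = [s[col] for s in seqs]
            residue, count = Counter(residues).most_common(1)[0]
            # Reject the column if a subfamily is gapped here
            # or is not conserved enough.
            if residue == "-" or count / len(residues) < min_within:
                break
            consensus.append(residue)
        else:
            # Every subfamily is conserved; keep the column only if
            # at least two subfamilies have different consensus residues.
            if len(set(consensus)) > 1:
                positions.append(col)
    return positions


# Toy data: two subfamilies, A and B, with two sequences each.
alignment = {
    "seqA1": "MKDLG",
    "seqA2": "MKDLG",
    "seqB1": "MRDLA",
    "seqB2": "MRDIA",
}
subfamily_of = {"seqA1": "A", "seqA2": "A", "seqB1": "B", "seqB2": "B"}
result = specificity_positions(alignment, subfamily_of)
print(result)
```

In this toy alignment, columns 1 and 4 are reported: each is invariant within a subfamily but differs between A and B. Column 3 is skipped because subfamily B is not conserved there. Real analyses typically use statistical conservation scores rather than a strict identity threshold.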

Key References:

Phylogenomic inference and key methods

  • Sjölander, K., "Phylogenomic inference of protein molecular function: advances and challenges," Bioinformatics 2004, 20(2):170-179. Oxford University Press access.
  • Sjölander, K., "Phylogenetic inference in protein superfamilies: Analysis of SH2 domains," Proceedings of the Conference Intelligent Systems for Molecular Biology 1998, 6:165-74. PubMed abstract. (Presents the BETE algorithm for protein subfamily identification.)
  • Brown, D., Krishnamurthy, N., Dale, J., Christopher, W., and Sjölander, K., "Subfamily HMMs in Functional Genomics," Proceedings of the Pacific Symposium on Biocomputing, 2005. PSB proceedings.

Structural and phylogenomic analyses of protein structural domains

  • Venter, C. et al., "The sequence of the human genome," Science, 2001 Feb 16; 291(5507):1304-51. Science Online access. (Specific contributions: the algorithms used for the Panther HMM library construction and functional classification of the human genome: (1) FlowerPower clustering and alignment of homologs; (2) Bayesian Evolutionary Tree Estimation and subfamily identification; (3) subfamily HMM construction.)

If you have any questions or comments, please email phylo.
