Protein families evolve a multiplicity of functions and structures through gene duplication, domain shuffling, speciation and other processes. As numerous studies have shown, function prediction by homology alone is prone to systematic errors in such families. Phylogenomic analysis, which combines phylogenetic tree construction, integration of experimental data, and differentiation of orthologs and paralogs, has been shown to address these errors and improve the accuracy of functional classification. The explicit integration of structure prediction and analysis in this framework, which we call structural phylogenomics, provides additional insights into protein superfamily evolution and improves function prediction accuracy.
The Berkeley Phylogenomics Group has developed a phylogenomic resource for investigators in innate immunity, the Innate Immunity Phylogenomic Explorer. This library of pre-computed protein superfamily phylogenies spans 52 families, and includes over 1,000 family and subfamily hidden Markov models for classification of novel sequences.
The flagship resource produced by our lab is the Universal Proteome Explorer, with almost 10,000 "books" for protein families and domains and over 700,000 hidden Markov models (HMMs). Each book contains a multiple sequence alignment of homologs, one or more phylogenetic trees, Gene Ontology (GO) annotations and evidence codes, Pfam domains, cellular localization and predicted (or known) 3D structures. We also predict functional subfamilies and critical residues, and plot the predicted key residues on the 3D structures. Each book contains an HMM for the family as a whole and one for each individual subfamily. These phylogenomic HMM libraries are freely available for classification of novel sequences to functional families and subfamilies; both DNA and protein sequences can be submitted for analysis.
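The two-level classification described above (family first, then subfamily) can be illustrated with a minimal sketch. This is not the group's implementation: the score tables, the family and subfamily names, the `classify` function and the `min_score` cutoff are all hypothetical, and the scores are assumed to have been produced beforehand by scoring one query sequence against each HMM.

```python
# Hypothetical bit scores for one query sequence against family-level and
# subfamily-level HMMs. All names and numbers are invented for illustration.
family_scores = {"kinase": 112.4, "protease": 8.1}
subfamily_scores = {
    "kinase": {"kinase/receptor-like": 95.0, "kinase/cytoplasmic": 130.2},
    "protease": {"protease/serine": 5.0},
}


def classify(family_scores, subfamily_scores, min_score=25.0):
    """Return (family, subfamily); either is None if no model scores well enough.

    min_score is an assumed cutoff, not a value taken from the resource.
    """
    if not family_scores:
        return None, None
    # Step 1: pick the best-scoring family HMM.
    family = max(family_scores, key=family_scores.get)
    if family_scores[family] < min_score:
        return None, None
    # Step 2: within that family only, pick the best-scoring subfamily HMM.
    candidates = subfamily_scores.get(family, {})
    if not candidates:
        return family, None
    subfamily = max(candidates, key=candidates.get)
    if candidates[subfamily] < min_score:
        return family, None
    return family, subfamily


print(classify(family_scores, subfamily_scores))
```

Restricting the second step to subfamilies of the winning family keeps a query from being assigned to a subfamily whose parent family it does not match; a query that matches the family but no subfamily is reported at the family level only.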
Sjölander, K., "Phylogenomic inference of protein molecular function: advances and challenges," Bioinformatics, 2004; 20(2):170-179.
Duncan Brown and Kimmen Sjölander, "Functional Classification using Phylogenomic Inference." PLoS Computational Biology, Vol 2, Issue 6, June 2006.
Tör, M., Brown, D., Cooper, A., Woods-Tör, A., Sjölander, K., Jones, J.D.G. and Holub, E., "Arabidopsis Downy Mildew Resistance Gene RPP27 Encodes a Receptor-Like Protein Similar to CLAVATA2 and Tomato Cf-9," Plant Physiology, 2004 Jun;135(2):1100-12.
Kleffmann, T., Russenberger, D., von Zychlinski, A., Christopher, W., Sjölander, K., Gruissem, W., and Baginsky, S., "The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions," Current Biology, 2004 Mar 9;14(5):354-62.
Magnani, E., Sjölander, K., and Hake, S., "From endonucleases to transcription factors: evolution of the AP2 DNA-binding domain in plants," Plant Cell, 2004 Sep;16(9):2265-77. Selected by the Faculty of 1000 as a "Must Read".
Rebecca Middleton, Kimmen Sjölander, Nandini Krishnamurthy, Jonathan Foley, and Patricia Zambryski, "Predicted hexameric structure of the Agrobacterium VirB4 C terminus suggests VirB4 acts as a docking site during type IV secretion", Proceedings of the National Academy of Sciences 2005 Feb 1;102(5):1685-90.
Stephen T. Chisholm, Douglas Dahlbeck, Nandini Krishnamurthy, Brad Day, Kimmen Sjölander, and Brian J. Staskawicz, "Molecular characterization of proteolytic cleavage sites of the Pseudomonas syringae effector AvrRpt2", Proceedings of the National Academy of Sciences, February 8, 2005, vol. 102, no. 6, 2087-2092.
Lillian Fritz-Laylin, Nandini Krishnamurthy, Mahmut Tör, Kimmen Sjölander and Jonathan Jones, "Phylogenomic analysis of the receptor-like proteins of rice and Arabidopsis", Plant Physiology, June 2005, Vol. 138, pp. 611-623.
von Zychlinski, A., Kleffmann, T., Krishnamurthy, N., Sjölander, K., Baginsky, S., and Gruissem, W., "Proteome analysis of the rice etioplast: metabolic and regulatory networks and novel protein functions," Mol Cell Proteomics, 2005 May 20 (Epub ahead of print).