Bioinformatics Seminar

An occasional seminar series on bioinformatics, at Berkeley and beyond!

Title: Large-Scale Clustering of Protein Sequences - Detection of Functionally Related Proteins with Transitivity Clustering

Speaker: Dr. Jan Baumbach, International Computer Science Institute, University of California at Berkeley

Time: 12:00-1:00pm, Tuesday, June 9th, 2009

Place: 321 Stanley Hall

Abstract:
Partitioning biomedical data objects into groups, such that the objects in each group share common traits, is a long-standing challenge in computational biology. Here we present an integrated data clustering framework based on weighted transitive graph projection: Transitivity Clustering. We illustrate a typical biomedical clustering task that starts with a list of amino acid sequences, investigates similarity functions and parameter estimation problems, and ends with an integrated interpretation of the results. All of these steps can be done easily with TransClust, our Transitivity Clustering implementation, but with no other clustering software. As an example, we reconstruct families of functionally related proteins. In a large-scale study, we compute the core genome for all 51 sequenced actinobacteria. We also present a whole-genome-based phylogenetic tree for all organisms of this phylum. Project web site: http://transclust.cebitec.unibielefeld.de

In the talk, we will very briefly motivate the need for protein sequence clustering as an essential part of inter-species gene regulatory network transfer workflows. We will also discuss our previous work on weighted transitive graph projection, concentrating mainly on the FORCE heuristic.

Some background literature (PubMed IDs):



To receive announcements of future seminars in this series, please subscribe to the bioinformatics mailing list at https://calmail.berkeley.edu.
