INTREPID belongs to the class of methods that predict functional residues using sequence information alone. Other methods in this class include Rate4Site (the core algorithm in the ConSurf webserver) and Evolutionary Trace. The basic idea underlying functional residue prediction based on evolutionary analysis is the observation that functionally important positions tend to be more conserved across the protein family than the average position. INTREPID differs from other functional residue prediction methods in the way it uses the phylogenetic tree of the protein family. INTREPID's tree traversal allows the identification of functional residues that are conserved across the whole family as well as those that are conserved within a subfamily but vary across subfamilies. Hence, INTREPID can be used to analyze highly divergent protein families. INTREPID does not rely on expert knowledge about which types of residues are likely to play specific functional roles, so it can be used in cases where the specific functional role of a position is not known. INTREPID does not require a 3D structure to predict functional residues, so it can be used when no homologous 3D structure exists. INTREPID does require that homologous sequences be present in public sequence databases, so it is not useful when no homologs can be found using standard database search techniques such as PSI-BLAST.
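The intuition behind the tree traversal can be illustrated with a minimal sketch. It assumes the tree has already been summarized as a list of nested subtrees (sets of sequence names) on the path from the root down to the query's subfamily, and it uses the fraction of the most common residue as a deliberately simple conservation measure. Taking the maximum score along the path is also an assumption made for illustration; it is not necessarily INTREPID's exact scoring rule, and the function names are hypothetical.

```python
from collections import Counter

def column_conservation(residues):
    """Fraction of non-gap residues equal to the most common residue.

    A deliberately simple stand-in for a real conservation measure.
    """
    counts = Counter(r for r in residues if r != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total

def path_conservation(alignment, path, column):
    """Score one alignment column by walking the root-to-query path.

    alignment: dict mapping sequence name -> aligned sequence string
    path: subtrees (sets of sequence names) ordered from the root, which
          contains every sequence, down to the query's smallest subfamily
    Returns the highest conservation seen in any subtree on the path
    (an illustrative combination rule, assumed here).
    """
    best = 0.0
    for subtree in path:
        residues = [alignment[name][column] for name in subtree]
        best = max(best, column_conservation(residues))
    return best

# Toy family with two subfamilies: {query, s1} and {s2, s3}.
alignment = {
    "query": "ACK",
    "s1":    "ACK",
    "s2":    "GCK",
    "s3":    "GCR",
}
root = {"query", "s1", "s2", "s3"}
query_subfamily = {"query", "s1"}
path = [root, query_subfamily]

whole_family = column_conservation([alignment[n][0] for n in root])  # 0.5
traversal = path_conservation(alignment, path, 0)                    # 1.0
```

Column 0 (A in the query's subfamily, G in the other) looks only moderately conserved across the whole family but fully conserved within the query's subfamily, which is the kind of subfamily-specific signal a tree traversal is meant to capture. Very small subtrees are trivially conserved, so any practical version needs a safeguard against them (for example, a minimum subtree size); that is omitted here.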
Currently, the INTREPID algorithm does not use experimental evidence or 3D structure to infer functional sites; it relies only on patterns of evolutionary conservation across the protein family phylogeny. For this reason, INTREPID can be used in cases that lack experimental evidence or solved 3D structures. INTREPID does use homologous 3D structures to display functional residue predictions, but this structural information is not used to make the predictions.
Functional residues are predicted using the INTREPID algorithm, which estimates the degree of evolutionary conservation for each residue in the query sequence. Evolutionary conservation is estimated by retrieving homologous sequences from the UniProt database, aligning these to the query sequence, and inferring a protein family tree. Patterns of conservation across the protein family are then used to estimate the degree of conservation at each position in the sequence.
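As an illustration of a per-position conservation score, the sketch below computes the Jensen-Shannon divergence between each alignment column's amino acid distribution and a background distribution. It assumes a uniform background and a small pseudocount; both are simplifying assumptions, and the helper names are hypothetical rather than part of the INTREPID pipeline. Higher values mean the column departs further from background, i.e. it is more conserved.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def column_distribution(residues, pseudocount=1e-6):
    """Amino acid frequencies in one column; gaps and unknowns are ignored."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]

def conservation_scores(alignment, background=None):
    """One Jensen-Shannon score per column of a list of aligned sequences."""
    if background is None:
        # Assumed uniform background; an empirical background (for example,
        # frequencies derived from BLOSUM62) is a common alternative.
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    n_columns = len(alignment[0])
    return [
        js_divergence(column_distribution([seq[i] for seq in alignment]), background)
        for i in range(n_columns)
    ]

alignment = ["ACD", "ACE", "AGF", "AHG"]
scores = conservation_scores(alignment)
```

In this toy alignment, the invariant first column scores highest, the partly conserved second column scores lower, and the fully variable third column scores lowest.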
The conservation score reported by INTREPID for each residue is a z-score: the number of standard deviations by which the observed conservation at residue i lies above or below the mean conservation across all positions in the sequence. Large positive values indicate positions that are highly conserved relative to the rest of the sequence; large negative values indicate positions that are less conserved than average.
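The z-score transformation can be sketched as z_i = (c_i - mean) / sd, where c_i is the raw conservation at position i and the mean and standard deviation are taken over all positions in the query. The input `raw_scores` is a hypothetical list of per-position conservation values, and the sketch assumes the population (not sample) standard deviation, since the exact form INTREPID uses is not stated here.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert per-position conservation scores to z-scores.

    z_i = (c_i - mean) / standard deviation, computed over all positions
    in the query sequence. Positive values are more conserved than average.
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation (assumed)
    if sigma == 0:
        # Every position is equally conserved, so no position stands out.
        return [0.0] * len(raw_scores)
    return [(c - mu) / sigma for c in raw_scores]

raw = [0.2, 0.9, 0.4, 0.5]
z = z_scores(raw)
```

Here the second position (raw score 0.9) receives a z-score of about 1.57, and the first position (0.2) receives about -1.18.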
For more details on the INTREPID algorithm, please see Sankararaman and Sjölander (Bioinformatics, 2008).
The time an analysis takes depends largely on the size and diversity of the protein family being analyzed. If your sequence belongs to a very large and diverse protein family, the analysis will take longer; a sequence with fewer homologs will return results much sooner. As a rough guideline, a typical analysis may take an hour or more, and an analysis of a very large protein family may run overnight.
The analysis pipeline has many complex stages, including gathering homologous sequences from the UniProt database, creating a multiple sequence alignment, inferring a protein family phylogeny, finding homologous 3D structures, and analyzing these data to infer patterns of evolutionary conservation. These types of phylogenomic analyses are time-consuming, but they have been shown to produce reliable results.
At this time, our INTREPID server accepts only a single sequence as input. We have developed a complex analysis pipeline designed to produce the types of sequence alignments and protein family trees that maximize the quality of functional residue prediction with INTREPID. Alignments and trees constructed primarily for other purposes (such as reconstructing a species phylogeny or analyzing protein family evolution across a small set of taxa) may not be suitable for INTREPID.
We report two sets of results: one for the protein family, shown on the PhyloFacts book page, and one for the query sequence you submitted, shown on the INTREPID analysis page. The results on the PhyloFacts book page cover the whole family (and are produced by a different functional residue prediction algorithm), so they may be less relevant to your analysis.
Prediction of functional residues using evolutionary conservation depends on finding enough homologous sequences to build a multiple sequence alignment and a phylogenetic tree. If the protein family contains fewer than 4 sequences, the analysis cannot be run. We search the UniProt database for homologs of the query using 4 iterations of PSI-BLAST; if this search fails to retrieve at least 3 unique homologs of the query, the analysis cannot proceed.
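The minimum-size rule can be expressed as a simple check: the query plus at least 3 unique homologs gives the required 4 sequences. This sketch assumes homologs are compared as plain sequence strings and that a hit identical to the query does not count as a unique homolog; how the server actually defines uniqueness (for example, by accession or after redundancy filtering) is not specified here, and the function name is hypothetical.

```python
MIN_FAMILY_SIZE = 4  # the query plus at least 3 unique homologs

def has_enough_homologs(query, hits):
    """Return True if the retrieved hits are enough to build a family.

    query: the submitted sequence
    hits: sequences retrieved by the homology search (e.g. PSI-BLAST)
    Duplicates and exact copies of the query are not counted (assumed).
    """
    unique_homologs = set(hits) - {query}
    return len(unique_homologs) + 1 >= MIN_FAMILY_SIZE

# Only 2 unique homologs remain after removing duplicates and the query itself.
print(has_enough_homologs("MKV", ["MKV", "MKI", "MKL", "MKI"]))  # False
```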