PhyloFacts Genome: Dictyoglomus turgidum

Source: PhyloFacts release PF3.0.2. Statistics updated on December 7, 2011.

Number of genes: 1743
Number of books for multi-domain architectures: 832
Number of books for Pfam domains: 4521
Number of genes in multi-domain architecture books: 539 (30.9%)
Number of genes in books for Pfam domains: 1372 (78.7%)
Number of genes in any book: 1430 (82.0%)
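
The percentages above are each count divided by the total number of genes. A minimal sketch re-deriving them is shown here. The overlap figure is an inference, not a number reported on this page: it assumes that "any book" is the union of the two categories.

```python
# Counts copied from the statistics above.
TOTAL_GENES = 1743
IN_MDA_BOOKS = 539     # genes in multi-domain architecture books
IN_PFAM_BOOKS = 1372   # genes in books for Pfam domains
IN_ANY_BOOK = 1430     # genes in any book


def coverage(count, total=TOTAL_GENES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumption (not stated on this page): "any book" is the union of the two
# categories, so inclusion-exclusion gives the genes found in both kinds.
in_both = IN_MDA_BOOKS + IN_PFAM_BOOKS - IN_ANY_BOOK

if __name__ == "__main__":
    rows = [
        ("multi-domain architecture books", IN_MDA_BOOKS),
        ("Pfam domain books", IN_PFAM_BOOKS),
        ("any book", IN_ANY_BOOK),
    ]
    for label, count in rows:
        print(f"{label}: {count} genes ({coverage(count)}%)")
    print(f"genes in both kinds of book (assuming a union): {in_both}")
```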

Reference proteome used for generating statistics: Quest For Orthologs, Release 2011_04
Copy of the reference FASTA file: download

This page provides downloads for PhyloFacts families containing Dictyoglomus turgidum proteins. All files are gzip-compressed (gz); Windows users can extract them with the free utility 7-Zip.
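
On any platform with Python installed, the standard-library gzip module can also unpack these files. The sketch below assumes each download is a plain gzip stream; if a download turns out to be a tar archive inside the gzip wrapper, the tarfile module would be needed instead. The function name gunzip is illustrative.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got {src.name!r}")
        # Default output name: the input name with the trailing ".gz" removed.
        dest = src.with_suffix("")
    dest = Path(dest)
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Stream in chunks so that large files are never held in memory.
        shutil.copyfileobj(fin, fout)
    return dest
```

Streaming matters here because the largest download on this page is about 160MB compressed.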

For families agreeing on the multi-domain architecture:
Hidden Markov models (HMMs, in HMMER3 format) (16.06MB, gz)
Multiple sequence alignments (in UCSC A2M format) (35.38MB, gz)
Phylogenetic Trees (in Newick format): Maximum Likelihood (4.46MB, gz), Neighbor Joining (4.08MB, gz)
Family summary data (in machine-readable format) (2.09MB, gz)

For Pfam domain families:
Hidden Markov models (HMMs, in HMMER3 format) for domain books (23.76MB, gz)
Multiple sequence alignments (in UCSC A2M format) for domain books (159.78MB, gz)
Phylogenetic Trees (in Newick format): Maximum Likelihood (29.57MB, gz), Neighbor Joining (27.22MB, gz)
Family summary data (in machine-readable format) (10.07MB, gz)
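
The alignments are distributed in UCSC A2M format. In A2M, uppercase letters and "-" mark aligned (match) columns, while lowercase letters and "." mark insertions. The following sketch reads FASTA-style A2M text and keeps only the aligned columns. The function names and the two toy sequences are made up for illustration and are not taken from any PhyloFacts file.

```python
def read_a2m(lines):
    """Parse FASTA-style A2M text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_columns(sequence):
    """Keep only aligned columns: uppercase residues and '-' deletions."""
    return "".join(c for c in sequence if c.isupper() or c == "-")


if __name__ == "__main__":
    # Toy input, invented for this example.
    toy = ">seq1\nMKt.LV-A\n>seq2\nMK.gLVRA\n"
    for name, seq in read_a2m(toy.splitlines()):
        print(name, match_columns(seq))
```

After insert columns are removed, every row of a valid A2M alignment has the same length, which makes this a useful sanity check on a downloaded file.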



To download files for an individual PhyloFacts 3.0 family, please visit the corresponding family webpage.

Scripts to extract data (HMMs, phylogenetic trees, etc.) for specific families.

Detailed usage instructions for each script are available by running the script without any arguments, or in the README file listed below. The zip file (download) contains the following files:

README
Instructions for using the scripts
extract_protein_info.py
Obtains a list of PhyloFacts families for a given input protein.
extract_summary.py
Extracts summary data for PhyloFacts families. Input arguments are either a single bpg accession or a file containing a list of bpg accessions.
extract_tree.py
Extracts trees for PhyloFacts families. Input arguments are either a single bpg accession or a file containing a list of bpg accessions.
extract_msa.py
Extracts multiple sequence alignments (MSAs) for PhyloFacts families. Input arguments are either a single bpg accession or a file containing a list of bpg accessions.
extract_hmm.py
Extracts hidden Markov models (HMMs) for PhyloFacts families. Input arguments are either a single bpg accession or a file containing a list of bpg accessions.

For detailed information on usage, please refer to the README file included in the zip file.
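
Several of the scripts accept either a single bpg accession or a file listing accessions. One common way to handle that kind of argument is sketched below. This is not the code of the bundled scripts, whose actual behavior is documented in the README; the function name resolve_accessions and the one-accession-per-line file layout are assumptions.

```python
import os


def resolve_accessions(arg):
    """Return a list of accessions from one accession or from a list file.

    Hypothetical helper: if arg names an existing file, read one accession
    per line (blank lines skipped); otherwise treat arg as the accession.
    """
    if os.path.isfile(arg):
        with open(arg) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [arg]
```

Checking for a file first keeps the command line simple, at the cost of ambiguity if a file happens to share its name with an accession.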

The PhyloFacts Encyclopedia is composed of "books" that represent gene families. Families are clustered in two distinct ways: one requires agreement along the entire multi-domain architecture, and the other is based on sharing a single Pfam domain. Each book contains a multiple sequence alignment, a phylogenetic tree, inferred orthologs (identified with the PHOG algorithm; Datta et al., Nucleic Acids Research, 2009), a hidden Markov model, and associated experimental and annotation data.

Details on the library construction pipeline are provided in the following article.
Nandini Krishnamurthy, Duncan Brown, Dan Kirshner and Kimmen Sjölander, "PhyloFacts: An online structural phylogenomic encyclopedia for protein functional and structural classification," Genome Biology 2006, 7:R83.