PhyloFacts Sequence Search

Submit either an amino acid or a nucleotide sequence in FASTA format. The submitted sequence is classified to a protein family by Hidden Markov Model (HMM) scoring. Nucleotide sequences are first translated in all six reading frames, and each frame is analyzed separately. Batch submission of up to five sequences is supported. Results are returned by e-mail. From the results, users can select families for more detailed classification of sequences into functional subfamilies, based on scoring against subfamily HMMs.

PhyloFacts Sequence Search Results

Use

Paste sequences in FASTA format. Sequences entered here are searched against the set of books and/or subfamilies in the model library. They must be in FASTA format and may be either amino acid or nucleotide (DNA) sequences; a nucleotide query lets you search with an Expressed Sequence Tag (EST), for example.

Upload FASTA file. Sequences in a file on your local computer may be used for searching the model library. The sequence file must be in FASTA format.
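As an illustration of the input requirements above, the following Python sketch parses FASTA text and checks it against the five-sequence batch limit before submission. It is not part of PhyloFacts; the function names parse_fasta and check_batch and the constant MAX_BATCH are hypothetical, and the server may validate input differently.

```python
MAX_BATCH = 5  # batch limit of five sequences, as stated in the introduction


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(records, limit=MAX_BATCH):
    """Raise ValueError if the batch is empty or exceeds the limit."""
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > limit:
        raise ValueError(f"{len(records)} sequences submitted; the limit is {limit}")
    return records
```

For example, check_batch(parse_fasta(open("query.fasta").read())) would return the parsed records for a local file named query.fasta (a placeholder name) or raise an error describing the problem.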

Nucleotide sequence. Check this box if your input sequence represents nucleotides (DNA). The search of the PhyloFacts library "books" will be conducted by first translating the nucleotide sequence into protein sequences. Six translations will be used: three forward and three reverse (that is, of the reversed input sequence). Each forward and reverse translation is derived by offsetting the translation "frame" by zero, one, or two nucleotide positions. Thus, the search of the PhyloFacts library will be done using six different "protein sequences."
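The six-frame translation described above can be sketched in Python as follows. This is an illustration only, not the PhyloFacts implementation. It assumes the standard genetic code and assumes that the three reverse frames are read from the reverse complement of the input, which is the usual convention for six-frame translation; this page does not spell out either detail. The names translate, reverse_complement, and six_frame_translations are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons of dna; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the reverse complement (assumed here for the reverse frames)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):  # offset the frame by zero, one, or two positions
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six resulting strings would then be scored as a separate "protein sequence." Trailing nucleotides that do not fill a complete codon are dropped in this sketch.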

Send email to. When the search completes, an email announcing completion and providing a link to the results will be sent to this address.

Remember me. Your email address will be saved as a "cookie" within your current browser/computer if this box is checked. It will be used to automatically fill in your email address the next time you open this page from this browser and computer. Uncheck this box if you are not using your regular browser/computer.

Require protein families to match 60% or more of your sequence. Search results will be limited to those protein families which have a "global match" to your sequence. Global match is defined by a "bi-directional" coverage criterion that depends on the length of your sequence and the length of the protein family's HMM.

Bi-directional means that (1) the HMM coverage (the number of aligned characters between your sequence and the protein family hidden Markov model divided by the length of the protein family HMM) and (2) the sequence coverage (the number of aligned characters divided by the length of your sequence) are both at least as great as the criterion. In other words, bi-directional coverage means that the matching (aligned) regions between your sequence and the protein family are (1) a significant portion of the protein family consensus sequence or profile and (2) a significant portion of your sequence.

The coverage criterion used varies depending on the length of the protein family HMM as follows:

    HMM length    Coverage criterion
       <100              0.60
     100-199             0.65
       200+              0.70
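
The bi-directional coverage test and the length-dependent criterion in the table can be sketched in Python as follows. This is an illustration only: the function names coverage_criterion and is_global_match are hypothetical, and the way the server counts aligned characters is not specified on this page, so the count is simply taken as an input.

```python
def coverage_criterion(hmm_length):
    """Return the coverage criterion for an HMM of the given length (see table)."""
    if hmm_length < 100:
        return 0.60
    if hmm_length < 200:
        return 0.65
    return 0.70


def is_global_match(aligned, seq_length, hmm_length):
    """Apply the bi-directional coverage test.

    aligned    -- number of aligned characters between the query and the HMM
    seq_length -- length of the query sequence
    hmm_length -- length of the protein family HMM
    """
    if seq_length <= 0 or hmm_length <= 0:
        raise ValueError("lengths must be positive")
    threshold = coverage_criterion(hmm_length)
    hmm_coverage = aligned / hmm_length   # fraction of the HMM that is aligned
    seq_coverage = aligned / seq_length   # fraction of the query that is aligned
    return hmm_coverage >= threshold and seq_coverage >= threshold
```

For instance, with 140 aligned characters and an HMM of length 150 (criterion 0.65), a query of length 180 passes because both coverages exceed 0.65, whereas a query of length 400 fails because only 35% of it is aligned.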

Advanced

Advanced options include: