1. Help
This file describes how to use the items in the input form.
1.1. Input data
- Template(s): Paste one or more template structures with sequence(s) in FASTA format and structure(s) in dot-bracket format.
- Template(s) file: Provide a file with one or more template structures with sequences in FASTA format and structure in dot-bracket format.
If both inputs are used, the templates from the file are appended to the pasted templates.
- Query sequence(s): Paste sequence(s) in FASTA format, for which structures will be predicted.
For single sequences, the FASTA header can be omitted.
- Query sequence(s) file: Provide a file with sequences in FASTA format, for which structures will be predicted.
If both inputs are used, the sequences from the file are appended to the pasted sequences.
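As an illustration of the template input described above, the following minimal sketch reads template records from pasted text. It assumes each record occupies exactly three lines (a FASTA header, a sequence line, and a dot-bracket structure line); this layout, the function name parse_templates, and the example records are assumptions made for illustration and are not taken from cpPredict itself.

```python
def parse_templates(text):
    """Parse records of the assumed form: '>header', sequence line, structure line."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected header, sequence and structure lines for every template")
    records = []
    for i in range(0, len(lines), 3):
        header, sequence, structure = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got: {header!r}")
        records.append(
            {"name": header[1:].strip(), "sequence": sequence, "structure": structure}
        )
    return records


# Hypothetical example input with two templates.
example = """>hairpin_1
GGGAAACCC
(((...)))
>hairpin_2
GCGCUUCGGCGC
((((....))))
"""

templates = parse_templates(example)
for t in templates:
    print(t["name"], len(t["sequence"]), len(t["structure"]))
```

In each record the sequence and the structure have the same length, which is the property checked before prediction (see section 1.3).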
1.2. Parameters
- Use all templates: Check this option when multiple templates are provided to predict the structure of each query sequence using each template.
E.g., if 3 templates are used, each query sequence will have 3 predicted structures, one per template.
Note that if the templates have different structures, the predicted query structures will differ.
If this option is not checked when multiple templates are provided, structures are predicted using the most similar template for each query sequence.
The most similar template is identified based on sequence similarity between templates and query sequences.
- Hairpin threshold; stem threshold:
Set the maximum percentage of incorrect base pairs tolerated in an individual hairpin or stem of the predicted structure. Incorrect base pairs arise when the template structure is copied into the predicted structure at positions of low evolutionary conservation.
A hairpin or stem is predicted de novo if its percentage of incorrect base pairs is higher than the corresponding threshold.
If the thresholds are 0%, a copied hairpin or stem cannot include any incorrect base pair; otherwise it is predicted de novo.
Therefore, only perfectly copied hairpins and stems, i.e. those that are evolutionarily conserved and identical between the template structure and the predicted structure, will not be predicted de novo.
If the thresholds are 100%, no copied hairpin or stem will be predicted de novo, as each may contain up to 100% incorrect base pairs.
The predicted structure will then consist entirely of base pairs copied from the template structure.
For details, please refer to Panek et al. (2017).
The default thresholds are 30% for hairpins and 20% for stems.
- Compute z-score: Choose whether z-scores of the predicted structures should be computed, using either 10 or 100 random structures.
The z-score reports the biological reliability of a predicted structure, based on bootstrapping with a sample of structures generated from randomly shuffled query sequences.
Reliable structures have z-scores ≥ 2.
For details, please refer to Panek et al. (2017) and https://en.wikipedia.org/wiki/Standard_score.
Computing with 100 random structures can be slow for long RNA sequences, but gives better z-score estimates than computing with 10 random structures.
Nevertheless, z-scores estimated from 10 random structures are accurate enough for most RNA structures.
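The two numeric parameters above can be sketched as follows. This is a minimal illustration, not cpPredict's implementation: the function names, the way incorrect base pairs are counted, the scores of the random structures, and the use of the sample standard deviation are all assumptions. The z-score follows the generic standard-score formula, with the sign chosen so that higher values indicate more reliable structures.

```python
from statistics import mean, stdev


def predicted_de_novo(incorrect_pairs, total_pairs, threshold_percent):
    """Return True if a copied hairpin or stem exceeds its threshold.

    Hypothetical helper: the element is re-predicted de novo only when its
    percentage of incorrect base pairs is strictly higher than the threshold.
    """
    percent_incorrect = 100.0 * incorrect_pairs / total_pairs
    return percent_incorrect > threshold_percent


def z_score(observed, random_scores):
    """Standard score of `observed` relative to scores from shuffled sequences.

    Assumption: the sample standard deviation is used.
    """
    return (observed - mean(random_scores)) / stdev(random_scores)


# 1 incorrect pair out of 4 is 25%.
print(predicted_de_novo(1, 4, 20))    # True: 25% is above the default 20% stem threshold
print(predicted_de_novo(1, 4, 30))    # False: 25% is below the default 30% hairpin threshold
print(predicted_de_novo(0, 4, 0))     # False: a perfectly copied element is kept at 0%
print(predicted_de_novo(4, 4, 100))   # False: nothing is re-predicted at 100%

# Hypothetical scores of 5 random structures and of the predicted structure.
random_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_score(7.0, random_scores)
print(z >= 2)                         # True: this example would count as reliable
```

A larger sample of random structures gives a more stable estimate of the mean and standard deviation, which is why 100 random structures yield better z-score estimates than 10.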
1.3. Warnings and error messages
If unsupported characters occur in the input sequences or structures, warnings or error messages are displayed prior to the prediction.
- Warnings: conditions that trigger a warning do not stop the prediction.
The cause of the warning is corrected automatically and the prediction continues.
The causes are:
- Unsupported characters in sequence and structure records.
They are removed, and if their removal does not result in sequence and structure records of different lengths, the prediction continues.
It is the user's responsibility to check, after such a warning, that the removal has not damaged the input sequence.
The allowed characters in sequences and structures are those defined by the FASTA format and the dot-bracket format, respectively.
As pseudoknotted structures are not supported in this version of cpPredict, only ordinary brackets in the structure records are allowed.
- If multiple templates are used and some, but not all, of them are invalid for the reasons described below (under Error messages), a warning is displayed and the prediction continues with the remaining templates.
- Error messages: conditions that trigger an error message stop the prediction, as they cannot be corrected automatically.
The causes are:
- anything that results in different lengths of the sequence and structure records of a template;
- ambiguous nucleotide characters in the template sequence, which are not allowed for technical reasons (although the FASTA definition permits them);
- unclosed bracket in the template structure.
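The three error conditions listed above can be checked before submitting a template. The sketch below is an illustration only: the function name check_template and the error texts are hypothetical and do not reproduce cpPredict's actual messages. It also reports a closing bracket without a matching opening bracket, which is an assumption that goes slightly beyond the "unclosed bracket" case named above.

```python
# Ambiguity codes that are not allowed in template sequences.
AMBIGUOUS = set("RYKMSWBDHVN")


def check_template(sequence, structure):
    """Return a list of error strings; an empty list means the template passed."""
    errors = []
    if len(sequence) != len(structure):
        errors.append("sequence and structure differ in length")
    if any(ch in AMBIGUOUS for ch in sequence.upper()):
        errors.append("ambiguous nucleotide in template sequence")
    # Track bracket nesting depth; only ordinary brackets are considered.
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opening bracket
                break
    if depth != 0:
        errors.append("unbalanced bracket in template structure")
    return errors


print(check_template("GGGAAACCC", "(((...)))"))  # []
print(check_template("GGGAAACCC", "(((...))"))   # length mismatch and unbalanced bracket
print(check_template("GGNAAACCC", "(((...)))"))  # ambiguous nucleotide
```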
1.4. Nucleotide substitutions in input data
Ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, N) are substituted by the most probable nucleotide according to the whole sequence context.
A message entitled "Substitutions", listing the substituted nucleotides, is displayed in the results.
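To see in advance which positions of a sequence would be affected, the ambiguity codes listed above can be located with a short script. The sketch below only lists the candidate nucleotides that each standard IUPAC code stands for, written with U for RNA. It does not reproduce how cpPredict selects the most probable nucleotide, and the function name ambiguous_positions is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, written with U (RNA alphabet).
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CU", "K": "GU", "M": "AC", "S": "CG", "W": "AU",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def ambiguous_positions(sequence):
    """Return (1-based position, code, candidate nucleotides) for each ambiguous character."""
    return [
        (position, ch, IUPAC_AMBIGUITY[ch])
        for position, ch in enumerate(sequence.upper(), start=1)
        if ch in IUPAC_AMBIGUITY
    ]


hits = ambiguous_positions("GGRAAYCCN")
for position, code, candidates in hits:
    print(position, code, candidates)
```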