Comments on / Additions to Class 3:
In searching Entrez, you can add links to online journals for which UConn has a subscription. (If you are outside UConn, you need to set up a proxy account for the links to work.) The link to use is http://www.ncbi.nlm.nih.gov/sites/entrez?otool=uconnlib
Use MyNCBI at Entrez to repeat searches at regular intervals (an alternative is PubCrawler, see class2).
Web of Science Cited Reference Search - problems?
Do example on clipboard and index (use GI 2266989 (nucl) and 3334404 (prot)).
Bottom lines:
Outstanding questions: Honors conversion?
Other web pages:
Nucleic Acids Research Database Issue
http://www.ebi.ac.uk/
http://rdp.cme.msu.edu/
http://www.jgi.doe.gov/
http://www.genomesonline.org/
http://www.tigr.org
http://genome-www.stanford.edu/
http://www.flybase.org/
http://www.arabidopsis.org/
http://www.ensembl.org/
|
| Sequence and structure databanks can be divided into many different categories. One of the most important is: |
Old Powerpoint slides on data banks are here
In case you want more information on databanks, a computer-science-oriented introduction to biological databanks, their design, and their management is at https://gcrcweba.lacusc.co.la.ca.us/bifresources/BuildingUsingBiolDB.pdf
| PRSS - (Note: we will discuss alignment algorithms later; for now it is sufficient to know that, given a scoring matrix and two sequences, one can calculate an alignment that has an optimal score.)
One way to quantify the similarity between two sequences is to
1. compare the actual sequences and calculate the alignment score
2. randomize (scramble) one (or both) of the sequences and calculate the alignment score for the randomized sequences
3. repeat step 2 at least 100 times
4. describe the distribution of the randomized alignment scores
5. do a statistical test to determine whether the score obtained for the real sequences is significantly better than the scores for the randomized sequences
To illustrate the assessment of similarity/homology we will use a program from Pearson's FASTA package called PRSS.
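The randomization procedure above can be sketched in Python. This is a minimal illustration, not PRSS itself: the Smith-Waterman scorer below uses a simple match/mismatch scheme with a linear gap penalty (assumed values), whereas PRSS works with a substitution matrix and its own gap penalties. The function names and the example sequences are hypothetical.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local (Smith-Waterman) alignment score, linear gap penalty.

    The scoring values are illustrative assumptions, not PRSS defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_test(seq1, seq2, n_shuffles=200, seed=0):
    """Steps 1-5: compare the real score with scores for shuffled seq2."""
    rng = random.Random(seed)
    real = local_alignment_score(seq1, seq2)            # step 1
    residues = list(seq2)
    shuffled_scores = []
    for _ in range(n_shuffles):                         # step 3
        rng.shuffle(residues)                           # step 2
        shuffled_scores.append(
            local_alignment_score(seq1, "".join(residues)))
    mean = statistics.mean(shuffled_scores)             # step 4
    sd = statistics.stdev(shuffled_scores)
    if sd > 0:                                          # step 5 (z-score)
        z = (real - mean) / sd
    else:
        z = float("inf") if real > mean else 0.0
    return real, mean, sd, z


if __name__ == "__main__":
    # Hypothetical example sequences, for illustration only.
    s1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"
    s2 = "MKTAYLAKQRNISFVRSHFARQLEDRLGLVEVQAPLLSRVGDGSQDNLSGSEK"
    real, mean, sd, z = shuffle_test(s1, s2)
    print(f"real={real} shuffled mean={mean:.1f} sd={sd:.1f} z={z:.1f}")
```

Shuffling keeps the length and composition of the sequence but destroys the order of its residues, so the shuffled scores estimate what chance alone produces. A z-score above 3 corresponds to the three-standard-deviation rule of thumb.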
There are many other alignment programs. BLAST is a widely used program offered through the NCBI (go here for more info). It can also do pairwise comparisons (go here, do example). To force the BLAST program to report an alignment, increase the E-value threshold.
Rules of thumb:
If you can demonstrate significant similarity using either randomization or an unweighted BLAST search, your sequences are homologous (i.e., related by common ancestry).
Convergent evolution has not been shown to lead to sequence similarities detectable by these means (see above - this might not be true for scores in PSI-BLAST).
If the actual alignment score is more than three standard deviations (of the scores for the randomized sequences) better than the mean for the randomized sequences, the two sequences are homologous (i.e., related by common ancestry). PRSS and many other programs use more accurate distributions to describe the distribution of random hits. The expectation value for the alignment score of the actual sequences is based on these statistics.
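As a rough sketch of how such statistics can work, the snippet below fits an extreme value (Gumbel) distribution to a set of randomized alignment scores and converts a real score into a probability and an expectation value. The choice of a Gumbel distribution, the method-of-moments fit, the function names, and the example numbers are all assumptions made for illustration; actual programs use their own fitting procedures.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(random score >= score) = 1 - exp(-exp(-(score - mu) / beta))."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def e_value(score, mu, beta, n_comparisons):
    """Expected number of chance scores this good or better
    among n_comparisons independent comparisons."""
    return n_comparisons * p_value(score, mu, beta)


if __name__ == "__main__":
    # Hypothetical scores from randomized sequences, for illustration only.
    random_scores = [10, 11, 12, 13, 14, 12, 11, 13, 12, 12]
    mu, beta = gumbel_fit(random_scores)
    print(p_value(30, mu, beta), e_value(30, mu, beta, 1000))
```

Random alignment scores have a longer upper tail than a normal distribution, which is why a z-score alone tends to overstate significance for high scores.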
E-values give the expected number of matches with an alignment score this good or better, BUT:
Examples:
Jim Knox (MCB-UConn) has studied many proteins involved in bacterial cell wall biosynthesis and antibiotic binding, synthesis or destruction. Many of these proteins have identical 3-D structure and can therefore be assumed to be homologous; however, the above tests fail to detect these homologies. (For example, enzymes with GRASP nucleotide binding sites are depicted here.)
DNA replication involves many different enzymes. Some of the proteins do the same thing in bacteria, archaea and eukaryotes; they have similar 3-D structures (e.g., the sliding clamp: E. coli dnaN and eukaryotic PCNA, see Edgell and Doolittle, Cell 89, 995-998), but again, the above tests fail to detect homology. |
Assignments:
For Friday: go through the yellow boxes in class4 (above).
For Monday, Sept. 10:
Optional for Monday (Sept. 10) (required for grad students :)):
|
|
What does Bioinformatics have to do with Molecular Evolution?
The following chain, although (believed to be) mainly determined by the DNA sequence (plus other components of the cell, which in turn are encoded by other parts of the genome), cannot at present be simulated in a computer: DNA sequence -> ... . Most scientists believe that the principle of reductionism (plus new laws and relations emerging on each level) holds for this chain; however, this is clearly "in principle" only. At several steps along the way from DNA to function, our understanding of the chemical and physical processes involved is so incomplete that predicting protein function from a single DNA sequence alone is at present impossible (at least for a protein of reasonable size).
Solution:
Present day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function. In the above mantra "similar function" can refer to:
The following is based on observation and not on an a priori truth:
THE REVERSE IS NOT TRUE:
In particular, PROTEINS WITH SHARED ANCESTRY DO NOT ALWAYS SHOW SIGNIFICANT SIMILARITY |