|
What does Bioinformatics have to do with Molecular Evolution?
The following chain, although believed to be mainly determined by the DNA sequence (plus other components of the cell, which in turn are encoded by other parts of the genome), cannot at present be simulated in a computer: DNA sequence -> ... . Most scientists believe that the principle of reductionism (plus new laws and relations emerging on each level) holds for this chain; however, this is clearly true "in principle" only. At several steps along the way from DNA to function, our understanding of the chemical and physical processes involved is so incomplete that predicting protein function from a single DNA sequence alone is at present impossible (at least for a protein of reasonable size). Solution:
Present-day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function. In the above mantra, "similar function" can refer to:
The following is based on observation and not on an a priori truth:
THE REVERSE IS NOT TRUE:
In particular, PROTEINS WITH SHARED ANCESTRY DO NOT ALWAYS SHOW SIGNIFICANT SIMILARITY |
| If you can demonstrate significant
similarity using randomization, your sequences
are homologous (i.e. related by common ancestry). Convergent
evolution has not been shown to lead to sequence similarities detectable by these
means (see above; this might not be true for scores in PSI-BLAST).
Summary of Terminology:
E-values give the expected number of matches with an alignment score this good or better due to chance alone (no shared ancestry, no convergent evolution).
P-values give the probability of finding a match of this quality or better due to chance alone (no shared ancestry, no convergent evolution).
P-values lie in [0,1]; E-values lie in [0,infinity).
BUT:
Examples:
Jim Knox (MCB-UConn) has studied many
proteins involved in bacterial cell wall biosynthesis and antibiotic binding,
synthesis or destruction. Many of these proteins have identical 3-D structures
and can therefore be assumed to be homologous; however, the above tests fail to
detect these homologies. (For example, enzymes with GRASP nucleotide binding sites
are depicted here.)
DNA replication involves many different enzymes. Some of the proteins do the same
thing in bacteria, archaea and eukaryotes; they have similar 3-D structures (e.g.
sliding clamp, E. coli dnaN and eukaryotic PCNA; see Edgell and Doolittle,
Cell 89, 995-998), but again, the above tests fail to detect homology. |
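The randomization test and the E-value/P-value terminology above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method used in this course: the function names are hypothetical, the local alignment uses a toy match/mismatch score with a linear gap penalty (a real analysis would normally use a substitution matrix and affine gap penalties), and the E-to-P conversion assumes that the number of chance matches follows a Poisson distribution.

```python
import math
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman style) alignment score of a and b.

    The scoring parameters are illustrative assumptions only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def randomization_test(a, b, n_shuffles=200, seed=0):
    """Compare the real score with scores against shuffled versions of b.

    Shuffling keeps the composition and length of b but destroys the
    order of residues, so the shuffled scores estimate what chance
    alone produces.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(local_alignment_score(a, "".join(letters)))
    mean = sum(null_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1)
    sd = math.sqrt(var)
    if sd > 0:
        z = (observed - mean) / sd
    else:
        z = math.inf if observed > mean else 0.0
    # Add-one correction: the empirical P-value is never exactly zero.
    hits = sum(s >= observed for s in null_scores)
    p_empirical = (1 + hits) / (n_shuffles + 1)
    return observed, z, p_empirical


def e_value_to_p_value(e_value):
    """P = 1 - exp(-E), assuming Poisson-distributed chance matches."""
    return -math.expm1(-e_value)
```

Calling `randomization_test(seq1, seq2)` returns the observed score, its z-score relative to the shuffled scores, and an empirical P-value. Under the Poisson assumption, P and E are nearly equal for small E, while P approaches 1 as E grows; this matches the ranges [0,1] and [0,infinity) given above.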
| Powerpoint Slides on homology and protein space are here |
If time permits, discuss exponential functions? (Figs. 1, 2, 3) (More data at the GOLD database here)
If there is plenty of time, go over the Kezdy-Swinbourne plot here |
Assignments:
For Wednesday Sept. 13: Start work on the take-home quiz.
For Friday:
|