Assignments for Friday
Assignments for Monday
Vote on midterm date
Discuss Takehome exam #2
1. Given that two homologous sequences start off with 100% similarity and then diverge over time, what percent similarity will they share once saturation has been reached? (Assume equal frequencies for the different letters.)
a. For nucleotide sequences?
There are four different nucleotides. If the sequence is saturated with substitutions, then each site is equally likely to hold any of the 4 nucleotides (provided the nucleotides occur with the same frequency). One of the four is a match, resulting in 25% identity.
b. For protein sequences?
Same reasoning, but there are 20 letters, resulting in a 5% match probability.
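The 25% and 5% figures can be checked with a small simulation. The sketch below assumes that two saturated sequences behave like two independent random sequences drawn with equal letter frequencies; the function name `random_identity`, the sequence length, and the seed are illustrative choices, not part of the course material.

```python
import random

def random_identity(alphabet, length, seed=0):
    """Fraction of matching positions between two independent random sequences.

    Assumption: saturation is modeled as complete randomization, with every
    letter of the alphabet equally likely at every site.
    """
    rng = random.Random(seed)
    a = rng.choices(alphabet, k=length)
    b = rng.choices(alphabet, k=length)
    matches = sum(x == y for x, y in zip(a, b))
    return matches / length

# Four nucleotides: expected identity is close to 1/4.
nucleotide_identity = random_identity("ACGT", 100_000)
# Twenty amino acids: expected identity is close to 1/20.
protein_identity = random_identity("ACDEFGHIKLMNPQRSTVWY", 100_000)

print(f"nucleotides: {nucleotide_identity:.3f}")
print(f"amino acids: {protein_identity:.3f}")
```

The simulated values fluctuate slightly around 0.25 and 0.05; longer sequences bring them closer to the expected values.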
Bonus question: How would the result for a nucleotide sequence change if the frequencies of the nucleotides are not equal? Use a composition of 40% G, 40% C, 10% A, and 10% T as an example.
The chance of a match for a T is the probability of having a T at the start times the probability of having a T at the end. After saturation with substitutions, both are equal to the frequency of T, i.e. the chance of having T at the beginning and at the end is 0.1 times 0.1. The same holds for the other nucleotides. The total probability of a match is thus 0.1^2 + 0.1^2 + 0.4^2 + 0.4^2 = 2*0.1^2 + 2*0.4^2 = 0.02 + 0.32 = 0.34 = 34%, i.e. more similar than in the case of equal nucleotide frequencies.
General formula: expected identity for random sequences with biased composition = (frequency of A)^2 + (frequency of T)^2 + (frequency of G)^2 + (frequency of C)^2. For a genome the total number of G is equal to the total number of C (and likewise A and T), so G and C each have frequency %GC/200, and one could also write: expected identity = 2*((%GC/200)^2) + 2*(((100-%GC)/200)^2). Multiply by 100 to obtain percent; for 80% GC this again gives 0.34 = 34%.
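The arithmetic for biased composition can be sketched in a few lines of Python. The function names `expected_identity` and `expected_identity_from_gc` are hypothetical helpers introduced here for illustration. The second one assumes G = C and A = T, so that G and C each take half of the GC content (%GC/200 as a fraction), and A and T each take half of the remainder.

```python
def expected_identity(freqs):
    """Expected fraction of matching sites between two random sequences.

    freqs maps each letter to its frequency (fractions that sum to 1).
    A site matches when both sequences carry the same letter, so the
    match probability is the sum of the squared frequencies.
    """
    return sum(f * f for f in freqs.values())

def expected_identity_from_gc(gc_percent):
    """Same quantity computed from GC content in percent.

    Assumption: G = C and A = T, so each of G and C has frequency
    gc_percent / 200 and each of A and T has (100 - gc_percent) / 200.
    """
    g = c = gc_percent / 200
    a = t = (100 - gc_percent) / 200
    return expected_identity({"A": a, "T": t, "G": g, "C": c})

# Example from the bonus question: 40% G, 40% C, 10% A, 10% T.
biased = expected_identity({"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4})
# Equal frequencies correspond to 50% GC.
uniform = expected_identity_from_gc(50)

print(f"biased composition: {biased:.2f}")
print(f"equal frequencies:  {uniform:.2f}")
```

Calling `expected_identity_from_gc(80)` reproduces the 0.34 of the worked example, since 80% GC corresponds to 40% G and 40% C.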
2d. When did the Bacteria diverge from the Archaea and Eukaryotes, i.e. how old is LUCA (approximately)? Current estimates vary between 4.2 and about 3 billion years BP (compare class 3).
3. What is the late heavy bombardment? See http://en.wikipedia.org/wiki/Late_Heavy_Bombardment. Did this sterilize Earth? Did it happen, or was this just the tail of the early heavy bombardment?
4. Which type of sequences can be used to look further back in time, nucleotide or protein? Give a short justification of your reasoning. This question does not ask about the RNA world (or at least it was not intended to). It refers to the fact that proteins are easier to align between divergent copies, and that their homology can therefore be determined over much longer periods of time.
9. The finding that the ribosomal RNA alone cannot perform translation is an argument against the RNA world hypothesis. FALSE. It turns out that the ribosome at its catalytic core is made from RNA. Ribosomal proteins are not at the center of peptide bond formation, which suggests that ribosomal peptide synthesis started out as an RNA-based machinery.
15. When inteins first begin to decay they lose the protein-binding domain first, while the DNA binding domain must stay functional or it will destroy the function of the host proteins. FALSE. This sentence combines a description of addiction-causing systems with one of inteins. In addiction-causing systems (restriction endonucleases and the corresponding methylases that protect the DNA), the endonucleases are more stable as proteins, so if both genes are deleted, the cell starts digesting its own DNA. In inteins it is the other way around: the protein splicing domain allows the host protein to be functional, whereas the endonuclease domain is frequently lost (as in the small intein we looked at in computer lab #3).
24. Sequences that do not show significant similarity
A) are not homologous
B) are homologs
C) might nevertheless be homologs