Reading assignment for Friday:

PSI-BLAST

Why do PSI-BLAST results depend on the chosen seed?

If the results are so questionable, why would one even want to use it?

Selection versus genetic drift.

Selection

Deterministic models to describe selection:  (diploid organisms, two alleles A1 and A2)

    Codominance (a kind of logistic equation); q = frequency of allele A2, p = 1 - q = frequency of allele A1

Genotype:                        A1A1     A1A2     A2A2
Relative number of offspring:    1        1+s      1+2s
Fitness:                         w11      w12      w22
Frequency:                       p^2      2pq      q^2

(p, q: allele frequencies of A1 and A2; the genotype frequencies assume Hardy-Weinberg equilibrium)

Change in frequency (approximately):
$\frac{dq}{dt} = s\,q\,(1-q)$, with the solution
$q(t) = \frac{1}{1 + \frac{1-q_0}{q_0}\,e^{-st}}$
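A minimal Python sketch, assuming discrete non-overlapping generations, random mating and the fitnesses 1, 1+s, 1+2s from the table above. It iterates the exact one-generation selection recursion and compares it with the logistic approximation. The function names are illustrative only.

```python
import math


def next_q(q: float, s: float) -> float:
    """One generation of selection on A2 (frequency q) with fitnesses
    A1A1 = 1, A1A2 = 1 + s, A2A2 = 1 + 2s and Hardy-Weinberg genotype frequencies."""
    p = 1.0 - q
    w11, w12, w22 = 1.0, 1.0 + s, 1.0 + 2.0 * s
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22  # mean fitness
    # A2 copies after selection: half of the heterozygotes plus all A2A2 homozygotes
    return (p * q * w12 + q * q * w22) / w_bar


def logistic_q(t: float, q0: float, s: float) -> float:
    """Continuous approximation q(t) = 1 / (1 + ((1 - q0) / q0) * e^(-s t))."""
    return 1.0 / (1.0 + ((1.0 - q0) / q0) * math.exp(-s * t))


def trajectory(q0: float, s: float, generations: int) -> list[float]:
    """Frequencies of A2 for generations 0..generations under the exact recursion."""
    qs = [q0]
    for _ in range(generations):
        qs.append(next_q(qs[-1], s))
    return qs


if __name__ == "__main__":
    qs = trajectory(q0=0.01, s=0.01, generations=600)
    for t in (0, 200, 400, 600):
        print(t, round(qs[t], 3), round(logistic_q(t, 0.01, 0.01), 3))
```

For small s the two curves stay close. The recursion runs slightly behind the logistic curve because of the mean-fitness denominator.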

    Overdominance

Genotype:                        A1A1     A1A2     A2A2
Relative number of offspring:    1        1+s1     1+s2

          s1 > s2 and s1 > 0 (the heterozygote is fittest): balancing selection (try it)

Go to Kent Holsinger's collection of Java applets here and explore some of the time courses with different values of s1 and s2.

Under which conditions of w11, w12, and w22 can one maintain both alleles over long periods of time?
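If the applets are not available, the time courses can be explored with a minimal Python sketch. It assumes discrete generations, random mating and no drift, and it iterates the standard one-generation selection recursion for arbitrary fitnesses w11, w12 and w22. The example values s1 = 0.1 and s2 = 0.05 are arbitrary illustrations.

```python
def next_q(q: float, w11: float, w12: float, w22: float) -> float:
    """One generation of selection on A2 (frequency q) for arbitrary genotype fitnesses,
    assuming Hardy-Weinberg genotype frequencies before selection."""
    p = 1.0 - q
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22  # mean fitness
    return (p * q * w12 + q * q * w22) / w_bar


def run(q0: float, w11: float, w12: float, w22: float, generations: int) -> float:
    """Frequency of A2 after the given number of generations of selection alone."""
    q = q0
    for _ in range(generations):
        q = next_q(q, w11, w12, w22)
    return q


if __name__ == "__main__":
    # Overdominance with s1 = 0.1, s2 = 0.05: start with A2 rare and with A2 common
    for q0 in (0.01, 0.99):
        print(q0, round(run(q0, 1.0, 1.1, 1.05, 2000), 4))
```

Suppose runs that start from a rare A2 and from a common A2 end at the same intermediate frequency. That indicates both alleles are being maintained. Try other fitness combinations as well, for example w12 below both homozygotes.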

Stochastic approaches: random drift and neutral evolution

Law of the gutter (see also Stephen Jay Gould's interpretation of the trend toward increasing complexity)

Explore some simulations: 
     Drift only (vary the population size N)

How does the survival of multiple alleles in a population depend on the population size?
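The following minimal sketch assumes a Wright-Fisher model, which the applets may not use exactly. Drift is the only force, and two alleles start at q0 = 0.5. The sketch reports the average number of generations that pass before only one allele is left, for several population sizes. All names are hypothetical.

```python
import random
import statistics


def wright_fisher_generation(q: float, n_diploid: int, rng: random.Random) -> float:
    """Drift only: draw the 2N gene copies of the next generation independently,
    each being A2 with probability q (binomial sampling)."""
    copies = 2 * n_diploid
    a2 = sum(1 for _ in range(copies) if rng.random() < q)
    return a2 / copies


def generations_until_one_allele_left(q0: float, n_diploid: int, rng: random.Random) -> int:
    """Number of generations until A2 is either lost (q = 0) or fixed (q = 1)."""
    q, t = q0, 0
    while 0.0 < q < 1.0:
        q = wright_fisher_generation(q, n_diploid, rng)
        t += 1
    return t


def mean_time_to_absorption(q0: float, n_diploid: int, replicates: int = 200, seed: int = 1) -> float:
    """Average absorption time over independent replicate populations."""
    rng = random.Random(seed)
    times = [generations_until_one_allele_left(q0, n_diploid, rng) for _ in range(replicates)]
    return statistics.mean(times)


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, mean_time_to_absorption(0.5, n))
```

The averages grow roughly in proportion to N: small populations lose their variation much faster than large ones.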

     Drift and Selection (interesting setting: P=0.01, N=50)

Note: even though the allele confers a strong selective advantage of 10%, it has a rather large chance of going extinct quickly.
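This is a minimal Monte Carlo sketch of the drift-and-selection setting. It uses N = 50 and a starting frequency of 0.01, which is a single new copy among 2N = 100. It assumes a Wright-Fisher model with the codominant fitnesses 1, 1+s, 1+2s from above, and it reads the 10% advantage as s = 0.1. The applet's exact model may differ. The sketch counts in how many replicate populations the allele is fixed rather than lost.

```python
import random


def next_generation(q: float, n_diploid: int, s: float, rng: random.Random) -> float:
    """Selection (fitnesses 1, 1+s, 1+2s) followed by binomial sampling of 2N gene copies."""
    p = 1.0 - q
    w_bar = p * p + 2.0 * p * q * (1.0 + s) + q * q * (1.0 + 2.0 * s)
    q_after_selection = (p * q * (1.0 + s) + q * q * (1.0 + 2.0 * s)) / w_bar
    copies = 2 * n_diploid
    a2 = sum(1 for _ in range(copies) if rng.random() < q_after_selection)
    return a2 / copies


def fraction_fixed(q0: float, n_diploid: int, s: float, replicates: int = 2000, seed: int = 1) -> float:
    """Fraction of independent replicate populations in which A2 becomes fixed."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        q = q0
        while 0.0 < q < 1.0:
            q = next_generation(q, n_diploid, s, rng)
        if q == 1.0:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    # One new copy (q0 = 1/(2N) = 0.01) with a 10% heterozygote advantage
    print(fraction_fixed(q0=0.01, n_diploid=50, s=0.1))
```

Most replicates lose the advantageous allele, usually early on while it is still present in only a few copies.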

     This simulation follows many populations (with the selected parameters) over time. It plots a histogram that shows how many of the populations have the allele frequency indicated on the y-axis. If you set the mutation rate to 0, this provides a nice illustration of the law of the gutter. (In the presence of the alleles converting back and forth, fixation does not occur.)
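This is a minimal sketch of the many-populations view. It assumes a Wright-Fisher model and, as a labeled assumption, symmetric mutation between A1 and A2 at rate u per gene copy per generation. With u = 0, populations pile up at frequencies 0 and 1 and stay there. With u > 0, mutation keeps reintroducing the missing allele, so no population stays fixed permanently. The names are hypothetical.

```python
import random


def next_generation(q: float, n_diploid: int, u: float, rng: random.Random) -> float:
    """Symmetric mutation A1 <-> A2 at rate u per copy (assumption), then binomial drift."""
    q_after_mutation = q * (1.0 - u) + (1.0 - q) * u
    copies = 2 * n_diploid
    a2 = sum(1 for _ in range(copies) if rng.random() < q_after_mutation)
    return a2 / copies


def final_frequencies(populations: int, n_diploid: int, generations: int,
                      u: float = 0.0, q0: float = 0.5, seed: int = 1) -> list[float]:
    """Follow many independent populations and return their A2 frequencies at the end."""
    rng = random.Random(seed)
    result = []
    for _ in range(populations):
        q = q0
        for _ in range(generations):
            q = next_generation(q, n_diploid, u, rng)
        result.append(q)
    return result


def histogram(freqs: list[float], bins: int = 10) -> tuple[int, int, list[int]]:
    """Counts of lost (q = 0), fixed (q = 1) and still-polymorphic populations per bin."""
    lost = sum(1 for q in freqs if q == 0.0)
    fixed = sum(1 for q in freqs if q == 1.0)
    counts = [0] * bins
    for q in freqs:
        if 0.0 < q < 1.0:
            counts[min(int(q * bins), bins - 1)] += 1
    return lost, fixed, counts


if __name__ == "__main__":
    for u in (0.0, 0.01):
        print(u, histogram(final_frequencies(100, 20, 200, u=u)))
```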

Mutation rate versus Substitution rate

The following assumes co-dominance or no selection:

s = 0: the probability of fixation, P, equals the frequency of the allele in the population, q.

Mutation rate (per gene per unit of time) = u

The rate at which new alleles are generated in a diploid population of size N is u*2N.

Probability of fixation for each new allele = 1/(2N)

Substitution rate = (rate at which new alleles are generated) * (probability of fixation) = u*2N * 1/(2N) = u

Therefore:
The substitution rate is independent of population size if s=0 and equal to the mutation rate!!!!

This is the reason that there is hope that the molecular clock might sometimes work.
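The arithmetic above can be checked with exact fractions in a minimal Python sketch. The mutation rate u = 10^-8 is a hypothetical value chosen only for illustration.

```python
from fractions import Fraction


def neutral_substitution_rate(u: Fraction, n_diploid: int) -> Fraction:
    """Substitution rate = (new alleles per unit time) * (fixation probability of each)."""
    new_alleles = u * 2 * n_diploid       # u * 2N new mutant copies per unit time
    p_fix = Fraction(1, 2 * n_diploid)    # a neutral newcomer starts at q = 1/(2N)
    return new_alleles * p_fix


if __name__ == "__main__":
    u = Fraction(1, 10**8)  # hypothetical mutation rate per gene per unit time
    for n in (10, 10**4, 10**6):
        print(n, neutral_substitution_rate(u, n))
```

Every line shows the same value, 1/100000000, whatever N is. The factor 2N cancels exactly.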

For advantageous mutations: 
      Probability of fixation, P, is approximately equal to 2s;
      e.g., if selective advantage s = 1% then P = 2%

      Does this correspond to the simulations you performed above?
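One way to check the 2s rule of thumb is to swap in Kimura's diffusion approximation for the fixation probability of a single new mutant. This assumes an initial frequency of 1/(2N), fitnesses 1, 1+s, 1+2s, and an effective size equal to N: P = (1 - e^(-2s)) / (1 - e^(-4Ns)). The minimal Python sketch below reduces to 1/(2N) for s = 0. It gives roughly 2s when s is small but 4Ns is large.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's diffusion approximation for one new mutant copy in a diploid population
    of size N (N_e assumed equal to N), with fitnesses 1, 1+s, 1+2s."""
    if s == 0.0:
        return 1.0 / (2 * n)            # neutral case: P = q0 = 1/(2N)
    a, b = -2.0 * s, -4.0 * n * s
    # P = (1 - e^a) / (1 - e^b) = expm1(a) / expm1(b); expm1 keeps small s accurate
    if b > 700.0:                       # strongly deleterious: avoid overflow in e^b
        return math.expm1(a) * math.exp(-b)
    return math.expm1(a) / math.expm1(b)


if __name__ == "__main__":
    for n, s in ((50, 0.1), (1000, 0.01), (1000, -0.001)):
        print(n, s, round(fixation_probability(n, s), 6))
```

For N = 50 and s = 0.1, the simulation setting above, this gives about 0.18, a little below 2s = 0.2. A slightly deleterious allele (s < 0) still has a small but nonzero chance of fixation.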

Fixation time

Neutral mutations: $t_{av} = 4N_e$ generations
($N_e$ = effective population size; for n discrete generations, $N_e = \frac{n}{1/N_1 + 1/N_2 + \dots + 1/N_n}$, the harmonic mean of the population sizes)

$s \neq 0$: $t_{av} = \frac{2}{|s|}\ln(2N)$ generations (also true for mutations with negative s; how can this be?)

E.g.: $N = 10^6$, $s = 0$: average time to fixation: $4 \cdot 10^6$ generations

$N = 10^6$, $s = 0.01$: average time to fixation: about 2900 generations
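The worked numbers above can be reproduced with a few lines of Python. In this minimal sketch, the selected case is written with |s|, an assumption that makes the same expression cover the negative-s remark. The bottleneck example sizes [1000, 10, 1000] are hypothetical.

```python
import math


def harmonic_mean_ne(sizes: list[float]) -> float:
    """Effective population size over n discrete generations: n / (1/N1 + ... + 1/Nn)."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def neutral_fixation_time(ne: float) -> float:
    """Average time to fixation of a neutral mutation: 4 * Ne generations."""
    return 4.0 * ne


def selected_fixation_time(n: float, s: float) -> float:
    """(2/|s|) * ln(2N) generations; |s| is used so that negative s gives the same value."""
    return (2.0 / abs(s)) * math.log(2.0 * n)


if __name__ == "__main__":
    print(neutral_fixation_time(1e6))                       # 4 * 10^6 generations
    print(round(selected_fixation_time(1e6, 0.01)))
    print(round(harmonic_mean_ne([1000, 10, 1000]), 1))     # one bottleneck generation
```

The second value comes out at about 2900 generations, matching the example. The harmonic mean shows how a single small generation pulls Ne far below the typical census size.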

 

 

Review: What is in a tree?

Trees are often used to depict the evolutionary history of organisms, species and molecules. (see slide)

  • Trees can be either rooted or unrooted (at least the ones calculated from molecular data :-)).
  • The assumption of a molecular clock is usually not justified a priori.
  • Gene tree - species tree - genealogy
  • Lineage sorting
  • Gene Duplications
  • HGT
  • Trees from molecular data are usually calculated as unrooted trees (at least they should be; if they are not, this is usually a mistake). To root a tree, you can either assume a molecular clock (substitutions occur at a constant rate; again, this assumption is usually not warranted and needs to be tested) or use an outgroup (i.e., something that you know forms the deepest branch).

For example, to root a phylogeny of birds, you could use the homologous characters from a reptile as outgroup; to find the root in a tree depicting the relations between different human mitochondria, you could use the mitochondria from chimpanzees or from Neanderthals as an outgroup; to root a phylogeny of alpha hemoglobins you could use a beta hemoglobin sequence, or a myoglobin sequence as outgroup.

  • Trees have a branching pattern (also called the topology) and branch lengths. Often the branch lengths are ignored in depicting trees (such trees are often referred to as cladograms; note that cladograms should be considered rooted*).
    You can swap the branches attached to a node, and you can draw an unrooted tree as if it were rooted on any branch you like, without changing the tree.

Tree exercise: Which of these trees are identical when you consider them as unrooted and only consider the topology? here
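One way to check such exercises mechanically is sketched below in Python, under two assumptions. First, trees are written as nested tuples of taxon names, an ad hoc format chosen here and not a standard one. Second, two trees count as the same unrooted topology when they induce the same set of non-trivial bipartitions (splits) of the taxa. Swapping branches at a node or moving the root does not change that set.

```python
from typing import FrozenSet, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of subtrees (hypothetical format)


def taxa(tree: Tree) -> FrozenSet[str]:
    """All leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(taxa(child) for child in tree))


def splits(tree: Tree) -> FrozenSet[FrozenSet[str]]:
    """Non-trivial bipartitions, each stored as the side that excludes a reference taxon."""
    all_taxa = taxa(tree)
    reference = min(all_taxa)
    found = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_taxa - below if reference in below else below
        # skip trivial splits (one taxon on either side) and the whole tree
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    walk(tree)
    return frozenset(found)


def same_unrooted_topology(t1: Tree, t2: Tree) -> bool:
    return taxa(t1) == taxa(t2) and splits(t1) == splits(t2)


if __name__ == "__main__":
    a = (("A", "B"), ("C", "D"), "E")
    b = ("E", (("D", "C"), ("B", "A")))   # branches swapped, drawn from a different root
    c = (("A", "C"), ("B", "D"), "E")
    print(same_unrooted_topology(a, b), same_unrooted_topology(a, c))  # True False
```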

While many trees have identical topologies, the number of possible tree topologies is enormous even for a rather small number of terminal taxa. An illustrative table is here.
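The growth can be computed directly. For n labelled taxa there are (2n - 5)!! distinct unrooted, fully bifurcating topologies (n ≥ 3) and (2n - 3)!! rooted ones (n ≥ 2), where k!! is the product of the odd numbers up to k. The minimal Python sketch below counts fully bifurcating trees only.

```python
def double_factorial_odd(k: int) -> int:
    """k!! for odd k >= 1: k * (k - 2) * ... * 3 * 1."""
    result = 1
    for i in range(k, 0, -2):
        result *= i
    return result


def unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial_odd(2 * n_taxa - 5)


def rooted_topologies(n_taxa: int) -> int:
    """Number of rooted bifurcating topologies: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial_odd(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, unrooted_topologies(n), rooted_topologies(n))
```

Rooting adds one extra position to place the root on each of the 2n - 3 branches. That is why the rooted count for n taxa equals the unrooted count for n + 1 taxa.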

IMPORTANT TERMS IN MOLECULAR EVOLUTION

Evolution of protein families:
      Homology (shared ancestry) versus  Analogy (convergent evolution)

Homology: Two sequences are homologous if there existed an ancestral molecule in the past that is ancestral to both of the sequences.

Homology is a "yes" or "no" character ("don't know" is also possible). Either sequences (or characters) share ancestry or they don't (like pregnancy). Molecular biologists often use homology as a synonym for similarity or percent identity. One often reads: sequences A and B are 70% homologous. To an evolutionary biologist this sounds as wrong as 70% pregnant.
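Similarity, unlike homology, is a quantity. The number people usually mean by "70% homologous" is percent identity. Below is a minimal Python sketch for two already-aligned sequences. The gap character "-" and the choice of denominator (gap-free columns only) are assumptions, since several conventions are in use. A high value is evidence for homology, not a degree of homology.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns where neither sequence has a gap.

    Other conventions (e.g. dividing by the alignment length or by the shorter
    sequence) give different numbers; this one is an assumption for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(percent_identity("MKV-LAG", "MKVQLSG"))
```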

Types of Homology

Especially with respect to molecular evolution the following types of homology are really important!
(Especially the ones in bold. Yes, it will be in the final!):

Orthology: bifurcation in molecular tree reflects speciation
Paralogy: bifurcation in molecular tree reflects gene duplication
Xenology: gene was obtained by organism through horizontal transfer
Synology: genes ended up in one organism through fusion of lineages.

Orthologs: bifurcation in molecular tree reflects speciation. These are the molecules people interested in the taxonomic classification of organisms want to study.

Paralogs: bifurcation in molecular tree reflects gene duplication. The study of paralogs and their distribution in genomes provides clues on the way genomes evolved.
Gene and genome duplication have emerged as the most important pathways to molecular innovation, including the evolution of developmental pathways.

Xenologs: gene was obtained by organism through horizontal transfer. The classic examples of xenologs are antibiotic resistance genes, but the histories of many other molecules also fit into this category: inteins, self-splicing introns, transposable elements, ion pumps, and other transporters.

Synologs: genes ended up in one organism through fusion of lineages. The paradigm examples are genes that were transferred into the eukaryotic cell together with the endosymbionts that evolved into mitochondria and plastids.
(the -logs are often spelled with "ue" like in orthologues)

Discussion and examples from Fitch's article (TIG 2000, Fig. 1, see reading assignment). See also globin trees above.

How many different groups of homologous proteins are there?
Problems: homology and the detection of homology are two different things.
Paradox (?): If all genes evolved through duplication and diversification from the same first self-replicating RNA molecule, aren't all genes homologs?

At present there are about 500 known types of protein folds in the PDB (Protein Data Bank). How many of these folds can be joined into a single class?
(See the earlier example of helicase and F1-ATPase.) Both form hexamers with something rotating in the middle (either the gamma subunit or the DNA; D. Crampton, pers. communication). The monomers have the same type of nucleotide-binding fold (picture); are they homologous?

 

Intro to phylogenetic reconstruction

Phylogenetic analysis is the inference of evolutionary relationships between organisms.
Those relationships are usually represented by tree-like diagrams. Note: the assumption that evolution is tree-like is controversial.

Steps of the phylogenetic analysis:

  • Compilation of the sequence dataset
  • Alignment
  • Determination of the substitution model
  • Tree building
  • Tree evaluation
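This example shows how a substitution model enters tree building, using the simplest case as a minimal Python sketch. The Jukes-Cantor (JC69) correction is chosen here only as an example, not as a recommendation. It turns the observed proportion of differing sites p in an alignment into an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p), which a distance-based tree method could then use. Ignoring gapped columns is an assumption.

```python
import math


def p_distance(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Proportion of differing sites among columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """JC69 distance d = -(3/4) ln(1 - 4p/3); undefined once p reaches 0.75 (saturation)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least p. The gap between them widens as p grows, because multiple substitutions at the same site become more likely.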