Reading assignment for Friday: chapter 11
Draw a diagram of the genome rearrangements that relate the Mycobacterium tuberculosis CDC1551 genome to the genome of Mycobacterium avium subsp. paratuberculosis K-10 (see http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi?tax1=83331&tax2=262316 )
Go over results from class 20.
Go over results from the gene plot exercise: Borrelia burgdorferi vs B. garinii at NCBI and at EMU (import and analyze in Excel here).
How are the approaches different? Which approach would be preferable?
From <http://dml.cmnh.org/2002Jul/msg00351.html>: "This _is_ a Hennigian comb, because in a cladogram, _only_ topology counts. ... what a side branch is lies completely in the hand of the presentator."
The Clay of Evolution - How to study genes and genomes.
How can genes get duplicated?
Whole genome duplication: a frequent event in plants, also speculated to have occurred at least twice in the early evolution of vertebrates. 15% of the yeast genome is present in duplicated form; the currently accepted idea is that there was an ancient genome duplication followed by rearrangement and gene loss. The idea of genome duplications in early vertebrate evolution has become very popular, but the phylogeny of regulatory proteins does not support this idea (see here and here for pro and here for contra). The picture below is a comparison of the yeast proteome with itself (the diagonal is removed). It clearly shows many small duplicated regions.
The diagram depicts the result of a BLAST search of each ORF in a genome against the same genome (= the collection of its ORFs). The proteins encoded in the genome are listed in order on both axes (the two axes could also represent different genomes, see below). The color of each dot reflects the E-value for the comparison of the two ORFs: the smaller the E-value, the lighter the point.
Parts of chromosomes get duplicated: traces of this are seen in Arabidopsis and Caenorhabditis.
Single genes get duplicated: this gives rise to gene families that were originally tandemly duplicated (see the Caenorhabditis paper above).
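The genome-versus-itself plot described above can be sketched in a few lines. This is a minimal illustration, assuming the all-against-all BLASTP hits were saved in BLAST's tabular format (`-outfmt 6`, where column 11 is the E-value) and that the ORF identifiers are available in chromosomal order; the function names, the significance cutoff and the shading rule (lighter = smaller E-value, capped at 1e-100) are hypothetical choices, not those of the original figure.

```python
import math


def parse_tabular_hits(lines):
    """Yield (query, subject, evalue) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Default tabular columns: 1 = qseqid, 2 = sseqid, 11 = evalue
        yield fields[0], fields[1], float(fields[10])


def self_dot_plot(hits, orf_order, max_evalue=1e-5, cap=100.0):
    """Turn all-vs-all hits into (x, y, shade) points for a genome-vs-itself dot plot.

    x and y are the positions of the two ORFs in `orf_order` (their order on the
    chromosome). Self-hits, i.e. the diagonal, are removed. shade = -log10(E) / cap,
    clipped to [0, 1], so smaller E-values give larger (lighter) shades.
    """
    position = {orf: i for i, orf in enumerate(orf_order)}
    points = {}
    for query, subject, evalue in hits:
        if query == subject or evalue > max_evalue:
            continue  # drop the diagonal and non-significant hits
        if query not in position or subject not in position:
            continue  # ignore ORFs missing from the ordered list
        score = cap if evalue <= 0 else -math.log10(evalue)
        shade = min(max(score / cap, 0.0), 1.0)
        key = (position[query], position[subject])
        # several HSPs for one ORF pair: keep the best (lightest) one
        points[key] = max(points.get(key, 0.0), shade)
    return sorted((x, y, shade) for (x, y), shade in points.items())


if __name__ == "__main__":
    example = [
        "orfA\torfA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
        "orfA\torfC\t45.0\t280\t150\t3\t5\t290\t2\t280\t1e-50\t200",
        "orfC\torfA\t45.0\t280\t150\t3\t2\t280\t5\t290\t1e-50\t200",
    ]
    for x, y, shade in self_dot_plot(parse_tabular_hits(example), ["orfA", "orfB", "orfC"]):
        print(x, y, round(shade, 2))
```

Because every ORF is compared with every other ORF, a duplicated region shows up twice, symmetrically on both sides of the removed diagonal.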
Some TOOLS at NCBI
The NCBI provides several different interfaces to browse through and analyze genomes. For example, in the Borrelia genome, if you click on the complete genome, you get a graphical representation; further clicks move you down through several levels to the nucleotide and encoded amino acid sequences. If you click on an ORF, you retrieve its sequence followed by the output of a BLAST search of this sequence against the nr database. The graphical representation shows you which part of the ORF generated each match; if you click on the number that represents the score, a new window opens with the alignment (again with nice graphics included). If you click on the identifier of a matching sequence, a window with that sequence in GenBank (gb) format opens up. If the ORF is part of a cluster of putatively orthologous genes, you can get information on the cluster by clicking on the COG number. From the Borrelia genome page, you can go to tables listing all ORFs, or to the taxtable, which provides an interesting nearest-neighbor coloring of the genome. It is noteworthy that many of the pink dots are endonucleases. Also, there are many transporters among the oddly colored genes. In an attempt to capture some phylogenetic information in BLAST comparisons, Olendzenski et al. pioneered an approach that uses multiple reference genomes to screen for putatively horizontally transferred genes (see Fig. 4). A similar approach, but using two instead of three reference genomes, is implemented in the TaxPlot program on the NCBI's genome page (see below). You pick one genome to analyze and two reference genomes. The program returns a plot of every ORF in the selected genome in a coordinate system where the two coordinates are the highest alignment scores against the two reference genomes:
Selected genome was from Borrelia burgdorferi. The list of selected genes is below:
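The TaxPlot coordinates described above can be sketched as follows. This is a minimal illustration, assuming the BLAST scores of every ORF of the selected genome against each reference genome are already available as `(orf, reference, score)` tuples; the reference names, the ORF identifiers and the "twice as high" rule used to flag ORFs far from the diagonal are hypothetical and are not TaxPlot's own criteria.

```python
from collections import defaultdict


def taxplot_coordinates(hits, reference_x, reference_y):
    """Return {orf: (best score vs reference_x, best score vs reference_y)}.

    hits: iterable of (orf, reference_genome, score) tuples. An ORF without a
    hit in one of the references gets 0 on that axis.
    """
    best = defaultdict(lambda: {reference_x: 0.0, reference_y: 0.0})
    for orf, reference, score in hits:
        if reference in (reference_x, reference_y):
            best[orf][reference] = max(best[orf][reference], score)
    return {orf: (s[reference_x], s[reference_y]) for orf, s in best.items()}


def off_diagonal(coords, ratio=2.0):
    """Hypothetical flag: ORFs whose best score against one reference is at
    least `ratio` times the best score against the other."""
    flagged = []
    for orf, (x, y) in coords.items():
        if x >= ratio * max(y, 1.0) or y >= ratio * max(x, 1.0):
            flagged.append(orf)
    return sorted(flagged)


if __name__ == "__main__":
    hits = [
        ("bb_0001", "ref_X", 300.0), ("bb_0001", "ref_Y", 280.0),
        ("bb_0002", "ref_X", 40.0), ("bb_0002", "ref_Y", 250.0),
        ("bb_0003", "ref_X", 90.0),
    ]
    coords = taxplot_coordinates(hits, "ref_X", "ref_Y")
    print(coords)
    print(off_diagonal(coords))
```

ORFs close to the diagonal are about equally similar to both references; ORFs far from it are the candidates one would inspect for horizontal transfer.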
More on Comparing Genomes:
Genome dot plots allow one to compare two genomes (or rather the ORFs encoded in these genomes). In contrast to a normal dot plot, one does not move a window through the sequence; rather, one takes one ORF at a time and compares it to the other genome. Robert L. Charlebois' genome and bioinformatics site performed these and other analyses. Most of these are now available at the EMU server maintained by Robert Beiko. For example, the BLASTP-based dot plot of Pyrococcus abyssi vs Pyrococcus horikoshii depicted below clearly reveals inversions and duplications (two parallel diagonals); the latter can also be detected by comparing a genome to itself.
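Reading inversions off such a plot, for example when drawing the rearrangement diagram asked for in the assignment, amounts to splitting the dots into collinear segments. The sketch below assumes the matches are given as `(x, y)` pairs of ORF positions in the two genomes; the function name and the `max_gap` tolerance are hypothetical, and real tools use more robust chaining. Segments running "downhill" (orientation `-`) correspond to inverted regions.

```python
def collinear_runs(points, max_gap=3):
    """Split matched ORF positions into collinear segments.

    points: (x, y) pairs, x = ORF position in genome 1, y = position of its
    match in genome 2. Walking along x, a point joins the current segment if x
    advances by 1..max_gap and y moves by 1..max_gap in the segment's direction.
    Returns [(orientation, [(x, y), ...]), ...]; '+' = same order,
    '-' = inverted. Single-point segments are reported as '+'.
    """
    runs = []
    current, direction = [], 0
    for x, y in sorted(points):
        if current:
            px, py = current[-1]
            dx, dy = x - px, y - py
            step = 1 if dy > 0 else -1 if dy < 0 else 0
            same_run = (0 < dx <= max_gap and 0 < abs(dy) <= max_gap
                        and (direction == 0 or step == direction))
            if same_run:
                direction = step
                current.append((x, y))
                continue
            # close the finished segment and start a new one at (x, y)
            runs.append(("-" if direction < 0 else "+", current))
            current, direction = [], 0
        current.append((x, y))
    if current:
        runs.append(("-" if direction < 0 else "+", current))
    return runs


if __name__ == "__main__":
    dots = [(0, 10), (1, 11), (2, 12), (3, 5), (4, 4), (5, 3), (6, 20)]
    for orientation, segment in collinear_runs(dots):
        print(orientation, segment)
```

A duplication would appear as two segments covering the same x range at different y offsets, i.e. the two parallel diagonals mentioned above.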
See this paper from Tillier and Collins for a discussion of this and similar patterns. Recently, the NCBI added a pairwise genome comparison of protein homologs (symmetrical best hits) to its web pages: from any summary sequence view of a genome (e.g. here), select GenePlot (e.g. here). This analysis differs from the ones above in that it does not consider all pairwise scores, but only ORFs that pick each other as top-scoring BLAST hits; i.e., each ORF is represented by at most one point. The blastall program might be your best chance to generate a plot that includes all significant BLAST hits (possibly covered on Wednesday).
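The symmetrical (reciprocal) best-hit criterion can be sketched as below. This assumes BLAST results in both directions are available as `(query, subject, score)` tuples; the function names are hypothetical, ties are resolved by keeping the first hit seen, and this is not NCBI's own implementation.

```python
def best_hits(hits):
    """Return {query: best-scoring subject} from (query, subject, score) tuples."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)  # ties keep the first hit seen
    return {query: subject for query, (subject, _) in best.items()}


def symmetrical_best_hits(hits_a_vs_b, hits_b_vs_a):
    """ORF pairs (a, b) where b is a's top hit in genome B and a is b's top hit in genome A."""
    a_to_b = best_hits(hits_a_vs_b)
    b_to_a = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


if __name__ == "__main__":
    a_vs_b = [("a1", "b1", 500), ("a1", "b2", 200), ("a2", "b2", 300), ("a3", "b2", 310)]
    b_vs_a = [("b1", "a1", 490), ("b2", "a3", 305), ("b2", "a2", 295)]
    print(symmetrical_best_hits(a_vs_b, b_vs_a))
```

Here a2 and a3 both have b2 as their top hit, but b2 picks only a3, so a2 gets no point at all; the paralog signal that an all-hits plot would show is lost.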
Selection versus genetic drift
Selection
Deterministic models to describe selection: codominance (a kind of logistic equation); q = frequency of allele A2.
Genotype:                        A1A1   A1A2   A2A2
Relative number of offspring:    1      1+s    1+2s
Change in frequency (approximately): $\Delta q \approx s\,q\,(1-q)$
Genotype: A1A1 A1A2 A2A2
s1>s2: balancing selection (try it). Go to Kent Holsinger's collection of Java applets here and explore some of the time courses with different values of s1 and s2. Under which conditions of w11, w12, and w22 can one maintain both alleles over long periods of time?
Stochastic approaches: random drift, neutral evolution.
Law of the gutter (see also Stephen J. Gould's interpretation of the trend toward increasing complexity).
Explore some simulations: How does the survival of multiple alleles in a population depend on the population size?
The following assumes codominance or no selection:
s = 0: the probability of fixation, P, is equal to the frequency of the allele in the population, q.
Mutation rate (per gene per unit of time) = u; the number of new alleles generated per unit of time in a diploid population of size N is $2Nu$.
Probability of fixation for each new allele = $1/(2N)$.
Therefore: the rate at which new neutral alleles become fixed is $2Nu \cdot \frac{1}{2N} = u$, independent of population size.
For advantageous mutations: $P \approx 2s$.
Fixation time (average time to fixation, $\bar{t}$):
Neutral mutations: $\bar{t} = 4N_e$ generations.
$s \neq 0$: $\bar{t} = \frac{2}{|s|}\ln(2N)$ generations (also true for mutations with negative s. How can this be?)
E.g.: $N = 10^6$, $s = 0$: average time to fixation $4 \times 10^6$ generations; $N = 10^6$, $s = 0.01$: average time to fixation about 2900 generations.
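The drift results above can be checked with a small simulation. This sketch assumes an idealized Wright-Fisher population (diploid, N individuals, 2N gene copies drawn binomially each generation, no selection, $N_e = N$); the function names are hypothetical and the population is kept tiny so the run finishes quickly. The second function only evaluates the two fixation-time formulas from the notes.

```python
import math
import random


def wright_fisher_fixation(N, q0, trials, rng):
    """Fraction of neutral replicates in which allele A2 becomes fixed.

    Diploid population of N individuals = 2N gene copies; s = 0.
    """
    copies = 2 * N
    start = round(q0 * copies)
    fixed = 0
    for _ in range(trials):
        count = start
        while 0 < count < copies:
            q = count / copies
            # each of the 2N copies in the next generation picks A2 with probability q
            count = sum(1 for _ in range(copies) if rng.random() < q)
        if count == copies:
            fixed += 1
    return fixed / trials


def mean_fixation_time(N, s):
    """Average time to fixation in generations, using the formulas above (N_e = N)."""
    if s == 0:
        return 4 * N
    return (2 / abs(s)) * math.log(2 * N)


if __name__ == "__main__":
    rng = random.Random(1)
    print("P(fixation), q = 0.25:", wright_fisher_fixation(10, 0.25, 4000, rng))
    print("N = 10^6, s = 0:   ", mean_fixation_time(10**6, 0))
    print("N = 10^6, s = 0.01:", round(mean_fixation_time(10**6, 0.01)))
```

The estimated fixation probability should come out close to the starting frequency q, and the second formula gives about 2902 generations, matching the 2900 quoted above.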
Neutral theory: The vast majority of observed sequence differences between members of a population are neutral (or close to neutral). These differences can be fixed in the population through random genetic drift. Some mutations are strongly counterselected (this is why there are patterns of conserved residues). Only very seldom is a mutation under positive selection. The neutral theory does not say that all evolution is neutral and that everything is due only to genetic drift. (Nearly neutral theory: even synonymous mutations do not lead to a random composition but to codon bias. Small negative selection might be sufficient to produce this bias.) Note: the larger the population, the better selection works, and the closer to neutral a mutation needs to be in order to be fixed by genetic drift. (If N*s << 1, the mutation behaves as neutral, and its fixation probability is 1/(2N); if N*s is around 1 or larger, the fixation probability is only about 2s, which is small, but seems to work.) Is evolution in humans only neutral? Does selection still play a role? E.g.,
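The two regimes in the parenthetical can be made concrete with Kimura's diffusion approximation for the fixation probability of a single new mutant (initial frequency $1/(2N)$) under the codominance fitnesses 1, 1+s, 1+2s: $u = (1 - e^{-2s})/(1 - e^{-4Ns})$. This sketch assumes $N_e = N$; the function name is hypothetical, and `math.expm1` is used only for numerical accuracy with small s.

```python
import math


def fixation_probability(N, s):
    """Kimura's approximation for one new mutant in a diploid population of size N.

    u = (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy.
    """
    if s == 0:
        return 1 / (2 * N)  # neutral limit
    x = -4 * N * s
    if x > 700:
        # strongly deleterious: expm1(x) would overflow; expm1(x) ~ exp(x) here
        return math.expm1(-2 * s) * math.exp(-x)
    return math.expm1(-2 * s) / math.expm1(x)


if __name__ == "__main__":
    N = 10_000
    for s in (1e-7, 1e-5, 1e-4, 1e-3, 1e-2, -1e-4):
        u = fixation_probability(N, s)
        print(f"N*s = {N * s:>8.3f}   u = {u:.3e}   1/(2N) = {1 / (2 * N):.1e}   2s = {2 * s:.1e}")
```

For N*s much smaller than 1 the result is essentially 1/(2N); from N*s of about 1 upward it approaches 2s; and for negative s it falls below 1/(2N) without reaching zero.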