PSI BLAST
    # decided to have input file entered in command line
    open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    while (defined(my $line = <IN>)) {    # read through file line by line
        if ($line =~ /^>/) {               # look for lines starting with > (^ anchors the match to the beginning of the line)
            # ... (body of this block elided)
        }
    }
    close(IN);
PSI-BLAST provides an enormous advantage over normal BLAST in detecting distantly related sequences. It works only if some closely related sequences are already available. When they are, it finds many additional, more distantly related sequences. The NCBI page describes PSI-BLAST as follows: the results of a normal BLAST search are aligned, and a pattern of conserved residues is extracted from the alignment. This pattern (the Position-Specific Scoring Matrix, PSSM) is used as the query for the next iteration. An important parameter to adjust is the E-value threshold: only matches with E-values below this threshold are included in the alignment and used for pattern extraction.
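The toy Perl sketch below makes the PSSM idea concrete. It builds a log-odds matrix from a small alignment and scores a segment against it. Every detail here is a simplifying assumption, not NCBI's implementation:

- a uniform background frequency of 1/20 per amino acid;
- a flat pseudocount of 1;
- no sequence weighting;
- equal-length, gap-free sequences.

The aligned sequences in @aln are made up, and the sub names build_pssm and score_segment are hypothetical. Real PSI-BLAST also weights sequences, derives pseudocounts from a substitution matrix, and searches the database with gapped alignments.

```perl
use strict;
use warnings;

# Toy PSSM (simplified sketch). Assumptions: uniform background of 1/20,
# flat pseudocount of 1, no sequence weighting, gap-free equal-length alignment.
my @AA = split //, 'ACDEFGHIKLMNPQRSTVWY';

# Returns a reference to an array with one hash per alignment column,
# mapping each amino acid to its log2-odds score against the background.
sub build_pssm {
    my @aln    = @_;
    my $pseudo = 1;
    my @pssm;
    for my $col (0 .. length($aln[0]) - 1) {
        my %count = map { $_ => $pseudo } @AA;      # start every residue at the pseudocount
        $count{ substr($_, $col, 1) }++ for @aln;    # add observed residues in this column
        my $total = @aln + $pseudo * @AA;            # sequences + pseudocounts
        for my $aa (@AA) {
            my $freq = $count{$aa} / $total;
            $pssm[$col]{$aa} = log($freq / (1 / 20)) / log(2);
        }
    }
    return \@pssm;
}

# Sum the column scores for a segment as long as the alignment;
# residues outside the 20 standard amino acids contribute 0.
sub score_segment {
    my ($pssm, $seg) = @_;
    my $score = 0;
    for my $col (0 .. $#$pssm) {
        $score += $pssm->[$col]{ substr($seg, $col, 1) } // 0;
    }
    return $score;
}

my @aln  = qw(MKVLA MKILA MRVLA);    # hypothetical aligned hits from a first BLAST round
my $pssm = build_pssm(@aln);
printf "%-6s %6.2f\n", $_, score_segment($pssm, $_) for qw(MKVLA MKILA WWWWW);
```

Segments that match the conserved columns score high and unrelated segments score below zero. In an iterative search, this matrix rather than the original sequence would be used to score database entries.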
The "problem" is that the E-value reported in a PSI-BLAST search refers to the match with the profile, not with the original query sequence!

PSI BLAST Example
Query sequence: >gi|163506|gb|AAA30693.1| peripherin
Assignments for Friday (10/27)
Selection versus genetic drift

Selection
Deterministic models to describe selection: codominance (a kind of logistic equation)
q = frequency of allele A2

Genotype                        A1A1    A1A2    A2A2
Relative number of offspring    1       1+s     1+2s
Change in frequency per generation (approximately): $\Delta q \approx s\,q(1-q)$

Genotype                        A1A1    A1A2    A2A2
Relative fitness                w11     w12     w22
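The "approximately" comes from the exact one-generation recursion. With fitnesses 1, 1+s, 1+2s and random mating, the mean fitness is $\bar w = 1 + 2sq$. The exact change is then $\Delta q = s\,q(1-q)/(1+2sq)$, which reduces to $s\,q(1-q)$ when $sq$ is small. The Perl sketch below iterates the exact recursion and prints it next to the approximation. It assumes an infinitely large population and discrete generations. The starting frequency (0.01), s = 0.05 and the sub name next_q are arbitrary illustrative choices.

```perl
use strict;
use warnings;

# Deterministic codominant selection (random mating, infinite population,
# discrete generations). Fitnesses: A1A1 = 1, A1A2 = 1 + s, A2A2 = 1 + 2s.
# $q = frequency of allele A2, $p = frequency of A1.
sub next_q {
    my ($q, $s) = @_;
    my $p    = 1 - $q;
    my $wbar = $p**2 + 2 * $p * $q * (1 + $s) + $q**2 * (1 + 2 * $s);    # equals 1 + 2sq
    # A2 gametes after selection: all of those from A2A2 plus half of those from A1A2
    return ($q**2 * (1 + 2 * $s) + $p * $q * (1 + $s)) / $wbar;
}

# Time course: exact change in q versus the approximation s q (1 - q)
my ($s, $q) = (0.05, 0.01);
printf "%5s %8s %10s %10s\n", 'gen', 'q', 'exact dq', 'approx dq';
for my $gen (0 .. 300) {
    my $q_next = next_q($q, $s);
    printf "%5d %8.4f %10.6f %10.6f\n", $gen, $q, $q_next - $q, $s * $q * (1 - $q)
        if $gen % 25 == 0;
    $q = $q_next;
}
```

The printed q values trace the S-shaped (logistic-like) rise of the favoured allele. The two Δq columns differ only by the factor $1/(1+2sq)$.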
s1 > s2: balancing selection (try it). Go to Kent Holsinger's collection of Java applets and explore the time courses for different values of s1 and s2. Under which conditions of w11, w12, and w22 can both alleles be maintained over long periods of time?

Stochastic approaches: random drift, neutral evolution
Law of the gutter (see also Stephen J. Gould's interpretation of the trend toward increasing complexity).
Explore some simulations: how does the survival of multiple alleles in a population depend on the population size?

The following assumes codominance or no selection:
s = 0: the probability of fixation, P, equals the frequency of the allele in the population, q.
Mutation rate (per gene per unit of time) = u; new alleles arise in a diploid population of size N at a rate of $2Nu$.
Probability of fixation for each new neutral allele = $1/(2N)$.
Therefore: rate of fixation of neutral mutations = $2Nu \cdot \frac{1}{2N} = u$, independent of population size.
For advantageous mutations: $P \approx 2s$ (small s, large N).

Fixation time (average, for mutations that do become fixed):
Neutral mutations: $t_{av} = 4N_e$ generations.
s ≠ 0: $t_{av} = (2/s)\ln(2N)$ generations (also true for mutations with negative s, using |s|; how can this be?)
E.g., $N = 10^6$, s = 0: average time to fixation $= 4 \times 10^6$ generations.
$N = 10^6$, s = 0.01: average time to fixation ≈ 2900 generations.
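The drift statements can be explored with a small simulation, and the fixation-time formulas can be checked numerically. This sketch assumes the standard neutral Wright-Fisher model, which the notes do not name explicitly: each generation, all 2N gene copies are drawn at random from the previous generation's gene pool. The population size N = 10, the starting count of 6 copies (q = 0.3), the number of replicates and the random seed are arbitrary. The sub names t_fix_neutral, t_fix_selected, wf_generation and becomes_fixed are hypothetical.

```perl
use strict;
use warnings;

# Average time to fixation for mutations that do become fixed (formulas from the notes).
sub t_fix_neutral  { my ($Ne) = @_;    return 4 * $Ne; }
# Uses |s| so the same expression also covers deleterious (negative s) mutations.
sub t_fix_selected { my ($N, $s) = @_; return (2 / abs($s)) * log(2 * $N); }

# One generation of the neutral Wright-Fisher model: each of the 2N gene copies
# of the next generation is drawn at random from the current gene pool.
sub wf_generation {
    my ($count, $N) = @_;    # $count = copies of the allele among the 2N copies
    my $q    = $count / (2 * $N);
    my $next = 0;
    for (1 .. 2 * $N) { $next++ if rand() < $q }
    return $next;
}

# Follow one allele until it is lost (0 copies) or fixed (2N copies); 1 = fixed.
sub becomes_fixed {
    my ($count, $N) = @_;
    $count = wf_generation($count, $N) while $count > 0 && $count < 2 * $N;
    return $count == 2 * $N ? 1 : 0;
}

srand(42);                                   # fixed seed for reproducible runs
my ($N, $start, $reps) = (10, 6, 2000);      # starting frequency q = 6/20 = 0.3
my $fixed = 0;
$fixed += becomes_fixed($start, $N) for 1 .. $reps;
printf "simulated P(fixation) = %.3f, expected q = %.3f\n", $fixed / $reps, $start / (2 * $N);
printf "t_av, neutral, Ne = 1e6: %.0f generations\n", t_fix_neutral(1e6);
printf "t_av, s = 0.01, N = 1e6: %.0f generations\n", t_fix_selected(1e6, 0.01);
```

The simulated fixation probability should land close to the starting frequency q, as stated for s = 0. Setting $start to 1 estimates the 1/(2N) fixation probability of a single new mutation. For N = 10^6 and s = 0.01, t_fix_selected gives about 2900 generations. It returns the same value for s = -0.01.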