Future Monday and Wednesday classes will take place in CB 206!
Assignments for Friday's class:
Assignments for Monday's class:
Annotation problem, e.g., missing ORFs in E. coli; see http://mic.sgmjournals.org/content/156/7/1909.full
Types of Error in a Databank search
False positives: The expected number of false positives is estimated by the E-value. The P-value (significance value) gives the probability that a positive identification is made in error (the same idea as a false positive in a drug test).
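The two quantities can be related by a short calculation. The sketch below assumes that chance hits follow a Poisson distribution (the usual assumption in BLAST-style statistics, not something stated in these notes), so that the probability of at least one chance hit is P = 1 - exp(-E). The function name is made up for illustration.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected
    number of chance hits (assumes Poisson-distributed chance hits)."""
    if e_value < 0:
        raise ValueError("E-value must be non-negative")
    # 1 - exp(-E), written with expm1 to stay accurate when E is tiny
    return -math.expm1(-e_value)


for e in (1e-10, 0.01, 1.0, 10.0):
    print(f"E = {e:g}  ->  P = {p_value_from_e_value(e):.6g}")
```

For small E the two values are nearly identical, which is why they are often used interchangeably; they diverge once E approaches 1, because P can never exceed 1.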
False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, then on average a sequence should have (size of the databank)/12000 matches. In other words, the number of false negatives is probably very large.
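The back-of-the-envelope estimate above can be written out as a small calculation. This is only a sketch: it assumes all families are the same size, and the databank size and number of reported hits in the example are hypothetical values chosen for illustration, not real figures.

```python
N_FAMILIES = 12000  # number of protein families, as given in the notes


def expected_homologs(databank_size: int, n_families: int = N_FAMILIES) -> float:
    """Average number of homologs per query if families were equally sized."""
    if n_families <= 0:
        raise ValueError("n_families must be positive")
    return databank_size / n_families


def estimated_false_negatives(databank_size: int, hits_found: int,
                              n_families: int = N_FAMILIES) -> float:
    """Rough count of homologs the search failed to report."""
    missed = expected_homologs(databank_size, n_families) - hits_found
    return max(0.0, missed)


# Hypothetical example: a databank of 6,000,000 sequences, 40 significant hits
size, found = 6_000_000, 40
print(expected_homologs(size))                 # average matches expected per query
print(estimated_false_negatives(size, found))  # homologs presumably missed
```

Real families differ greatly in size, so the per-query figure is an average only; the point is the order of magnitude compared with the number of hits a search typically reports.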