Parsing Blast with Bioperl

Bioperl is

    • A Set of Perl modules for manipulating genomic and other biological data

    • An Open Source Toolkit with many contributors

    • A flexible and extensible system for doing bioinformatics data manipulation

 

Bioperl documentation

 

1. Bioperl tutorial:
        http://www.bioperl.org/wiki/Bptutorial.pl
(you also can run this from the command line perl -w Bptutorial.pl, or read it via perldoc bio::searchIO )

2. Mastering Perl for Bioinformatics: Introduction to bioperl (recommended)
        http://www.oreilly.com/catalog/mperlbio/chapter/ch09.pdf

 

3. Bioperl documentation

       http://www.bioperl.org/Core/Latest/modules.html

 

 4. Modules listing.

       http://doc.bioperl.org/releases/bioperl-1.4/

 

5. Unix help: man and perldoc.
   (e.g. perldoc bio::searchIO)

 

 

Object-oriented programming

 

Abstract Data Types

 

 

 

Implementation of Abstract Data Types

 

 

class BioinformaticClass {
  attributes:
    @students=('Anan', 'John', 'Maria', 'Jack');

    $teacher='Peter';
    %grades=('Ana'=>'good',

             'John'=>'moderate',   

             'Maria' =>'excellent'

             'Jack' => 'satisfactory');

   
  methods:
    setGrade(student);
    getGrade(student);

    addStudent;

    removeStudent;

 

  }
 

An object is an instance of a class.

 

Parsing with Bio::SearchIO

Here's example code which processes a BLAST report finding all the hits where the HSPs are greater than 100 residues and the percent identity is more than 75 percent. This code demonstrates that a result, in this case from a BLAST report, contains one or more hits, and a hit contains one or more HSPs.

Example 1
use Bio::SearchIO;

    # create a new object   
    my $in = new Bio::SearchIO(-format => 'blast', 
                               -file   => 'report.blast');
    # iterate through results
    while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
          if( $hsp->length('total') > 100 ) {
            if ( $hsp->percent_identity >= 75 ) {
             print "Hit= ",       $hit->name, 
                   ",Length=",     $hsp->length('total'), 
                   ",Percent_id=", $hsp->percent_identity, "\n";
            }
          }
        }  
      }
 
query= gi|20521485|dbj|AP004641.2 Oryza sativa (japonica
cultivar-group) genomic DNA, chromosome 1, BAC clone:B1147B04, 3785
bases, 977CE9AF checksum.
         (3059 letters)

Database: test.fa
           5 sequences; 1291 total letters



                                                                Score    E
Sequences producing significant alignments:                     (bits) Value

gb|443893|124775 LaForas sequence                                 92   2e-022

>gb|443893|124775 LaForas sequence
          Length = 331

 Score = 92.0 bits (227), Expect = 2e-022
 Identities = 46/52 (88%), Positives = 48/52 (91%)
 Frame = +1

Query: 2896 DMGRCSSGCNRYPEPMTPDTMIKLYREKEGLGAYIWMPTPDMSTEGRVQMLP 3051
            D+ + SSGCNRYPEPMTPDTMIKLYRE EGL AYIWMPTPDMSTEGRVQMLP
Sbjct: 197  DIVQNSSGCNRYPEPMTPDTMIKLYRE-EGL-AYIWMPTPDMSTEGRVQMLP 246


  Database: test.fa
    Posted date:  Feb 12, 2003  9:51 AM
  Number of letters in database: 1291
  Number of sequences in database:  5
 
Lambda     K      H
   0.318    0.135    0.401

Gapped
Lambda     K      H
   0.267   0.0410    0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 7140
Number of Sequences: 5
Number of extensions: 180
Number of successful extensions: 2
Number of sequences better than 10.0: 2
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1
length of database: 1291
effective HSP length: 46
effective length of database: 1061
effective search space used:  1032353
frameshift window, decay const: 50,  0.1
T: 12
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 32 (17.6 bits)
 
 
See Howto_s at http://bioperl.open-bio.org/wiki/HOWTO:SearchIO

 

Example 2

 

use Bio::SearchIO;

 

my $cutoff = ’0.001’;

my $file = ‘blast.out’,

my $in = new Bio::SearchIO(-format => ‘blast’,

                           -file => $file);

while( my $r = $in->next_result ) {

 

    print "Query is: ", $r->query_name, " ",

    $r->query_description," ",$r->query_length," aa\n";

    print " Matrix was ", $r->get_parameter('matrix'), "\n";

 

       while( my $h = $r->next_hit ) {

        last if $h->significance > $cutoff;

        print "Hit is ", $h->name, "\n";

 

           while( my $hsp = $h->next_hsp ) {

           print " HSP Len is ", $hsp->length(’total’), " ",

                 " E-value is ", $hsp->evalue, " Bit score ",

                 $hsp->score, " \n",

                 " Query loc: ",$hsp->query->start, " ",

                 $hsp->query->end," ",

                 " Sbject loc: ",$hsp->hit->start, " ",

                 $hsp->hit->end,"\n";

            }

        }

}