Re: parsing BLAST output

Well, let me just rewrite your program a bit. Be cautious, though: this is untested code, so I couldn't check your regexes (and I probably introduced some small bugs of my own :)

#! /usr/local/bin/perl -w
use strict;

my $filename = 'strept_blastx.output';

my ($start_annotation, $end_annotation, $alignments) = parse_blast($filename);

print $start_annotation;
print map "$_\nxxxxxxxxx\n$alignments->{$_}\nxxxxxxxxxx\n", keys %$alignments;
print $end_annotation;

sub parse_blast
{
  my ($filename) = @_;

  my $blast_output_file;
  my ($start_anno, $alignment_sec, $end_anno);

  open( my $data_file, '<', $filename ) or die "Couldn't open file $filename: $!";
  $blast_output_file = do {local $/; <$data_file> };
  close $data_file;

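  # Split into: everything up to ALIGNMENTS, the alignments, the Database trailer.
  ($start_anno, $alignment_sec, $end_anno) =
    ($blast_output_file =~/(.*^ALIGNMENTS\n)(.*)(^ Database:.*)/ms)
    or die "Couldn't find the expected sections in $filename";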

  my $align_hashref = parse_blast_alignment($alignment_sec);
  
  return ($start_anno, $end_anno, $align_hashref);
}

sub parse_blast_alignment
{
  my ($alignment_section) = @_;
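  my $alignment_hashref = {};   # start empty so the caller can always dereference it

  # One record = a '>' header line plus the following lines not starting with '>'.
  while ($alignment_section =~ /^(>.*\n(?:^(?!>).*\n)+)/gm)
  {
    my $value = $1;   # capture group instead of $&, which slows all regexes in older perls
    my ($key) = (split(/\|/, $value)) [1];
    $alignment_hashref->{$key} = $value;
  }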

  return $alignment_hashref;
}

So, this should run, and then you can tell us whether you still have problems with your regexes and the actual extraction. Some more explanation of the format for us non-biologists would be helpful as well.
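If you want to check the record-splitting regex without the whole file, something like the following might help. It is only a sketch: the two records in $sample are invented for illustration (not real BLAST output), and I am assuming that the field after the first '|' in the header is the key you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-record sample; headers and contents are made up
# and only mimic the '>' header convention used in the code above.
my $sample = join '',
    ">gi|111|ref|XP_0001| made-up protein A\n",
    "  Length = 10\n",
    "  Query: 1 MKV 3\n",
    ">gi|222|ref|XP_0002| made-up protein B\n",
    "  Length = 20\n";

my %alignments;

# A header line starting with '>' followed by one or more lines
# that do not start with '>'.
while ($sample =~ /^(>.*\n(?:^(?!>).*\n)+)/gm) {
    my $record = $1;
    my $key = (split /\|/, $record)[1];   # second '|'-separated field
    $alignments{$key} = $record;
}

for my $key (sort keys %alignments) {
    my $lines = ($alignments{$key} =~ tr/\n//);   # count newlines in the record
    print "$key => $lines line(s)\n";
}
```

If the keys or line counts come out different from what you expect on a real excerpt, that tells you whether the problem is in the splitting regex or in the key extraction. Note that a final record whose last line lacks a trailing newline would lose that line with this pattern.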

-- Hofmator
