sorting and other problems in hash

sugar has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I'm here with yet another problem. I hope you will guide me as you always have! I am comparing two files. From file 2 I take the lengths given for the left and right parts of each unique ID (the header lines starting with '>'). From file 1 I then extract that many numbers from the beginning (left value) and from the end (right value) of the number string with the same ID.

file1:
>AAAT3R length=110
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 4
+0 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 38 40 40 39 40 40 40 4
+0 40 40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 3
+5 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 4
+0 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ length=70
18 18 18 21 35 35 35 32 32 32 33 35 38 39 37 37 39 39 39 39 39 40 40 3
+9 39 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 3
+9 37 35 35 39 37 37 37 37 37 37 37 37 37 37 33 32 32 30 20 17 17 17 0

file2:
>AAAT3R_left length=6
TACATA
>AAAT3R_right length=62
ACTACTGATTTGATTATCTTTGATCTCTGTCGAACTAACTATATCTTAGTATGATCTTTAAT
>AAA2OJ_left length=14
TTTTGGACTATCTG
>AAA2OJ_right length=14
AGGCTGTTCTTTTN

result file:(expected)
>AAAT3R_left length=6
40 40 40 40 40 40
>AAAT3R_right length=62
40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 4
+0 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 4
+0 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ_left length=14
18 18 18 21 35 35 35 32 32 32 33 35 38 39
>AAA2OJ_right length=14
37 37 37 37 37 33 32 32 30 20 17 17 17 0

This is the code I have written so far to get the desired output.

#!/usr/bin/perl
use strict;
#processing file2
open(FH1,"file2.txt") or die "cant open";
my @ar=<FH1>;
my $left;my $right;my @final;
foreach my $aray(@ar){
      if($aray=~m/left/){
            if($aray=~m/(^>.*)_\w+\s\w+(=)(\d+)/){
                  $left=$3;
            }
      }
      else{
            my $right_header=$aray;
            if($aray=~m/(^>.*)_\w+\s\w+(=)(\d+)/){
              $right=$3;
              push(@final,$1);
              push(@final,"left=$left".".."."right=$right");
            }
      }
}
my %hash=@final;
my $val;my $head;
my ($leftnew,$rightnew);my ($side1,$num1);my ($side2,$num2);
my @numright;my @numleft;my @string;my @string1;
#comparing with file 1
open(FH2,"file1.txt") or die "cant open";
my @ar2=<FH2>;
my $aray2;
while (my ($key,$value)=each %hash){
foreach $aray2(@ar2){
      if ($aray2=~m/^(>\w+)/){
            $head=$1;
      }
      $val=$hash{$key};
      ($leftnew,$rightnew)=split(/\../,$val);
      ($side1,$num1)=split(/=/,$leftnew);
      ($side2,$num2)=split(/=/,$rightnew);
      @numleft=split(' ',$aray2);
      @numright=reverse(@numleft);
            if ($head eq $key){
                  if($aray2=~m/^\d+/){
                        my $i;my $j;
                        for($i=0;$i<scalar(@numleft);$i++){
                              if($i==$num1){
                                    last;
                              }else{
                              push(@string1,$numleft[$i]);
                              }
                        }
                       for ($j=0;$j<scalar(@numright);$j++){
                        if($j==$num2){
                                    last;
                              }
                              else{
                            push(@string,$numright[$j]." ");
                              }
                        }
                  }
            }
           
           
      }
      print "\n",$head.'_'.'left '.'length='.$num1,"\n";   
      print "@string1\n";
      print "\n",$head.'_'.'right '.'length='.$num2,"\n";  
      print reverse(@string),"\n";
      $key=();$value=();$aray2=();@string=();@string1=();
}

My problem is that I am not able to print the unique header ID for every record. I know that $head is a scalar variable, so outside the loop it holds only the last value, but I don't know how to solve this. Moreover, I want the results sorted in the order of the input file (as given in the expected result file). Please help me solve it. Thank you :)


Replies are listed 'Best First'.
Re: sorting and other problems in hash by CountZero (Bishop) on Dec 31, 2008 at 07:05 UTC
Did nobody notice that you can easily return the last X elements from an array by using a negative index? Get the last 10 elements of the array:

@array = qw /1 2 3 4 5 6 7 8 9 10 11 12 13 14 15/;
print join ' ', @array[-10 .. -1];

No need to get the length of the array with $#array, or `reverse` the array, ...

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
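One caveat with the negative-index slice is worth a small sketch: if the requested count is larger than the array, the out-of-range positions come back as undef. The helper below clamps the count first. The name last_n and the sample numbers are assumptions made for illustration; this is not CountZero's code.

```perl
use strict;
use warnings;
use List::Util qw(min);

# last_n is a hypothetical helper name, not something from the thread.
# Returns the last $n elements of @list, or the whole list if it is shorter.
sub last_n {
    my ($n, @list) = @_;
    $n = min($n, scalar @list);              # clamp so no index falls off the front
    return $n > 0 ? @list[-$n .. -1] : ();   # zero or negative count: empty list
}

my @scores = qw/18 18 18 21 35 35/;
print join(' ', last_n(3, @scores)), "\n";   # prints "21 35 35"
print join(' ', last_n(10, @scores)), "\n";  # asks for too many: prints the whole list
```

Clamping keeps the output free of "uninitialized value" warnings when a header in file2 asks for more numbers than file1 holds. Dying with a message, as oko1's reply does, is an equally reasonable choice.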
Re^2: sorting and other problems in hash by oko1 (Deacon) on Dec 31, 2008 at 15:34 UTC
That was my original approach in returning the 'right' side. I found the approach I ended up taking a bit more readable and intuitive.

-- "Language shapes the way we think, and determines what we can think about." -- B. L. Whorf
Re: sorting and other problems in hash by toolic (Bishop) on Dec 31, 2008 at 02:49 UTC
Stuff file2 contents into a Perl data structure, such as a hash of hashes, then print desired file1 contents using array slices:

use strict;
use warnings;

my $fh;
open $fh, '<', 'file2.txt' or die "cant open file2.txt: $!";
my %info;
while (<$fh>) {
    # remember the left and right lengths for each ID
    if (/^>(.*)_left\s\w+=(\d+)/) {
        $info{$1}{left} = $2;
    }
    if (/^>(.*)_right\s\w+=(\d+)/) {
        $info{$1}{right} = $2;
    }
}
close $fh;

# comparing with file 1
open $fh, '<', 'file1.txt' or die "cant open file1.txt: $!";
my $id;
while (<$fh>) {
    if (/^>(\w+)/) {
        $id = $1;
    }
    else {
        my @nums = split;
        my $len = $info{$id}{left};
        print ">${id}_left length=$len\n";
        print "@nums[0..($len-1)]\n";
        $len = $info{$id}{right};
        print ">${id}_right length=$len\n";
        # last $len elements; the +1 keeps the slice from returning one element too many
        print "@nums[($#nums-$len+1)..$#nums]\n";
    }
}

__END__

>AAAT3R_left length=6
40 40 40 40 40 40
>AAAT3R_right length=62
40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 3
+5 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 4
+0 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ_left length=14
18 18 18 21 35 35 35 32 32 32 33 35 38 39
>AAA2OJ_right length=14
37 37 37 37 37 33 32 32 30 20 17 17 17 0
Re: sorting and other problems in hash by atcroft (Abbot) on Dec 31, 2008 at 03:09 UTC
That was a lot of code to try to get what you wanted. I used a (to me) simpler approach:

Read in file1
Looping through the file:
    If a blank line or only contains whitespace, skip
    If the line begins with a '>', split on whitespace to get the sequence name
    Otherwise, split the line on whitespace into an array in a hash, keyed by the sequence name
Read in file2
Looping through the file:
    If a blank line or only contains whitespace, skip
    If the line begins with a '>':
        Split the line on underscore, equal, or whitespace to get the sequence name, "side", and length
        If the sequence is not defined, display a warning
        Generate the range based on the "side"
        Print the line
        Print the digit sequence range
    Otherwise, display the line

Or, in code:

Read more... (2 kB)

Hope that helps.
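The code for this reply is collapsed behind the "Read more" link, so here is a minimal sketch of the steps listed above. It is not atcroft's code. The sub names read_scores and slice_by_headers and the inline sample data are assumptions made for illustration. The sketch also assumes that IDs contain no underscore and, to match the expected result in the question, it skips the sequence lines of file2 instead of displaying them.

```perl
use strict;
use warnings;

# Hypothetical helper: turn file1-style lines into { name => [numbers] }.
sub read_scores {
    my @lines = @_;
    my (%scores, $name);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^>(\S+)/) {            # header such as ">AAA2OJ length=70"
            $name = $1;
            $scores{$name} = [];
        }
        elsif (defined $name) {              # number line(s) for the current ID
            push @{ $scores{$name} }, split ' ', $line;
        }
    }
    return \%scores;
}

# Hypothetical helper: walk file2-style lines in input order and
# return the output lines (header, then the selected numbers).
sub slice_by_headers {
    my ($scores, @lines) = @_;
    my @out;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        next unless $line =~ /^>/;           # assumption: sequence lines are dropped
        (my $header = $line) =~ s/^>//;
        # split on underscore, equal sign, or whitespace: name, side, "length", count
        my ($name, $side, undef, $len) = split /[_=\s]+/, $header;
        my $nums = $scores->{$name};
        if (!$nums || $len > @$nums) {
            warn "skipping $name ($side): no usable scores\n";
            next;
        }
        my @range = $side eq 'left'
            ? (0 .. $len - 1)                   # first $len numbers
            : (@$nums - $len .. $#$nums);       # last $len numbers
        push @out, ">${name}_$side length=$len", join(' ', @{$nums}[@range]);
    }
    return @out;
}

# Tiny made-up sample instead of the real files.
my @file1 = (">AAA2OJ length=6\n", "18 18 21 35\n", "32 30\n");
my @file2 = (">AAA2OJ_left length=2\n", "TT\n", ">AAA2OJ_right length=3\n", "AGG\n");
print "$_\n" for slice_by_headers(read_scores(@file1), @file2);
```

Because the output is driven by the order of the lines in file2, the results come out in input order without any sorting, which is the ordering the question asks for.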
Re: sorting and other problems in hash by oko1 (Deacon) on Dec 31, 2008 at 02:52 UTC
If I understand the problem correctly, then you might try approaching it like this:

#!/usr/bin/perl -w
use strict;

# one-argument open takes the file name from the package variable of the same name
our ($File1, $File2) = qw/file1 file2/;
open File1 or die "$File1: $!\n";
open File2 or die "$File2: $!\n";

my ($key, %results);
while (<File1>){
    next if /^\s*$/;
    chomp;
    if (/^>\s*(\S+)/){
        $key = $1;
    }
    else {
        $results{$key} = [ split ];
    }
}
close File1;

my ($len, $side, $str);
while (<File2>){
    next if /^\s*$/;
    if (/^>([^_]+)_(left|right).*?(\d+)\s*$/){
        print;
        $str = $1; $side = $2; $len = $3;
    }
    else {
        my @list;
        @list = @{$results{$str}};
        if ($side eq 'left'){
            die "$str is too short for a left slice of $len!\n" unless @list >= $len;
            print "@list[0..$len-1]\n";
        }
        else {
            die "$str is too short for a right slice of $len!\n" unless @list >= $len;
            print "@list[@list-$len..$#list]\n";
        }
    }
}
close File2;

-- "Language shapes the way we think, and determines what we can think about." -- B. L. Whorf
Re: sorting and other problems in hash by sanku (Beadle) on Dec 31, 2008 at 03:35 UTC
Hi friend,

Try out this one :)

open(FILE1,"file1.txt") or die $!; @file1=<FILE1>; close(FILE1);
open(FILE2,"file2.txt") or die $!; @file2=<FILE2>; close(FILE2);
# collect the left/right lengths and the IDs from file2
foreach $file2 (@file2){
    if($file2=~/(\w+)(_)(\w+)\s(\w+)(=)(\d+)/) {
        $id=$1;
        if($3 eq 'right'){$value=$6."r#";}
        if($3 eq 'left'){$value=$6."l";}
        push(@array,"$value");
        push(@id,"$id");
    }
}
push(@newid,grep {!$ss{$_}++} @id);    # unique IDs, in input order
@arraysplit=@array2=$ss=();
$ss=join('',@array);
@arraysplit=split(/\s+|\n/,$ss);
@array2=split('#',$arraysplit[0]);     # one "<left>l<right>r" entry per ID
foreach $f1 (0 .. $#file1) {
    if($file1[$f1] =~/>/){ $firstline=$file1[$f1]; chomp($firstline); }
    if($file1[$f1] =~/^>/) {
        $secondline=$file1[$f1+1];
        $secondlinej=join('',split(/\n/,$secondline));
        foreach $vv (0 .. $#array2) {
            if($newid[$vv] eq substr($firstline,1,6)) {    # assumes 6-character IDs
                ($left,$right)=split(/r|l/,$array2[$vv]);
                print "\n".substr($firstline,0,7)."_left"." length=$left \n";
                @secondlinejj=split('\s',$secondlinej);
                for ($l=0;$l < $left;$l++){print "$secondlinejj[$l] ";}
                print "\n";
                print "\n".substr($firstline,0,7)."_right"." length=$right \n";
                $len=scalar @secondlinejj;
                $right=$len-$right;
                for ($m=$right;$m < scalar @secondlinejj;$m++){print "$secondlinejj[$m] ";}
                print "\n"
            }
        }
    }
}

Output will be like this:

>AAAT3R_left length=6
40 40 40 40 40 40
>AAAT3R_right length=62
40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 3
+5 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 4
+0 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ_left length=14
18 18 18 21 35 35 35 32 32 32 33 35 38 39
>AAA2OJ_right length=14
37 37 37 37 37 33 32 32 30 20 17 17 17 0