sugar has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I'm here with yet another problem. I hope you will guide me as you always have! I am comparing two files: from file2 I take the left and right lengths, and from file1 I extract that many numbers from the beginning (left value) and the end (right value) of each string of numbers, matching records by their unique ID, which starts with '>'.
file1:
>AAAT3R length=110
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 38 40 40 39 40 40 40 40 40 40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ length=70
18 18 18 21 35 35 35 32 32 32 33 35 38 39 37 37 39 39 39 39 39 40 40 39 39 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 39 37 35 35 39 37 37 37 37 37 37 37 37 37 37 33 32 32 30 20 17 17 17 0

file2:
>AAAT3R_left length=6
TACATA
>AAAT3R_right length=62
ACTACTGATTTGATTATCTTTGATCTCTGTCGAACTAACTATATCTTAGTATGATCTTTAAT
>AAA2OJ_left length=14
TTTTGGACTATCTG
>AAA2OJ_right length=14
AGGCTGTTCTTTTN

result file (expected):
>AAAT3R_left length=6
40 40 40 40 40 40
>AAAT3R_right length=62
40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
>AAA2OJ_left length=14
18 18 18 21 35 35 35 32 32 32 33 35 38 39
>AAA2OJ_right length=14
37 37 37 37 37 33 32 32 30 20 17 17 17 0
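Because each file1 header carries a length= field, a quick sanity check before slicing is to compare that field with the number of values actually read. The sketch below is illustrative only: the subroutine name read_records and the two-record sample are hypothetical, and the data is read from an in-memory string rather than from file1.txt.

```perl
use strict;
use warnings;

# Read records made of a ">ID length=N" header followed by whitespace-separated
# numbers (possibly spread over several lines). Returns a hash ref of
# ID => [numbers] and warns when the count differs from the declared length.
sub read_records {
    my ($fh) = @_;
    my (%records, %declared, $id);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                  # skip blank lines
        if ($line =~ /^>(\S+)\s+length=(\d+)/) {
            $id = $1;
            $declared{$id} = $2;
            $records{$id} = [];
        }
        elsif (defined $id) {
            push @{ $records{$id} }, split ' ', $line;
        }
    }
    for my $key (sort keys %records) {
        my $got = @{ $records{$key} };              # array in scalar context: count
        warn "$key: header says length=$declared{$key}, found $got numbers\n"
            if $got != $declared{$key};
    }
    return \%records;
}

# Hypothetical sample; REC2 deliberately declares the wrong length.
my $sample = <<'END';
>REC1 length=4
1 2 3 4
>REC2 length=3
5 6
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $recs = read_records($fh);
close $fh;
print "$_: @{ $recs->{$_} }\n" for sort keys %$recs;
```

Running it prints both records and emits one warning on STDERR for REC2.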
This is the code I have written so far to get the desired output:
#!/usr/bin/perl
use strict;
#processing file2
open(FH1,"file2.txt") or die "cant open";
my @ar=<FH1>;
my $left;my $right;my @final;
foreach my $aray(@ar){
    if($aray=~m/left/){
        if($aray=~m/(^>.*)_\w+\s\w+(=)(\d+)/){
            $left=$3;
        }
    }
    else{
        my $right_header=$aray;
        if($aray=~m/(^>.*)_\w+\s\w+(=)(\d+)/){
            $right=$3;
            push(@final,$1);
            push(@final,"left=$left".".."."right=$right");
        }
    }
}
my %hash=@final;
my $val;my $head;
my ($leftnew,$rightnew);my ($side1,$num1);my ($side2,$num2);
my @numright;my @numleft;my @string;my @string1;
#comparing with file 1
open(FH2,"file1.txt") or die "cant open";
my @ar2=<FH2>;
my $aray2;
while (my ($key,$value)=each %hash){
    foreach $aray2(@ar2){
        if ($aray2=~m/^(>\w+)/){
            $head=$1;
        }
        $val=$hash{$key};
        ($leftnew,$rightnew)=split(/\../,$val);
        ($side1,$num1)=split(/=/,$leftnew);
        ($side2,$num2)=split(/=/,$rightnew);
        @numleft=split(' ',$aray2);
        @numright=reverse(@numleft);
        if ($head eq $key){
            if($aray2=~m/^\d+/){
                my $i;my $j;
                for($i=0;$i<scalar(@numleft);$i++){
                    if($i==$num1){
                        last;
                    }else{
                        push(@string1,$numleft[$i]);
                    }
                }
                for ($j=0;$j<scalar(@numright);$j++){
                    if($j==$num2){
                        last;
                    }
                    else{
                        push(@string,$numright[$j]." ");
                    }
                }
            }
        }
    }
    print "\n",$head.'_'.'left '.'length='.$num1,"\n";
    print "@string1\n";
    print "\n",$head.'_'.'right '.'length='.$num2,"\n";
    print reverse(@string),"\n";
    $key=();$value=();$aray2=();@string=();@string1=();
}
My problem here is that I am not able to print the unique header ID for each record. I know that $string is a scalar variable and that, outside the loop, it holds only the last value, but I don't know how to solve the problem. Moreover, I want the results in the same order as the input file (as shown in the expected result file). Please help me solve it. Thank you :)
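The ordering problem comes from looping with each %hash: a Perl hash does not remember the order in which keys were added, so the records come back in an arbitrary order. One common remedy, sketched here with the file2 headers from the question hard-coded instead of read from file2.txt, is to push each ID onto an array the first time it is seen and then loop over that array:

```perl
use strict;
use warnings;

# Hashes do not keep insertion order, so remember it separately:
# push an ID onto @order the first time it is seen, then loop over @order.
my (%len, @order);
my @headers = (
    '>AAAT3R_left length=6',  '>AAAT3R_right length=62',
    '>AAA2OJ_left length=14', '>AAA2OJ_right length=14',
);
for my $header (@headers) {
    next unless $header =~ /^>(\w+?)_(left|right)\s+length=(\d+)/;
    my ($id, $side, $n) = ($1, $2, $3);
    push @order, $id unless exists $len{$id};   # first sighting of this ID
    $len{$id}{$side} = $n;
}

for my $id (@order) {
    print "$id left=$len{$id}{left} right=$len{$id}{right}\n";
}
```

The CPAN module Tie::IxHash offers an insertion-ordered hash if you prefer not to manage the extra array yourself.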

Re: sorting and other problems in hash
by CountZero (Bishop) on Dec 31, 2008 at 07:05 UTC
    Did nobody notice that you can easily return the last X elements from an array by using a negative index?

    Get the last 10 elements of the array:

    @array = qw /1 2 3 4 5 6 7 8 9 10 11 12 13 14 15/;
    print join ' ', @array[-10 .. -1];
    No need to get the length of the array with $#array, or reverse the array, ...
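    As an illustration of the same idea, here is a small sketch with two hypothetical helpers, first_n and last_n (not part of any module), that slice from the front and from the back and clamp the count so that asking for more elements than exist does not produce undef values:

```perl
use strict;
use warnings;

# Return the first $n elements of @list, or all of them if $n is larger.
sub first_n {
    my ($n, @list) = @_;
    $n = @list if $n > @list;
    return @list[0 .. $n - 1];
}

# Return the last $n elements using negative indices. When $n is 0 the
# range is 0 .. -1, which is empty, so an empty list comes back.
sub last_n {
    my ($n, @list) = @_;
    $n = @list if $n > @list;
    return @list[-$n .. -1];
}

my @nums = (18, 18, 18, 21, 35, 35, 35);
print join(' ', first_n(3, @nums)), ' | ', join(' ', last_n(2, @nums)), "\n";
```

    This prints "18 18 18 | 35 35". List::Util version 1.50 and later also provides head and tail functions for the same job.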

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      That was my original approach in returning the 'right' side. I found the approach I ended up taking a bit more readable and intuitive.


      --
      "Language shapes the way we think, and determines what we can think about."
      -- B. L. Whorf
Re: sorting and other problems in hash
by toolic (Bishop) on Dec 31, 2008 at 02:49 UTC
    Stuff file2 contents into a Perl data structure, such as a hash of hashes, then print desired file1 contents using array slices:
    use strict;
    use warnings;

    my $fh;
    open $fh, '<', 'file2.txt' or die "cant open file2.txt: $!";
    my %info;
    while (<$fh>) {
        if (/^>(.*)_left\s\w+=(\d+)/) {
            $info{$1}{left} = $2;
        }
        if (/^>(.*)_right\s\w+=(\d+)/) {
            $info{$1}{right} = $2;
        }
    }
    close $fh;

    # comparing with file 1
    open $fh, '<', 'file1.txt' or die "cant open file1.txt: $!";
    my $id;
    while (<$fh>) {
        if (/^>(\w+)/) {
            $id = $1;
        }
        else {
            my @nums = split;
            my $len = $info{$id}{left};
            print ">${id}_left length=$len\n";
            print "@nums[0..($len-1)]\n";
            $len = $info{$id}{right};
            print ">${id}_right length=$len\n";
            print "@nums[-$len..-1]\n";    # exactly the last $len elements
        }
    }
    close $fh;

    __END__
    >AAAT3R_left length=6
    40 40 40 40 40 40
    >AAAT3R_right length=62
    40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
    >AAA2OJ_left length=14
    18 18 18 21 35 35 35 32 32 32 33 35 38 39
    >AAA2OJ_right length=14
    37 37 37 37 37 33 32 32 30 20 17 17 17 0
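    When slicing the last $len values, it is easy to write the range as $#nums-$len .. $#nums. Since $#nums is the last index rather than the element count, that range covers $len+1 elements. A quick check on a hypothetical ten-element array:

```perl
use strict;
use warnings;

my @nums = (1 .. 10);
my $len  = 3;

# $#nums is 9, so 9-3 .. 9 is 6 .. 9: four elements, one too many.
my @too_many = @nums[$#nums - $len .. $#nums];

# Both of these select exactly $len elements (indices 7 .. 9).
my @by_count = @nums[@nums - $len .. $#nums];   # @nums - $len is 10 - 3 = 7
my @negative = @nums[-$len .. -1];

print "too_many: @too_many\nby_count: @by_count\nnegative: @negative\n";
```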
Re: sorting and other problems in hash
by atcroft (Abbot) on Dec 31, 2008 at 03:09 UTC

    That was a lot of code to try to get what you wanted. I used a (to me) simpler approach:

    1. Read in file1
      1. Looping through the file
        1. If the line is blank or contains only whitespace, skip it
        2. If the line begins with a '>', split it on whitespace to get the sequence name
        3. Otherwise, split the line on whitespace into an array stored in a hash, keyed by the sequence name
    2. Read in file2
      1. Looping through the file
        1. If the line is blank or contains only whitespace, skip it
        2. If the line begins with a '>'
          1. Split the line on underscore, equals sign, or whitespace to get the sequence name, "side", and length
          2. If the sequence is not defined, display a warning
          3. Generate the range based on the "side"
          4. Print the line
          5. Print the digit sequence range
        3. Otherwise, display the line
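    The steps above can be sketched roughly as follows. This is an illustration written from the outline, not atcroft's own code: the subroutine names read_file1 and slice_file2 are hypothetical, the files are replaced by in-memory strings, and it deviates in one place by skipping file2's sequence lines instead of displaying them, since the expected result in the question does not include them.

```perl
use strict;
use warnings;

# Step 1: read file1 into a hash of arrays keyed by sequence name.
sub read_file1 {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                # blank or whitespace-only: skip
        if ($line =~ /^>/) {
            ($name) = split ' ', $line;           # ">S1 length=5" -> ">S1"
            $name =~ s/^>//;
        }
        else {
            $seq{$name} = [ split ' ', $line ];
        }
    }
    return \%seq;
}

# Step 2: walk file2 in order and return the output lines.
sub slice_file2 {
    my ($fh, $seq) = @_;
    my @out;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        next unless $line =~ /^>/;                # deviation: sequence lines are not echoed
        chomp $line;
        # ">S1_left length=2" -> (">S1", "left", "length", "2")
        my ($name, $side, undef, $len) = split /[_=\s]+/, $line;
        $name =~ s/^>//;
        my $nums = $seq->{$name};
        unless (defined $nums) {
            warn "no numbers found for $name\n";
            next;
        }
        my @range = $side eq 'left' ? (0 .. $len - 1)
                                    : (@$nums - $len .. $#$nums);
        push @out, $line, join(' ', @{$nums}[@range]);
    }
    return @out;
}

# Hypothetical in-memory stand-ins for file1.txt and file2.txt.
my $file1 = ">S1 length=5\n1 2 3 4 5\n";
my $file2 = ">S1_left length=2\nAC\n>S1_right length=3\nGTA\n";
open my $fh1, '<', \$file1 or die "file1: $!";
open my $fh2, '<', \$file2 or die "file2: $!";
my @lines = slice_file2($fh2, read_file1($fh1));
print "$_\n" for @lines;
```

    Because file2 is read line by line, the output follows file2's order without any sorting.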

    Or, in code:

    Hope that helps.

Re: sorting and other problems in hash
by oko1 (Deacon) on Dec 31, 2008 at 02:52 UTC

    If I understand the problem correctly, then you might try approaching it like this:

    #!/usr/bin/perl -w
    use strict;

    our ($File1, $File2) = qw/file1 file2/;
    open File1 or die "$File1: $!\n";
    open File2 or die "$File2: $!\n";

    my ($key, %results);
    while (<File1>){
        next if /^\s*$/;
        chomp;
        if (/^>\s*(\S+)/){
            $key = $1;
        }
        else {
            $results{$key} = [ split ];
        }
    }
    close File1;

    my ($len, $side, $str);
    while (<File2>){
        next if /^\s*$/;
        if (/^>([^_]+)_(left|right).*?(\d+)\s*$/){
            print;
            $str = $1;
            $side = $2;
            $len = $3;
        }
        else {
            my @list;
            @list = @{$results{$str}};
            if ($side eq 'left'){
                die "$str is too short for a left slice of $len!\n" unless @list >= $len;
                print "@list[0..$len-1]\n";
            }
            else {
                die "$str is too short for a right slice of $len!\n" unless @list >= $len;
                print "@list[@list-$len..$#list]\n";
            }
        }
    }
    close File2;

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
Re: sorting and other problems in hash
by sanku (Beadle) on Dec 31, 2008 at 03:35 UTC
    Hi friend, try this one out :)
    open(FILE1,"file1.txt") or die $!;
    @file1=<FILE1>;
    close(FILE1);
    open(FILE2,"file2.txt") or die $!;
    @file2=<FILE2>;
    close(FILE2);
    foreach $file2(@file2){
        if($file2=~/(\w+)(_)(\w+)\s(\w+)(=)(\d+)/) {
            $id=$1;
            if($3 eq 'right'){$value=$6."r#";}
            if($3 eq 'left'){$value=$6."l";}
            push(@array,"$value");
            push(@id,"$id");
        }
    }
    push(@newid,grep {!$ss{$_}++} @id);
    @arraysplit=@array2=$ss=();
    $ss=join('',@array);
    @arraysplit=split(/\s+|\n/,$ss);
    @array2=split('#',$arraysplit[0]);
    foreach $f1(0 .. $#file1) {
        if($file1[$f1] =~/>/){
            $firstline=$file1[$f1];
            chomp($firstline);
        }
        if($file1[$f1] =~/^>/) {
            $secondline=$file1[$f1+1];
            $secondlinej=join('',split(/\n/,$secondline));
            foreach $vv(0 .. $#array2) {
                if($newid[$vv] eq substr($firstline,1,6)) {
                    ($left,$right)=split(/r|l/,$array2[$vv]);
                    print "\n".substr($firstline,0,7)."_left"." length=$left \n";
                    @secondlinejj=split('\s',$secondlinej);
                    for ($l=0;$l < $left;$l++){print "$secondlinejj[$l] ";}
                    print "\n";
                    print "\n".substr($firstline,0,7)."_right"." length=$right \n";
                    $len=scalar @secondlinejj;
                    $right=$len-$right;
                    for ($m=$right;$m < scalar @secondlinejj;$m++){print "$secondlinejj[$m] ";}
                    print "\n"
                }
            }
        }
    }
    The output will look like this:
    >AAAT3R_left length=6
    40 40 40 40 40 40
    >AAAT3R_right length=62
    40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
    >AAA2OJ_left length=14
    18 18 18 21 35 35 35 32 32 32 33 35 38 39
    >AAA2OJ_right length=14
    37 37 37 37 37 33 32 32 30 20 17 17 17 0