in reply to Re^7: Compare hash with arrays and print
in thread Compare hash with arrays and print

Thanks once again for your input!

When I try running my script with the following test set, it does not print anything to the output files; instead it prints the data to the screen!

>001.b1 gnl|ti|10009
GCTAGTGCTAGCTAGCTAGCATCGATCGAT
>002.b1 gnl|ti|10010
CAGTCAGTCGTAGTGCTAGCTGATGCTCGT
>003.b1 gnl|ti|10011
CGATCGTAGTCGTATCGATGCTGACGTAGG
>004.b1 gnl|ti|10012
CGATGCTAGTCGTAGTGCTAGTGCTATGTC
>005.b1 gnl|ti|10013
CAGTCGTAGTCGATGCTGTATCATAGCGTA
>006.b1 gnl|ti|10014
ACAGTGCTAGCTGATCGTAGCTGAGCGGAG
>007.b1 gnl|ti|10015
AGCTAGCTGATGTCGATGCTGATCGTGATG
>008.b1 gnl|ti|10016
CGATGCTGATGCTGATGCTGTAGCTATACG
>008.g1 gnl|ti|10017
GCTAGCTAGTCGTAGTCGTAGTGTCGTAGG
>009.b1 gnl|ti|10018
CGATGCTAGTCGTAGTCGTAGCTGATGCGT
>010.b1 gnl|ti|10019
CGATCGTAGTCGTAGCTGATGCTGTAGCTG
>010.g1 gnl|ti|10020
CGTAGCTGATCGTAGCGTGACTGTAGCTGG
>011.b1 gnl|ti|10021
CGTAGCTGATGCTGATCGTAGCTAGTCGAT
>011.g1 gnl|ti|10022
CAGCTGATCGTAGCTGATGCTGATGTGTGT

I thought maybe splitting the header and data might help. But as it turns out, I am not going the right way.

The output I get from the script is:

>001.b1 gnl|ti|10009
GCTAGTGCTAGCTAGCTAGCATCGATCGAT
>002.b1 gnl|ti|10010
CAGTCAGTCGTAGTGCTAGCTGATGCTCGT
>003.b1 gnl|ti|10011
CGATCGTAGTCGTATCGATGCTGACGTAGG
>004.b1 gnl|ti|10012
CGATGCTAGTCGTAGTGCTAGTGCTATGTC
>005.b1 gnl|ti|10013
CAGTCGTAGTCGATGCTGTATCATAGCGTA
>006.b1 gnl|ti|10014
ACAGTGCTAGCTGATCGTAGCTGAGCGGAG
>007.b1 gnl|ti|10015
AGCTAGCTGATGTCGATGCTGATCGTGATG
>008.b1 gnl|ti|10016
CGATGCTGATGCTGATGCTGTAGCTATACG
>008.g1 gnl|ti|10017
GCTAGCTAGTCGTAGTCGTAGTGTCGTAGG
>009.b1 gnl|ti|10018
CGATGCTAGTCGTAGTCGTAGCTGATGCGT
>010.b1 gnl|ti|10019
CGATCGTAGTCGTAGCTGATGCTGTAGCTG
>010.g1 gnl|ti|10020
CGTAGCTGATCGTAGCGTGACTGTAGCTGG
>011.b1 gnl|ti|10021
CGTAGCTGATGCTGATCGTAGCTAGTCGAT
>011.g1 gnl|ti|10022
CAGCTGATCGTAGCTGATGCTGATGTGTGT

My script for the same is:

......
    s/^>//mg;
    my ($name) = /^([^\n]+)/;
    if($hash{$name} == 10){
        select FILE1;
    }
    if($hash{$name} == 20){
        select FILE2;
    }
    if($hash{$name} == 30){
        select FILE3;
    }
}
#print ">$_";
my $output = $_;
print ">$output\n";

Thanks again!

Re^9: Compare hash with arrays and print
by almut (Canon) on Jul 13, 2010 at 19:25 UTC
    When I try running my script with the following test set, it does not print anything to the output files; instead it prints the data to the screen!

    This is most likely because with

    my ($name) = /^([^\n]+)/;

    you're extracting the entire header line as $name to be used as the hash key, but you have not set up your hash accordingly with keys such as "001.b1 gnl|ti|10009".  In this case, $hash{$name} is undefined, so none of the select statements (select FILE1, etc.) would execute, and STDOUT (connected to the screen) remains the default output handle for print.

    Note that ^([^\n]+) would match everything from the beginning of the record up to the first newline, because the character class [^\n] says "match any character but a newline" (the ^ within the class negates).

    Try one of the other suggestions, or adjust the regex as needed.
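
    As a rough illustration of "adjust the regex as needed", here is a minimal sketch that compares the original pattern with one that captures only the first whitespace-delimited field of the header. It assumes the hash is keyed on bare read names such as "001.b1"; the keys in %hash below and the $record string are made up for the demonstration (only the values 10, 20 and 30 come from the script above).

```perl
use strict;
use warnings;

# Hypothetical lookup table: keys are ASSUMED to be the bare read names.
my %hash = ( '001.b1' => 10, '008.g1' => 20, '011.g1' => 30 );

# One record as the script would see it after s/^>//mg:
# header line, newline, sequence line (sample data from the test set).
my $record = "001.b1 gnl|ti|10009\nGCTAGTGCTAGCTAGCTAGCATCGATCGAT\n";

# Original pattern: captures everything up to the first newline,
# i.e. the whole header line.
my ($whole) = $record =~ /^([^\n]+)/;

# Adjusted pattern: captures only the leading run of non-whitespace.
my ($name) = $record =~ /^(\S+)/;

print "whole header: '$whole'\n";
print "first field:  '$name'\n";
print "lookup by whole header: ",
    ( exists $hash{$whole} ? 'found' : 'not found' ), "\n";
print "lookup by first field:  ",
    ( exists $hash{$name} ? "found ($hash{$name})" : 'not found' ), "\n";
```

    If the hash is actually keyed on something else (the gnl|ti|... identifier, say), the capture group would need to target that field instead.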

      Gosh! I didn't notice that at all. But now I understand!

      This seems to run perfectly fine with my test data. However, my actual data files are huge.

      Thanks a tonne again almut. I really appreciate your help!

      Hi...

      Since I am comparing against the keys from my hash, shouldn't the results be printed in sorted order?

      Do I need to use an array to sort these separately?

      Thanks!!