
when I split it with a newline the data-lines also split up.

Instead of splitting on all newlines (as split would do by default)

my (@title) = split(/\n/,$output);

you could tell split to split into two parts only:

my ($title, $data) = split /\n/, $output, 2;

In this case, $data would hold all the remaining lines of sequence data.
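
A minimal sketch of the difference, assuming a made-up record in $output with one header line and two lines of sequence data (the record is for illustration only, not your actual data):

```perl
use strict;
use warnings;

# Hypothetical record: header line followed by two lines of sequence data.
my $output = "001.b1 gnl|ti|10009\nGCTAGTGCTAGC\nTAGCTAGCATCG";

# Without a limit, every newline is a split point.
my @parts = split /\n/, $output;

# With a limit of 2, only the first newline is a split point.
my ($title, $data) = split /\n/, $output, 2;

print scalar(@parts), "\n";   # 3 fields: header plus two sequence lines
print "$title\n";             # the header line only
print "$data\n";              # both sequence lines, inner newline kept
```

The third argument caps the number of fields, so everything after the first newline stays together in the last field.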

That said, I'm not really sure why you want to split the record in the first place, if you then proceed to print out the parts with the same newline added in between them :)  I.e.

... print ">".$title."\n".$data."\n";

would result in the same output as

print ">$output\n";

Re^8: Compare hash with arrays and print
by ad23 (Acolyte) on Jul 13, 2010 at 19:00 UTC

    Thanks once again for your input!

    When I try running my script with the following test set, it does not print anything to the output files; instead, it prints the data to the screen!

    >001.b1 gnl|ti|10009
    GCTAGTGCTAGCTAGCTAGCATCGATCGAT
    >002.b1 gnl|ti|10010
    CAGTCAGTCGTAGTGCTAGCTGATGCTCGT
    >003.b1 gnl|ti|10011
    CGATCGTAGTCGTATCGATGCTGACGTAGG
    >004.b1 gnl|ti|10012
    CGATGCTAGTCGTAGTGCTAGTGCTATGTC
    >005.b1 gnl|ti|10013
    CAGTCGTAGTCGATGCTGTATCATAGCGTA
    >006.b1 gnl|ti|10014
    ACAGTGCTAGCTGATCGTAGCTGAGCGGAG
    >007.b1 gnl|ti|10015
    AGCTAGCTGATGTCGATGCTGATCGTGATG
    >008.b1 gnl|ti|10016
    CGATGCTGATGCTGATGCTGTAGCTATACG
    >008.g1 gnl|ti|10017
    GCTAGCTAGTCGTAGTCGTAGTGTCGTAGG
    >009.b1 gnl|ti|10018
    CGATGCTAGTCGTAGTCGTAGCTGATGCGT
    >010.b1 gnl|ti|10019
    CGATCGTAGTCGTAGCTGATGCTGTAGCTG
    >010.g1 gnl|ti|10020
    CGTAGCTGATCGTAGCGTGACTGTAGCTGG
    >011.b1 gnl|ti|10021
    CGTAGCTGATGCTGATCGTAGCTAGTCGAT
    >011.g1 gnl|ti|10022
    CAGCTGATCGTAGCTGATGCTGATGTGTGT

    I thought maybe splitting the header and data might help. But as it turns out, I am not going the right way.

    The output I get from the script is:

    >001.b1 gnl|ti|10009
    GCTAGTGCTAGCTAGCTAGCATCGATCGAT
    >002.b1 gnl|ti|10010
    CAGTCAGTCGTAGTGCTAGCTGATGCTCGT
    >003.b1 gnl|ti|10011
    CGATCGTAGTCGTATCGATGCTGACGTAGG
    >004.b1 gnl|ti|10012
    CGATGCTAGTCGTAGTGCTAGTGCTATGTC
    >005.b1 gnl|ti|10013
    CAGTCGTAGTCGATGCTGTATCATAGCGTA
    >006.b1 gnl|ti|10014
    ACAGTGCTAGCTGATCGTAGCTGAGCGGAG
    >007.b1 gnl|ti|10015
    AGCTAGCTGATGTCGATGCTGATCGTGATG
    >008.b1 gnl|ti|10016
    CGATGCTGATGCTGATGCTGTAGCTATACG
    >008.g1 gnl|ti|10017
    GCTAGCTAGTCGTAGTCGTAGTGTCGTAGG
    >009.b1 gnl|ti|10018
    CGATGCTAGTCGTAGTCGTAGCTGATGCGT
    >010.b1 gnl|ti|10019
    CGATCGTAGTCGTAGCTGATGCTGTAGCTG
    >010.g1 gnl|ti|10020
    CGTAGCTGATCGTAGCGTGACTGTAGCTGG
    >011.b1 gnl|ti|10021
    CGTAGCTGATGCTGATCGTAGCTAGTCGAT
    >011.g1 gnl|ti|10022
    CAGCTGATCGTAGCTGATGCTGATGTGTGT

    My script for the same is:

    ......
        s/^>//mg;
        my ($name) = /^([^\n]+)/;
        if($hash{$name} == 10){ select FILE1; }
        if($hash{$name} == 20){ select FILE2; }
        if($hash{$name} == 30){ select FILE3; }
    }
    #print ">$_";
    my $output = $_;
    print ">$output\n";

    Thanks again!

      When I try running my script with the following test set, it does not print anything to the output files; instead, it prints the data to the screen!

      This is most likely because with

      my ($name) = /^([^\n]+)/;

      you're extracting the entire header line as $name to be used as the hash key, but you have not set up your hash with matching keys such as "001.b1 gnl|ti|10009". In this case, none of the select statements (select FILE1, etc.) would execute, so STDOUT (connected to the screen) remains the default output handle for print.
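
      A small sketch of that effect, under stated assumptions: the hash contents, the record, and the in-memory file standing in for FILE1 are all made up for illustration, since the thread does not show how %hash is filled.

```perl
      use strict;
      use warnings;

      # Hypothetical lookup table, keyed by the short ID only.
      my %hash = ('001.b1' => 10);

      # In-memory file standing in for FILE1 (assumption for this sketch).
      my $buffer = '';
      open my $file1, '>', \$buffer or die "Cannot open in-memory file: $!";

      my $record = "001.b1 gnl|ti|10009\nGCTAGTGCTAGC";
      my ($name) = $record =~ /^([^\n]+)/;   # captures the whole header line

      # exists() avoids an "uninitialized value" warning for a missing key.
      my $matched = exists $hash{$name} && $hash{$name} == 10;
      if ($matched) { select $file1; }

      print ">$record\n";   # no select ran, so this still goes to STDOUT

      select(STDOUT);
      close $file1;
```

      Because the key "001.b1 gnl|ti|10009" is not in the hash, $matched is false and $buffer stays empty.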

      Note that ^([^\n]+) would match everything from the beginning of the record up to the first newline, because the character class [^\n] says "match any character but a newline" (the ^ within the class negates).

      Try one of the other suggestions, or adjust the regex as needed.
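
      One possible adjustment, assuming the hash keys are the short IDs such as "001.b1" (an assumption; the hash values and record below are hypothetical): capture only up to the first whitespace instead of up to the first newline.

```perl
      use strict;
      use warnings;

      # Hypothetical lookup table keyed by the short ID.
      my %hash = ('001.b1' => 10, '002.b1' => 20);

      my $record = "001.b1 gnl|ti|10009\nGCTAGTGCTAGC";

      # \S+ stops at the first whitespace, so only the ID is captured.
      my ($name) = $record =~ /^(\S+)/;

      my $status = exists $hash{$name} ? "found: $hash{$name}" : "not found";
      print "$name\n";
      print "$status\n";
```

      If your real keys have a different shape, the capture group would need to be changed to match them.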

        Gosh! I didn't notice that at all. But now I understand!

        This seems to run perfectly fine with my test data. However, my actual data files are huge.

        Thanks a tonne again almut. I really appreciate your help!

        Hi...

        Since I am comparing against the keys from my hash, shouldn't it print out the results in sorted order?

        Do I need to use an array to sort these separately?

        Thanks!!