Re^3: Compare hash with arrays and print

Re^4: Compare hash with arrays and print by ad23 (Acolyte) on Jul 13, 2010 at 15:06 UTC
I added some print statements, and now I am able to get the required contents in the output files. Thanks a lot!

Also, I have some mixed fasta header data. For example:

    >aw1.a1 bhi|tn|56564 pairs:40098
    ATGCTAGATGCTAGCTAGCTAGCACTGAT
    CGATGCTAGCGTAGTCAGCTGATGCTGTA
    CGATGCTAGTCGTACG
    >aw1.b1 bhi|tn|56565 pairs:40099
    CGAGCTAGTCGTAGTCGTGATGCTGATTA
    CGATGCTAGTCGTAGCTAGCTGATGCTGC
    CGATGCTAGTCGTAGTC
    >dd3.a1 bhi|tn|56566 pairs:40100
    CGTAGTCGTAGTCGTAGTCGATGCTGATG
    GCTAGTCGATGCTAGCTAGTCGATGCTGG
    CGATGCTGAT
    >dd3.b1 bhi|tn|56567 pairs:40101
    CGTAGTCGTAGTCGTACGTAGTCGTGAGT
    CGATTATTTAGGAGGGACAAGGATAGTA
    >hg5.a1 bhi|tn|56568 pairs:40102
    CGTAGTCGTAGTCTAGTCGTGATGCTAGA
    >dfd6.a1 bhi|tn|56569 pairs:40103
    CGATGCTACGTACGTAGTCAGTCGTGATG
    AATTAGAGCAGATAGAGGGGGAAAGGGTT
    AAACCCC
    >ght5.a1 bhi|tn|56564
    ATGCTAGTCGTAGTCGATGTCGTAGCTGT
    CGTAGCTGATGCATGCTAGTCCGTAGCTG
    >tgt6.X bhi|tn|56564 pairs:56478
    CGTAGCTGATGCTGATGCTGATGCTGTGT
    CGTAGCTGATGCTGATGCTTAGCTGATGC
    CGTAGCTGATCGTAGCTATCGTAGCTAGG
    >tgt6.Y bhi|tn|56564 pairs:56479
    CGTAGTCGTAGTCGTAGTCGATGCTAGTG
    CGATGCTGATCGTGATGCTATGCTAGCGT
    CAGTCGTAGTCGTACGTAGTCGTGTGTGG

I want to write the complete header line (starting with '>') to my output files. Can I use the split function to divide the header line into an array and then print it?

Thanks!
Re^5: Compare hash with arrays and print by almut (Canon) on Jul 13, 2010 at 16:41 UTC
"I want to write the complete header line (starting with '>') in my output files. Can I use the split function ...?"

Sure, you can use the `split` function, but if you just want to print the header line as is (i.e. copy it from the input), you don't need to split up the record. As you have it, the header is already printed without further ado.

Think of it this way: `$_` holds an entire record, with the leading `'>'` removed (to more easily handle the edge cases that result from the way the input is being split by `$/`). For example, for the first record, `$_` would hold the string (including the newlines)

    "aw1.a1 bhi|tn|56564 pairs:40098
    ATGCTAGATGCTAGCTAGCTAGCACTGAT
    CGATGCTAGCGTAGTCAGCTGATGCTGTA
    CGATGCTAGTCGTACG
    "

You can do whatever you like with it before you print it out, e.g. take it apart using `split` or regex captures, perform regex substitutions on it, etc.

Some more notes:

The key for the hash is extracted via regex capture

    my ($name) = /^(\w+)/;

which would extract `"aw1"` in this case, because `\w+` stops matching at the dot. If you need keys such as `"aw1.a1"` (or some such, I'm no fasta expert), you could modify the regex to also capture the dots

    my ($name) = /^([\w.]+)/;

or to capture everything up to the first whitespace character in the line

    my ($name) = /^(\S+)/;

Or, if you want to print the headers only (I'm not quite sure from your description), you could extract the header similarly with a regex

    my ($header) = /^([^\n]+)/;

or by splitting on newlines

    my ($header) = split /\n/;

And so on... Hope this gives you some starting points to tailor it to your specific requirements.
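To see the variants above side by side, here is a minimal, self-contained sketch. It is not the code from earlier in the thread: the record separator `"\n>"`, the in-memory filehandle, and the names `$fasta`, `@parsed`, `$short`, `$full` and `$via_split` are assumptions made for illustration, and only the first two sample records are used.

```perl
use strict;
use warnings;

# Sample input: the first two records from the question.
my $fasta = <<'END';
>aw1.a1 bhi|tn|56564 pairs:40098
ATGCTAGATGCTAGCTAGCTAGCACTGAT
CGATGCTAGCGTAGTCAGCTGATGCTGTA
CGATGCTAGTCGTACG
>aw1.b1 bhi|tn|56565 pairs:40099
CGAGCTAGTCGTAGTCGTGATGCTGATTA
CGATGCTAGTCGTAGCTAGCTGATGCTGC
CGATGCTAGTCGTAGTC
END

my @parsed;
{
    # Assumption: records are read with "\n>" as the input record separator.
    local $/ = "\n>";
    open my $in, '<', \$fasta or die "cannot open in-memory file: $!";
    while (<$in>) {
        # Strip '>' at the start of any line: the leading one of the first
        # record and the trailing one left over from the separator.
        s/^>//mg;

        my ($short)     = /^(\w+)/;      # stops at the dot
        my ($full)      = /^(\S+)/;      # up to the first whitespace
        my ($header)    = /^([^\n]+)/;   # the whole first line
        my ($via_split) = split /\n/;    # same line, obtained via split

        push @parsed, {
            short     => $short,
            full      => $full,
            header    => $header,
            via_split => $via_split,
        };
    }
    close $in;
}

# Print the short key, the full key and the complete header per record.
printf "%-4s %-7s >%s\n", @{$_}{qw(short full header)} for @parsed;
```

Whichever key you pick has to match the keys already stored in `%hash`, otherwise the lookups will not find anything.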
Re^6: Compare hash with arrays and print by ad23 (Acolyte) on Jul 13, 2010 at 17:56 UTC
Thanks again almut! I really appreciate your input.

I am trying to manipulate `$_` to obtain the desired results. But when I split it on newlines, the data lines are split up as well. As a result, if a record holds more than one line of sequence data, the remaining lines are not printed!

    while (<FASTA>) {
        s/^>//mg;
        # print "$_";
        my ($name) = /^([^\n]+)/;
        #my ($name) = /^([\w.]+)/;
        #print "$name\n";
        if($hash{$name} == 10){
            select FILE1;
        }
        if($hash{$name} == 20){
            select FILE2;
        }
        if($hash{$name} == 30){
            select FILE3;
        }
        #print ">$_";
        my $output = $_;
        my (@title) = split(/\n/,$output);
        print ">".$title[0]."\n".$title[1]."\n";
    }

I am splitting on the newline character in order to print the complete header line in my output files (and this is where I am going wrong). Any suggestions?

Thanks!
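For reference, a minimal sketch of what the final `print` in the loop above appears to do. It assumes `$_` holds one record with the leading '>' already stripped; the names `$record`, `@seq_lines`, `$truncated` and `$complete` are made up for this illustration. Only `$title[0]` and `$title[1]` are printed, so any sequence lines after the first one never reach the output.

```perl
use strict;
use warnings;

# One record as it would look in $_ (first record of the sample data).
my $record = "aw1.a1 bhi|tn|56564 pairs:40098\n"
           . "ATGCTAGATGCTAGCTAGCTAGCACTGAT\n"
           . "CGATGCTAGCGTAGTCAGCTGATGCTGTA\n"
           . "CGATGCTAGTCGTACG\n";

# First element is the header line, the rest are the sequence lines.
my ($header, @seq_lines) = split /\n/, $record;

# What the posted print statement produces: header plus ONE sequence line.
my $truncated = ">$header\n$seq_lines[0]\n";

# Printing every element keeps the record complete.
my $complete = join '', map { "$_\n" } ">$header", @seq_lines;

print $complete;
```

Here `$complete` is the same string as `">$record"`, which is why the commented-out `print ">$_";` already writes the full header together with all sequence lines.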
Re^7: Compare hash with arrays and print by almut (Canon) on Jul 13, 2010 at 18:23 UTC
Re^8: Compare hash with arrays and print by ad23 (Acolyte) on Jul 13, 2010 at 19:00 UTC