in reply to Why is it matching??

Take a close look at the logic of your code as it stands

If you compare your version of this with the code it is based on at Re: Little pattern problem..., you'll see that you have replaced the inner while loop with an if statement. This means that instead of looping, reading new lines and pushing them onto the array until it finds a line that does not match the condition, it reads one line, pushes it, and then does nothing else.
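To make the difference concrete, here is a minimal sketch that is not taken from either version of the program. The record list and the two helper subs (take_with_while, take_with_if) are hypothetical and exist only to show how many lines each construct consumes.

```perl
use strict;
use warnings;

# Hypothetical records: three lines for target A, then one for target B.
my @lines = ( 'A probe1', 'A probe2', 'A probe3', 'B probe1' );

# `while` re-tests the condition after every pass, so it consumes
# every consecutive line that matches.
sub take_with_while {
    my ( $pattern, @queue ) = @_;
    my @taken;
    while ( @queue and $queue[0] =~ $pattern ) {
        push @taken, shift @queue;
    }
    return @taken;
}

# `if` tests the condition once, so it consumes at most one line.
sub take_with_if {
    my ( $pattern, @queue ) = @_;
    my @taken;
    if ( @queue and $queue[0] =~ $pattern ) {
        push @taken, shift @queue;
    }
    return @taken;
}

printf "while took %d line(s)\n", scalar take_with_while( qr/^A/, @lines );
printf "if took %d line(s)\n",    scalar take_with_if( qr/^A/, @lines );
```

With the sample list, the while version takes all three A lines and the if version takes only the first.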

Suggestion: Go back to the original code, read through it, and try to understand how it works before you try to modify it. Then, when you start making changes, leave warnings and strict enabled! If you make a change and it gives you a warning, try to understand what the warning means and correct it. Adding use diagnostics; may help you to interpret the messages. Correct any such warnings before you move on to making the next change. In the long run, you'll learn faster and achieve your goal much quicker that way.
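As a rough illustration of that workflow, the sketch below enables both pragmas and captures a warning so it can be inspected. The warnings_from helper and the deliberately uninitialized variable are hypothetical; the diagnostics line is left commented out and can be enabled for longer explanations.

```perl
use strict;
use warnings;
# use diagnostics;   # uncomment for verbose explanations of each message

# Run a code block and return any warnings it raised (hypothetical helper).
sub warnings_from {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    $code->();
    return @caught;
}

# Concatenating an undefined value is a typical slip that warnings catch.
my @messages = warnings_from( sub {
    my $probe;                          # never assigned
    my $line = "probe: " . $probe;      # triggers "uninitialized value"
} );

print "caught: $_" for @messages;
```

In everyday use the handler is unnecessary: the warning simply appears on STDERR with the file name and line number of the offending statement.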


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Re: Why is it matching??
by bioinformatics (Friar) on Sep 11, 2003 at 22:03 UTC
    LOL....Thank you for the harsh rebuke:-). As for the warnings issue, I failed to mention that, but hadn't figured out what it was referring to until checking things out in Programming Perl. So, yes, I do use warnings, thank you very much. No offense, but I couldn't get your program to work as stated. It gave me the same problem I am asking about in this node. $target_name doesn't change, so only one key is present in the hash. I need the keys to continue changing with $target_name. Since that does not occur in either program, the question still stands....
    Bioinformatics

      The rebuke wasn't intended to come across as harsh. Sorry that it did.

      Moving right along. Could you explain a little more of what you mean by "...but I couldn't get your program to work as stated..."? I just downloaded the code again and it produced the output I listed, which shows that two keys were created. The first with three probes found:

      1415671 : GGAACAGGAATGTCGCAACATCGTA, ACATCGTATGGATTGCTGAGTGCAT, GGCTGATCACATCCAAAAAGTCATG

      And the second with 10:

      1415670 : GAGGAAACGTTCACCCTGTCTACTA, GTTCACCCTGTCTACTATCAAGACA, TACTATCAAGACACTCGAAGAGGCT, CTGTGGGCAATATTGTGAAGTTCCT, GAATGCATCCTTGTGAGAGGTCAGA, GAGAGGTCAGACAAAGTGCCAGAAA, AAAACAAGAACACCCACACGCTGCT, ACACGCTGCTGCTAGCTGGAGTATT, TATCTTGTCCAACACTACGTCGAAG, TTGTCACCATGCCTGCAAGGAGAGA

      This is as expected from the sample data you provided on that original post, although I've manually wrapped it to prevent it getting confused by the autowrapper.

      If you are getting different output when you run my original code, then could you post the output you get please and I'll try to work out what could be different.

      The way $target_name gets updated in the original is like this.

      do {
          # extract the target name
          $target_name = $1 if m[( \d{7} ) _at: \d{3} : \d{3} ]x;

          while( m[$target_name] ) {
              # process the record containing the current target name
              my $probe = <DATA>;    # Read the probe
              chomp $probe;

              # save it in an HoA keyed by the target name
              push @{ $probes{ $target_name } }, $probe;

              # get the next line;
              last unless defined( $_ = <DATA> );
          }
      } until eof DATA;    # till done

      $target_name is set at the top of the outer do..until loop.

      The code enters the inner while loop, reads the next line, extracts the probe, and pushes it onto the HoA.

      It then gets to the last unless defined... line, where it reads another line. So long as it hasn't reached the eof, then it loops back to the top and tests the while condition again. If it matches, the loop repeats, another probe is read and pushed.

      If it doesn't, then it falls out of the while loop and the until eof DATA condition is tested. If it's not at the eof, then it loops back to the top of the do...until loop and the new $target_name is extracted from the last line it read (which failed to match the while condition) and the cycle repeats.
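The walk-through above can be tried end to end with the sketch below. It wraps the same loop in a hypothetical collect_probes sub that reads from any filehandle, primes $_ with the first line, and adds two guards for missing input. The three-record sample and its header layout (NNNNNNN_at:NNN:NNN followed by a probe line) are assumptions made for illustration, not the real data file.

```perl
use strict;
use warnings;

# Collect probes into a hash of arrays keyed by target name.
# Assumes header lines and probe lines strictly alternate; a line that
# fits neither case would need extra handling to avoid stalling the loop.
sub collect_probes {
    my ($fh) = @_;
    my ( %probes, $target_name );

    local $_ = <$fh>;                    # prime the loop with the first header
    return \%probes unless defined $_;

    do {
        # extract the target name from the current header line
        $target_name = $1 if m[( \d{7} ) _at: \d{3} : \d{3} ]x;

        while ( defined $target_name and m[$target_name] ) {
            my $probe = <$fh>;           # the line after a header is the probe
            last unless defined $probe;
            chomp $probe;
            push @{ $probes{$target_name} }, $probe;

            # read the next header; leave the loop when input runs out
            last unless defined( $_ = <$fh> );
        }
    } until eof $fh;

    return \%probes;
}

# Hypothetical sample: two records for one target, one for another.
my $sample = join "\n",
    '1415671_at:101:201', 'GGAACAGGAATGTCGCAACATCGTA',
    '1415671_at:102:202', 'ACATCGTATGGATTGCTGAGTGCAT',
    '1415670_at:103:203', 'GAGGAAACGTTCACCCTGTCTACTA', '';

open my $in, '<', \$sample or die "cannot open in-memory sample: $!";
my $probes = collect_probes($in);
print "$_ : ", join( ', ', @{ $probes->{$_} } ), "\n" for sort keys %$probes;
```

The third header fails the while test for 1415671, so control drops to the until check and then back to the top, where the new name 1415670 is extracted from that same line.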

      Hopefully, that explains how it works and will allow you to modify it to your needs.



        How would one add a newline to the print statement? The main problem I'm running into is that using quotation marks of any type throws the entire print statement off, causing it to print the word join rather than actually join the sequences, etc. Also, it seems I would almost have to rewrite it to fit the newline in; when I do, it does at least come out on separate lines. The data is coming out now with the original print statement, but all on one line; I need it on separate lines...:-(...
        #my rewritten statement, for better or worse
        for (keys %probes) {
            print "$target_name, ':' join', ', @{$probes{$_}}, \n";
        }

        _output_
        244901, ':' join', ', TTGCTGCTATTCTATCTATTTGTGC GACTTTCAAAGTGACTCTCGACGGG GAGCCTCCAGGCTATTCAGGAAGAA GAAGAATCGCAGCAATTCCCCAATC GAAGTAGTTCCTCCGGAATCCAATG TCAGCTTGCGAATTTGTGGCACCGT TTACCAATGGCACGCTGTGCGCCTA GCAAGCTTTGTTATGCCGAAACCTA AACACTTACAAATGCCACTTCTTCC GTCGCATCCGTTTTCAGGACGATCT AGCAATTTGCCTACTCTTGTATCTC,
        244902, ':' join', ', GTATTCGGGGAATCCTCCTTAATAG ATATTCCTATTATGTCAATGCCAAT AGCTGTGAATTCGAACTTTTTGGTA GGTATTTTCCGTTTCTTCGGATGAT GATGGGTCAAGTATTTGCTTCATTG TGCTTCATTGGTTCCAACGGTGGCA CCAACGGTGGCAGCTGCGGAATCCG GCGGAATCCGCTATTGGGTTAGCCA TAGCCATTTTCGTTATAACTTTCCG TAACTTTCCGAGTCCGAGGTACTAT GAGTCCGAGGTACTATTGCTGTAGA,
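For reference, a minimal sketch of one way such a statement might be written; it is not the print statement from the original program. Function calls such as join are not interpolated inside a double-quoted string, so the call has to sit outside the quotes, with the newline supplied as its own quoted piece. The %probes contents and the format_probes helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the real %probes hash of arrays.
my %probes = (
    244901 => [ 'TTGCTGCTATTCTATCTATTTGTGC', 'GACTTTCAAAGTGACTCTCGACGGG' ],
    244902 => [ 'GTATTCGGGGAATCCTCCTTAATAG' ],
);

# Build one "name : probe, probe" line per key (hypothetical helper).
sub format_probes {
    my ($probes) = @_;
    my $out = '';
    for my $name ( sort keys %$probes ) {
        # join() is called outside the quotes; "\n" ends each line
        $out .= "$name : " . join( ', ', @{ $probes->{$name} } ) . "\n";
    }
    return $out;
}

print format_probes( \%probes );
```

The sketch also uses the loop variable as the key on each line, rather than $target_name, which holds only the last name seen once the reading loop has finished.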
        Bioinformatics
        I've run into a small problem with the code (yes, that means it works:-)). In a few data files, I have two different digit lengths in the file. Ex: 10001_at, and 123456_at. I've tried things like adding another if statement, another loop, even placing the \d{6} and \d{5} parameters in different programs. When I use the \d{6} parameter, it is able to function correctly and gather the subsequent sequences. When I use \d{5}, however, it can't find anything at all except the control sequences, which I know won't pattern match and don't care about. SOOO...I know the \d{5} is working because it recognizes sequences that don't match, but it won't recognize the 10001_at target name or the 6 digit target name either. Any ideas?
        NOTE: the first half of the file is the 6 digits, while the second half is the 5 digit target name. Is it possible that the program stops partway through the file since it can't immediately find a matching pattern? I thought that the $1 would cause it to look for the first matching pattern, no matter where it is in the file....
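One possibility, offered only as a sketch: a range quantifier such as \d{5,7} lets a single pattern accept target names of several lengths. The target_name helper, the 5-to-7 digit range, and the sample lines are assumptions; the lookbehind (?<!\d) is there so the match cannot start partway through a longer run of digits.

```perl
use strict;
use warnings;

# Return the numeric target name from a header line, or undef if none.
# Assumes names are 5 to 7 digits followed by "_at" (hypothetical rule).
sub target_name {
    my ($line) = @_;
    return $line =~ m[ (?<!\d) ( \d{5,7} ) _at ]x ? $1 : undef;
}

# Hypothetical header lines, including a control sequence that should fail.
for my $line ( '123456_at:101:201', '10001_at:102:202', 'AFFX-BioB-5_at:103:203' ) {
    my $name = target_name($line);
    printf "%-24s => %s\n", $line, defined $name ? $name : '(no match)';
}
```

A header that yields no name leaves the previous $target_name in place, so it is worth checking how the surrounding loop advances past such lines.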

        Bioinformatics