anchorite has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I have written a while loop to parse a file (a pdb file). I want to break out of the while loop when it comes across a line matching a certain regex. In particular I want the loop to return lines matching:
/^SEQRES[\s]{2,5}1\s/
but break out of the loop at:
/^SEQRES[\s]{2,5}2\s/
so far I have written:
while ($line=<INFILE>, $line ne ~/^SEQRES[\s]{2,5}2\s/) {
    chomp $line;
    if ($line=~/^SEQRES[\s]{2,5}1\s/) {
        print "$line\n";
        print OUTFILE ("$line\n");
    }
}
PERL returns the error message:
Use of uninitialized value in string ne at superchimera.pl line 122, <INFILE> line 5579.
by which I imagine it means $line. I must stress that the loop runs and returns the matching lines without the ne condition, so it is this part of the code that appears to be faulty.
Can anyone help (sorry if this is poorly written, this is my first question of the honoured monks)?

Re: more than one condition to a while loop
by dragonchild (Archbishop) on Aug 21, 2003 at 16:51 UTC
    To answer a few more questions:
    1. while ((my $line = <INFILE>) && ($line !~ /SEQ.../)) (Removed as per tye's second comment. Just look below - it's better code anyways.)
    2. It's Perl, not PERL
    3. The regex match operator is =~. The regex not-match operator is !~. It's an atomic 2-character-long operator, not something like += or -=.
    4. Rewrite as such:
      while (my $line = <INFILE>) {
          # Do a match against the line, grabbing the number.
          my ($num) = $line =~ /SEQRES\s{2,5}(\d)\s/;

          # This means that the line didn't meet our requirements
          # as defined by the regex. So, go get the next line.
          # (I added this ... you may not want it, depending on
          # what else you're doing in the loop.)
          next unless defined $num;

          # If that number is 2, leave the while loop.
          last if $num == 2;

          # Don't chomp unless there's a possibility we'll use it.
          chomp $line;

          # If the number is 1, print it to two places.
          if ($num == 1) {
              print "$line\n";
              print OUTFILE "$line\n";
          }
      }
    Make it obvious what the difference is between the two matches. I assigned the difference to $num, which was done via the capture mechanism of regexes. (Look for the parentheses.)
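
A minimal sketch of that capture mechanism, assuming a few made-up records in place of lines read from INFILE (the sample data and the @nums array are illustrative, not from the original script). The parentheses around $num put the match in list context, so the captured digit is assigned rather than a true/false result:

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for lines read from INFILE.
my @lines = (
    "HEADER    EXAMPLE",
    "SEQRES   1 A   21  GLY ILE VAL",
    "SEQRES   2 A   21  CYS SER LEU",
);

my @nums;
for my $line (@lines) {
    # List context: $num receives the text captured by (\d),
    # or stays undefined when the line does not match at all.
    my ($num) = $line =~ /^SEQRES\s{2,5}(\d)\s/;
    push @nums, defined $num ? $num : 'no match';
}

print join(', ', @nums), "\n";
```

Lines that do not match leave $num undefined, which is why the loop above tests defined $num before comparing it with ==.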

    Update: Changed code per tye's comments.

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

      while (my $line = <INFILE> && $line !~ /SEQ.../)

      I'm usually suspicious of "my" inside conditionals. In this case, your code won't even compile under strict as you are trying to assign the value of

      <INFILE> && $line !~ /SEQ.../
      to my $line.

      Update: Your new version doesn't fix the problem:

      while ((my $line = <INFILE>) && ($line !~ /SEQ.../))
      still produces:
      Global symbol "$line" requires explicit package name
      because my doesn't declare a variable until the end of the statement that it occurs in (as I understand it).

                      - tye
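
The scoping point can be checked with a small sketch. It compiles a snippet at run time so that the failure can be captured instead of aborting the script. The names $snippet, @queue and $item are made up for illustration, and the exact wording of the error may vary between perl versions:

```perl
use strict;
use warnings;

# A loop condition that declares $item with "my" and then uses it
# again inside the same condition.
my $snippet = q{
    use strict;
    my @queue = ('a', 'b');
    while ((my $item = shift @queue) && ($item ne 'b')) { }
    1;
};

# String eval compiles the snippet now; a compile error lands in $@.
my $ok    = eval $snippet;
my $error = $ok ? '' : $@;

print $ok ? "compiled\n" : "failed to compile: $error";
```

Declaring the variable before the loop, or testing it inside the loop body as in the rewritten code above, avoids the problem.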
      2. It's Perl, not PERL

      In the sense that he is using it, it's probably perl or perl.exe. If it is the latter, it probably works when he calls it PERL. ;-)

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: more than one condition to a while loop
by davido (Cardinal) on Aug 21, 2003 at 16:55 UTC
    Try this:

    while ( <INFILE> ) {
        chomp;
        if ( /^SEQRES[\s]{2,5}1\s/ ) {
            print "$_\n";
            print OUTFILE "$_\n";
        }
        elsif ( /^SEQRES[\s]{2,5}2\s/ ) {
            last;
        }
    }

    The above snippet should print every line that matches the first regexp and keep iterating until a line matches the second regexp, at which point the loop exits. If the second regexp never matches, the loop exits when there's no more file to read.

    Note the use of last; last basically jumps to the end of the current block or loop, as though the loop's criteria suddenly became false. A C-style continue (spelled next in Perl) wouldn't work, because it just skips to the end of the current iteration and then lets the loop continue iterating. And break doesn't work because Perl has no break for loops; as you find in the index of the Camel book, "Break: See last".
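
A small sketch of the difference between next and last, assuming made-up sample records read through an in-memory filehandle rather than the real INFILE (the data, $fh and @kept are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the PDB file.
my $text = join "\n",
    "SEQRES   1 A   21  GLY ILE VAL",
    "REMARK   unrelated record",
    "SEQRES   1 B   30  PHE VAL ASN",
    "SEQRES   2 A   21  CYS SER LEU",
    "SEQRES   1 C   12  never reached",
    "";

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @kept;
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /^SEQRES/;          # skip to the next iteration
    last if $line =~ /^SEQRES\s{2,5}2\s/;    # leave the loop entirely
    push @kept, $line;
}
close $fh;

print "$_\n" for @kept;
```

The REMARK line is skipped by next and reading carries on, while the first "SEQRES   2" line triggers last, so the final record is never examined.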

    Though this isn't what your code was doing, it is nevertheless what your description of the problem said you intended to do.

    Hope this helps!

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

      As a followup to my own post, you could make the loop more efficient by only using one regexp as follows:

      while ( <INFILE> ) {
          chomp;
          if ( /^SEQRES[\s]{2,5}(1|2)\s/ ) {
              if ( $1 eq "1" ) {
                  print "$_\n";
                  print OUTFILE "$_\n";
              }
              else {
                  last;
              }
          }
      }

      If I thought about it more I'm sure I could clean up that ugly cascading if statement too, but this works and is reasonably efficient.

      Dave

      "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

      Thank you for such lightning-quick responses. Several of them seem to work, and the one with the elsif is very elegant. Your Humble Servant, Anchorite
Re: more than one condition to a while loop
by TomDLux (Vicar) on Aug 21, 2003 at 19:13 UTC

    I would suggest moving the exit test to the top of the loop. Test for unusual conditions first, where it is clear what's happening; then, all else being out of the way, go on to the main processing, knowing nothing unusual will happen.

    If there are many conditions to check, this can get in the way of clarity, but certainly not here.

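    while ( <INFILE> ) {
        chomp;
        last if /^SEQRES[\s]{2,5}2/;

        # or, capturing the number (the parentheses around $num give
        # list context, which is needed to receive the capture):
        # my ($num) = /^SEQRES[\s]{2,5}(\d)/;
        # last if defined $num && $num == 2;

        # ... main processing goes here ...
    }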

    --
    TTTATCGGTCGTTATATAGATGTTTGCA
