skyler has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I wrote this code some time ago, and it has now started failing because it is not parsing correctly. It does skip the accounts that are not supposed to be printed, but it does not write the complete number of wanted lines to the output file. For example, given a file of 52 lines of which 10 are unwanted, it should skip those 10 and print the other 42 to the output file. Instead it prints only 36. I don't understand why. Could you take a look at the script and let me know what I need to do to correct it? I was thinking of counting the lines and then going line by line, moving each unwanted account to another file. Could you give me an idea of how to accomplish that? Thank you in advance.
#!/usr/bin/perl -w
use strict;

my $infile = 'c:\\hl7file2.txt';
my ( $yr, $mo, $dy ) = (localtime)[5,4,3];
my $outfile = sprintf( "%04d%02d%02d.txt", $yr+1900, $mo+1, $dy );
my $counter;

open IN, "<$infile" or die "Couldn't open $infile, $!";
open OUT, ">$outfile" or die "Couldn't open $outfile, $!";

$counter++;
print $counter;

my @finds = qw( 00000 00001 00002 00003 00004 76370 76375 76950 77403
                77404 77406 77407 77408 77409 77411 77412 77413 77414
                77416 77418 77370 77336 77417 );

my $finds_re = join '|', map { quotemeta } @finds;
$finds_re = qr/$finds_re/;
print $finds_re;

while (<IN>) {
    next if m/$finds_re/;
    print OUT;
}
close IN;

Replies are listed 'Best First'.
Re: How to move unwanted lines out of a file
by graff (Chancellor) on Mar 19, 2004 at 01:16 UTC
    Fletch is right: there's really no way to diagnose the problem if you don't indicate what sorts of lines are matching when they shouldn't. Maybe if you do that, you'll see the problem yourself.

    As a wild guess, given that your regex comes out looking like this:

    /00000|00001|00002|.../
    and given that your logic says "don't print a line if it matches that regex", the first thing that comes to mind is that your 52 lines might include lines like this:
    ...100000....    any five zeros in a row will match
    ...48500001...   any four zeros followed by 1 will match
    You might think that a number like "100000" shouldn't match, but it will, because the strings in the regex are not anchored in any way. If this is the problem, the solution is easy:
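    my $finds_re = '\b(?:' . join( '|', map { quotemeta } @finds ) . ')\b';
    (or something like that; the (?: ... ) group is there so that \b applies to every alternative, not just the first and the last).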
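
    To make the difference concrete, here is a minimal sketch comparing an unanchored pattern with a word-bounded one. The sample lines are invented for illustration (they are not from the real data file), and only three of the codes are used.

    ```perl
    use strict;
    use warnings;

    my @finds = qw( 00000 00001 77404 );    # subset of the original list
    my $alt   = join '|', map { quotemeta } @finds;

    my $loose = qr/$alt/;            # matches the digits anywhere in the line
    my $tight = qr/\b(?:$alt)\b/;    # matches only a standalone code

    # Hypothetical sample lines, assumed to be pipe-delimited
    my @lines = (
        'ACCT|100000|keep me',
        'ACCT|48500001|keep me',
        'ACCT|00001|skip me',
    );

    for my $line (@lines) {
        printf "%-24s loose=%d tight=%d\n", $line,
            ( $line =~ $loose ? 1 : 0 ),
            ( $line =~ $tight ? 1 : 0 );
    }
    ```

    The loose pattern matches all three lines, while the word-bounded one matches only the last. This assumes the codes in the real file are delimited by non-word characters such as | or whitespace; if they sit directly next to letters or other digits, \b will not help and a different anchor is needed.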
      I think that you are right. The regex was not anchored....
Re: How to move unwanted lines out of a file
by cormac (Acolyte) on Mar 18, 2004 at 20:45 UTC
    Please, the warnings pragma is your friend; and don't feel guilty about using diagnostics, either.
    --
    "In all large corporations, there is a pervasive fear that someone, somewhere is having fun with a computer on company time. Networks help alleviate that fear." - John C. Dvorak
Re: How to move unwanted lines out of a file
by Fletch (Bishop) on Mar 18, 2004 at 20:32 UTC

    Perhaps if you gave a sample of "wanted" and "unwanted" lines. . . . Then again, maybe you've just miscounted and there are only 36 wanted (happens to the best of us; use something like grep -c to double check yourself).

Re: How to move unwanted lines out of a file
by TomDLux (Vicar) on Mar 19, 2004 at 02:31 UTC

    For a test run, put a print before the 'next if m/$finds_re/;' line, and you'll see which lines match the regex and which don't.
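
    One way to act on this, and on the original question about moving unwanted lines to another file, is to write the skipped lines to a second handle instead of discarding them. The sketch below is only an illustration: it uses in-memory filehandles and made-up input so it runs on its own, and the word-bounded pattern is an assumption about how the codes appear in the data. In the real script the three handles would be opened on actual files.

    ```perl
    use strict;
    use warnings;

    my @finds    = qw( 00000 00001 77404 );    # subset of the original list
    my $alt      = join '|', @finds;
    my $finds_re = qr/\b(?:$alt)\b/;           # assumed word-bounded match

    # Stand-in for the real input file (hypothetical contents)
    my $input = "A|77404|x\nB|100000|y\nC|12345|z\n";
    open my $in, '<', \$input or die "Couldn't open in-memory input: $!";

    # Stand-ins for the wanted and unwanted output files
    my ( $kept, $skipped ) = ( '', '' );
    open my $out, '>', \$kept    or die "Couldn't open kept buffer: $!";
    open my $rej, '>', \$skipped or die "Couldn't open skipped buffer: $!";

    my ( $n_kept, $n_skipped ) = ( 0, 0 );
    while ( my $line = <$in> ) {
        if ( $line =~ $finds_re ) {
            print {$rej} $line;    # unwanted: goes to the second file
            $n_skipped++;
        }
        else {
            print {$out} $line;    # wanted: goes to the main output
            $n_kept++;
        }
    }
    close $_ for $in, $out, $rej;

    print "kept $n_kept, skipped $n_skipped\n";
    print "skipped lines:\n$skipped";
    ```

    With both counts and the rejected lines in hand, it is easy to check whether the totals add up to the size of the input and to read through exactly what was thrown away.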

    Are you expecting the patterns you match against to change? The definition of quotemeta is:

    quotemeta EXPR
    quotemeta
        Returns the value of EXPR with all non-"word" characters back-slashed. (That is, all characters not matching "/[A-Za-z_0-9]/" will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the "\Q" escape in double-quoted strings.

    Since your patterns are purely numeric, quotemeta is a no-op. If you expect the patterns to change, anticipation might be a virtue, but otherwise it might simply be an unnecessary complication.
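
    A quick check of that claim, as a small sketch; the strings containing punctuation are made-up examples and do not appear in the original list:

    ```perl
    use strict;
    use warnings;

    # Digits are "word" characters, so quotemeta leaves them untouched
    print quotemeta('77404'), "\n";     # 77404

    # Any non-word character gets a backslash in front of it
    print quotemeta('77.404'), "\n";    # 77\.404
    print quotemeta('a+b'), "\n";       # a\+b
    ```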

    --
    TTTATCGGTCGTTATATAGATGTTTGCA