ftumsh has asked for the wisdom of the Perl Monks concerning the following question:

Lo, I have a snippet of fixed code, i.e. I can't change it. All that I have control over is the regex that is passed to it.
    open FH, "< $file";
    while (<FH>) {
        # for each regex
        for my $re ( keys %{ $compiled->{'rx'} } ) {
            $compiled->{'rx'}{$re}++ if /$re/;
        }
    }
    close FH;
I have some DOS files (i.e. each line ends in \r\n; this is run on Linux) that look like
foo
bar
"99","END"
or
"99","END"

i.e. all files contain "99","END", but some contain other lines before it.

My problem is that I need to positively match the files containing _only_ the "99","END"

I.e. for the example files above, I already have a regex that matches "foo", which finds the first file. But what regex would find the second file? Matching END alone wouldn't work, because that would also find the foo file.

The only idea I have is to embed perl in a regex and somehow test that if we have END and are on the first iteration of the while, it matches.

Any other ideas appreciated...

John

Replies are listed 'Best First'.
Re: using a regex to determine if a string is the start of the FILE
by GrandFather (Saint) on Sep 18, 2008 at 11:56 UTC

    Something like:

    my $re = qr/(?<=\S)\s*\n"99","END"/sm;

    Perl reduces RSI - it saves typing
      Will this still work, bearing in mind that the file isn't slurped or binmoded? The line with the "99"..... will always be at the start of the string being matched, i.e. there are no preceding chars to look back at.

        Ah, that makes it a little more interesting and experimental. Try:

        /(??{ $. == 1 ? '"99","END"' : '(?=x)(?=X)' })/

      My apologies. In another part of the code, which I didn't post, the compile statement looks like:
      my $rx = qr/$_/;

      and I can't change that either. I do believe there is a way to put modifiers into a regex though; I'll have a poke around in perlretut.

        Yes you can, (?ms:$regex).

        Note that the scoping of /m inside the regex changed between perl 5.8.8 and perl 5.10.0.
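
A minimal sketch of the inline-modifier form, assuming the pattern arrives as a plain string and is compiled with a bare qr/$_/ as in the compile statement quoted above. The sample text and both patterns are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample: END sits on a middle line, not at the start of the string.
my $text = "foo\nEND\nbar\n";

my @results;
for ('^END$', '(?m:^END$)') {
    my $rx = qr/$_/;    # same compile step as the fixed code, no flags on qr//
    push @results, ( $text =~ $rx ? 1 : 0 );
}

# First entry: no modifier. Second entry: /m switched on inside the group.
print "@results\n";
```

Only the second pattern should match here, because ^ and $ anchor at line boundaries only when /m is in effect.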

Re: using a regex to determine if a string is the start of the FILE
by salva (Canon) on Sep 18, 2008 at 12:46 UTC
    untested:
    qr/(??{$.==1 ? '"99","END"' : "\n\n"})/
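
A minimal sketch of the $. idea, run against in-memory "files" rather than real ones. The helper count_hits and the sample contents are hypothetical. One assumption to check against the real code: as far as I know, perlre forbids (??{ ... }) in a pattern interpolated at run time from a plain string unless use re 'eval' is in scope, and hash keys are plain strings, so the fixed loop may need that pragma for this approach to run at all.

```perl
use strict;
use warnings;
use re 'eval';    # assumption: needed because the pattern is interpolated from a string

# Try the literal only while reading line 1; otherwise hand back a
# sub-pattern that can never match.
my $re = q{(??{ $. == 1 ? '"99","END"' : '(?=x)(?=X)' })};

# Hypothetical helper that mirrors the shape of the fixed loop.
sub count_hits {
    my ($content) = @_;
    open my $fh, '<', \$content or die "cannot open in-memory file: $!";
    my $hits = 0;
    while (<$fh>) {
        $hits++ if /$re/;
    }
    close $fh;
    return $hits;
}

my $only_end = qq{"99","END"\r\n};
my $with_foo = qq{foo\r\nbar\r\n"99","END"\r\n};

print count_hits($only_end), " ", count_hits($with_foo), "\n";
```

The file holding only the END line should score one hit and the foo file none, since its END line is read when $. is 3.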
Re: using a regex to determine if a string is the start of the FILE
by AnomalousMonk (Archbishop) on Sep 18, 2008 at 14:36 UTC
    My problem is that I need to positively match the files containing _only_ the "99","END"
    So if I understand correctly, the files you want to match have only one line in them, namely "99","END"\r\n, with the \n presumably being used as the record separator for reading the lines, this being linux. You also can examine the lines only on a line-by-line basis.

    The (untested) string '\A(?!"99","END"\r\n\z)' compiled to a regex will only match a line other than '"99","END"\r\n', so if the count of this regex in the hash is greater than zero, you know the file contained a line or lines other than '"99","END"\r\n' and so is not a file of interest.

    Update: Added back some "s on END that somehow got dropped in a couple of places.
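
A minimal sketch of the negative-lookahead string in use, again with in-memory "files". The helper count_other_lines and the sample contents are hypothetical; the pattern string is the one given above. Note that an empty file would also end with a count of zero, which this sketch does not try to distinguish.

```perl
use strict;
use warnings;

# Matches any line that is NOT exactly "99","END" followed by \r\n.
my $re = '\A(?!"99","END"\r\n\z)';

# Hypothetical helper that mirrors the shape of the fixed loop.
sub count_other_lines {
    my ($content) = @_;
    open my $fh, '<', \$content or die "cannot open in-memory file: $!";
    my $count = 0;
    while (<$fh>) {
        $count++ if /$re/;
    }
    close $fh;
    return $count;
}

my $only_end = qq{"99","END"\r\n};
my $with_foo = qq{foo\r\nbar\r\n"99","END"\r\n};

# A count of zero marks a file of interest.
print count_other_lines($only_end), " ", count_other_lines($with_foo), "\n";
```

This inverts the test: the file of interest is the one whose count stays at zero, so the caller has to look for a zero rather than a positive count.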