Otogi has asked for the wisdom of the Perl Monks concerning the following question:

    Line[ ]above\n
    (Description[ ]"(.*)"\n)?
    Line[ ]below\n

is there any way to differentiate between the following two target strings:

    Line above
    Description "xyz"
    Line below

    Line above
    Line below
The problem is that the lower string returns blank, and the upper one also returns blank if "xyz" is not there. Right now I bracket the whole line to know whether the line actually exists, but that's dumb; there must be a better way. Any thoughts? I hope I was clear; please ask if you need more clarification on some point.

Thank you.

Replies are listed 'Best First'.
Re: regex problem
by GrandFather (Saint) on Feb 27, 2006 at 22:02 UTC

    It's not clear to me what you are trying to achieve, but maybe the following will point you in the right direction:

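    use strict;
    use warnings;

    while (<DATA>) {
        # The flip-flop is true from a "Line above" line through the next "Line below" line
        next if ! (/Line above/ .. /Line below/);
        # Inside such a block, only lines with a quoted description are of interest
        next if ! /Description\s+"([^"]*)"/;
        print "Found $1 in line $.\n";
    }

    __DATA__
    Line above
    Description "xyz"
    Line below
    Line above
    Line below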

    Prints:

    Found xyz in line 2

    DWIM is Perl's answer to Gödel
Re: regex problem
by diotalevi (Canon) on Feb 27, 2006 at 22:05 UTC

    Assuming the pattern matched, $1 will be true if there was a description line. $2 will contain the description. As an optimization, you can get a faster pattern match by not putting your spaces into character classes. That just slows the engine down.
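
A minimal sketch of that check, for illustration only. The sub name describe_match and the sample strings are made up here, and the pattern is the one from the question written with literal spaces instead of [ ]:

```perl
use strict;
use warnings;

# Hypothetical helper: reports what the optional group captured.
sub describe_match {
    my ($text) = @_;
    return 'no match'
        unless $text =~ /Line above\n(Description "(.*)"\n)?Line below\n/;
    # $1 is undefined when the optional Description line took no part in the match
    return defined $1
        ? "description line present: '$2'"
        : 'no description line';
}

print describe_match(qq{Line above\nDescription "xyz"\nLine below\n}), "\n";
print describe_match(qq{Line above\nDescription ""\nLine below\n}), "\n";
print describe_match(qq{Line above\nLine below\n}), "\n";
```

The second call is the case the question worries about: $2 is an empty string, but $1 is still defined, so an empty description can be told apart from a missing Description line.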

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      As an optimization, you can get a faster pattern match by not putting your spaces into character classes. That just slows the engine down.
      I smell premature optimization...

      You're correct. However, there is not enough information from the OP to determine if that's a good idea.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Someone recently estimated to me in a back-of-the-envelope calculation that [ ] is 10x slower. That is, it hamstrings the screamingly fast Boyer-Moore literal string matching part of the regex engine and compiles /xxx[ ]xxx/ down to exact("xxx"), anyof(" "), exact("xxx") where it was originally exact("xxx xxx").

        You've changed the regex from one for a seven-character literal to one that contains two three-character literals. That changes how the engine matches, and it changes how quickly the BM part of the engine can either discard entirely or find suitable candidates. It also causes more overhead because three ops have to be executed instead of just one.

        I'm asserting that [ ] is pretty to look at but has serious consequences for the regexp's performance. I wouldn't ordinarily choose the pretty version in that case. I rarely ever use /x, though, so I don't need to escape my spaces.
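
One way to check a claim like this on your own perl is to time both spellings with the core Benchmark module. The sketch below is only that: the haystack text and the helper name count_matches are invented for illustration, and no particular result is implied, since the regex optimizer has changed across Perl versions and the two forms may well compile to the same thing on a newer perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up haystack: many non-matching lines, then one match at the end.
my $text = ("some other line\n" x 10_000) . "Line above\nLine below\n";

my %pattern = (
    literal_space => qr/Line above\n/,
    bracket_space => qr/Line[ ]above\n/,
);

# Hypothetical helper: counts matches of a compiled pattern in $text.
sub count_matches {
    my ($re) = @_;
    my $count = () = $text =~ /$re/g;
    return $count;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    literal_space => sub { count_matches($pattern{literal_space}) },
    bracket_space => sub { count_matches($pattern{bracket_space}) },
});
```

To see the compiled ops rather than timings, running a pattern under use re 'debug' dumps the regex program to STDERR.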

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: regex problem
by ambrus (Abbot) on Feb 28, 2006 at 11:53 UTC

    I don't think it's dumb to use a bracket just to see which branch of a regexp succeeded, or at least I do that too. For example, in the glob_to_re sub of my cgrep script, I use empty brackets for this reason. (Update: I've used ampersands as a regexp delimiter in a stupid moment, so look for m& if you can't find it.)
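
A small sketch of the empty-brackets trick, for illustration. This is not the code from the cgrep script; the sub name token_kind and the pattern are made up. Each alternative ends in an empty group, and whichever group is defined afterwards tells you which branch matched:

```perl
use strict;
use warnings;

# Hypothetical helper: classifies a token by which alternative matched.
sub token_kind {
    my ($token) = @_;
    return 'unknown' unless $token =~ /^(?:\d+()|[a-z]+())\z/;
    # An empty group that took part in the match holds '', not undef
    return defined $1 ? 'number' : 'word';
}

print token_kind($_), "\n" for qw(42 hello ???);
```

The groups capture nothing, so they add no data of their own; they only act as markers for the branch taken.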