in reply to Match a pattern only if it is not within another pattern

$str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;
print $str;

bl123 and barthisfoothatqux and barsofooquxhim and123som 123

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Re^2: Match a pattern only if it is not within another pattern
by punkish (Priest) on Aug 16, 2005 at 19:44 UTC
    Why, I didn't even think of going the evaluation route. Deconstructing --

    (bar.+?qux)|(foo)        # capture anything with 'bar' and 'qux' as
                             # bookends in $1 OR
                             # all other 'foo' in $2
    defined $2 ? '123' : $1  # if $2 is defined, replace it with 123
                             # otherwise put $1 back into
                             # the string
    ]ge                      # eval globally

    Thanks. I can see this would have been out of my league without your help.

    Update: BrowserUK, how on earth do you even begin to think this twisted? I can't fathom how to "practice" regexp matching other than answering questions from novices such as myself. I have been scanning Friedl's book, but I guess nothing substitutes for practice at ever increasing levels of complexity, much like a video game. Well, thanks for getting me over this particular hump for now.

    --

    when small people start casting long shadows, it is time to go to bed

      how on earth do you even begin to think this twisted?

      Trying to answer other people's questions is a very powerful technique for learning a subject more deeply yourself. In our normal lives, work (or play) tends to present us with a relatively static selection of problems to solve, and internal ("nope, too ugly") and external ("the in-house style guide") forces constrain our approaches to solving them. Dealing with someone else's problem, expressed in their own words and subject to their own constraints, can shake us from the shackles of habit upon our thoughts.

      Another way to leap out of that rut is to create artificial constraints of our own. The disciplines of writing obfuscations or playing perl golf are examples of such constraints, but they are easy to create - yes, I know I could do that with a regexp in a loop, but can I do it with just a regexp and no loop? Or in one regexp instead of two? Ok, now I've done that - ugly though it is - can I think of input text that would break it? Learning stuff from books has its place, but I have always felt that something you've discovered for yourself is worth twice as much. So experiment.

      I believe there is a very close relationship between the study of pattern (which is what regular expressions are all about) and the study of mathematics. A common mantra in mathematics is: so, you have this thing to prove, and you don't know how to prove it; so first, try proving something more specific - often that is easier, and maybe it'll give you a clue how to tackle the larger task. If that doesn't work (or even if it does), try proving something more general - paradoxically, sometimes that too turns out to be easier. I think BrowserUK's solution of matching more than you asked for is conceptually quite close to "proving something more general".

      Hugo
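The "match more than you asked for" idea is not tied to bar/qux. A minimal sketch, assuming a made-up task (replace foo everywhere except inside double-quoted strings); the sample text and variable names are illustrative only and do not come from the thread:

```perl
use strict;
use warnings;

# Assumed example input: one foo sits inside a double-quoted string.
my $text = 'say foo, "keep foo here", then foo';

# Alternative 1 matches the whole protected region and is put back unchanged;
# alternative 2 matches the foo we really want to replace.
(my $out = $text) =~ s{("[^"]*")|(foo)}{defined $2 ? '123' : $1}ge;

print "$out\n";   # say 123, "keep foo here", then 123
```

The protected alternative has to come first, so that at any position where both could start the regex engine prefers to swallow the region whole.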

      This is a good thing to try to remember; it can come up a lot.

      Caveat: doesn't work if you need to support nested bar/qux pairs, e.g. only replacing the first and last foo in: foo bar foo bar foo qux foo qux foo

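        Just remove the ? from .+? to make it greedy and it will work for that input. Be aware that a greedy .+ runs from the first bar to the last qux in the string, so a foo lying between two separate bar...qux pairs would also be left alone.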
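A small sketch of the difference between the lazy and greedy forms. The helper replace_foo and the second test string are assumptions made up for illustration; only the first string comes from the caveat above:

```perl
use strict;
use warnings;

# Hypothetical helper: run the substitution with a caller-supplied pattern.
sub replace_foo {
    my ($str, $re) = @_;
    $str =~ s{$re}{defined $2 ? '123' : $1}ge;
    return $str;
}

my $lazy   = qr/(bar.+?qux)|(foo)/;
my $greedy = qr/(bar.+qux)|(foo)/;

my $nested   = 'foo bar foo bar foo qux foo qux foo';   # nested pairs
my $separate = 'foo bar foo qux foo bar foo qux foo';   # two separate pairs

print replace_foo($nested,   $lazy),   "\n";  # 123 bar foo bar foo qux 123 qux 123
print replace_foo($nested,   $greedy), "\n";  # 123 bar foo bar foo qux foo qux 123
print replace_foo($separate, $lazy),   "\n";  # 123 bar foo qux 123 bar foo qux 123
print replace_foo($separate, $greedy), "\n";  # 123 bar foo qux foo bar foo qux 123
```

Neither form handles both inputs: lazy gets the separate pairs right and the nested ones wrong, greedy the other way round.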

        You can do the same thing but add Regexp::Common for arbitrary balanced delimiters:

        use Regexp::Common;

        my $orig = my $str = 'foo bar foo bar foo qux foo qux foo';
        $str =~ s{
            ( $RE{balanced}{-begin => "bar"}{-end => "qux"} )
            |
            (foo)
        }
        {
            defined $2 ? 123 : $1
        }xge;
        print "$orig\n";
        print "$str\n";

        Result:

        foo bar foo bar foo qux foo qux foo
        123 bar foo bar foo qux foo qux 123

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^2: Match a pattern only if it is not within another pattern
by Codon (Friar) on Aug 16, 2005 at 19:52 UTC
    Should the .+ be a \w+ so as to not jump words?
    $str = 'bart is a fool qux';
    will not replace 'foo'.
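A quick sketch of that comparison, assuming the suggestion means swapping .+? for \w+? (keeping the match lazy); the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $across_words = 'bart is a fool qux';

# Original pattern: .+? happily crosses the spaces, so the whole string
# counts as one bar...qux region and the foo in "fool" is protected.
(my $dot = $across_words) =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;

# \w+? cannot cross a space, so no bar...qux region is found here
# and the foo in "fool" is replaced.
(my $word = $across_words) =~ s[(bar\w+?qux)|(foo)][defined $2 ? '123' : $1]ge;

print "$dot\n";    # bart is a fool qux
print "$word\n";   # bart is a 123l qux
```

Which of the two behaviours is wanted depends on what the placeholders stand for in the real data.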

    Ivan Heffner
    Sr. Software Engineer, DAS Lead
    WhitePages.com, Inc.

      I guess that depends upon whether the OP is actually using the terms 'foo', 'bar' and 'qux', or whether they are just placeholders for the purpose of his question?

Re^2: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 12:57 UTC
    Nice, but it only works with bar then foo then qux, not qux then foo then bar. (Following passes first test, fails second test.)
    use strict;
    use warnings;
    use Test::More qw(no_plan);

    my $str      = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
    my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';
    $str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
    is($str, $expected);

    # switch qux and bar; the foo inside qux...bar should survive
    $str      = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';
    $expected = 'bl123 and quxthisfoothatbar and barsofooquxhim and123som 123';
    $str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
    is($str, $expected);
    I tried to solve the more "general" problem with Parse::RecDescent, further on in the thread. I gave up before finding a solution though.

      If you want to learn to solve the general problem, the book "Mastering Regular Expressions" is highly recommended. If you want a solution to the general problem, Regexp::Common::balanced does it already.

      # note, this matches "qux foo bar" and "bar foo qux", but not "bar foo bar"
      # see Regexp::Common::balanced documentation for details
      qr/$RE{balanced}{-begin => "qux|bar"}{-end => "bar|qux"}/

      -xdg

      That's "Working as designed".

      Would you expect to match ( stuff ) and ) stuff ( with the same regex? How would this be a generalisation?

        Well, technically the OP does say "surrounded", which could mean the boundaries are switched. Your usage of parens in the example is misleading, because parens are inherently related to internal grouping.

        But I imagine (not tested) a simple extension of your original regexp could be in order.

        $str =~ s[(bar.+?qux)|(qux.+?bar)|(foo)][defined $3 ? '123' : (defined($2) ? $2 : $1)]ge;
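A rough check of that extension against the switched test string from earlier in the thread. The wrapper name replace_outside and the last input are assumptions added for illustration:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the extended substitution.
sub replace_outside {
    my ($str) = @_;
    $str =~ s[(bar.+?qux)|(qux.+?bar)|(foo)]
             [defined $3 ? '123' : (defined $2 ? $2 : $1)]ge;
    return $str;
}

# Switched boundaries: the foo inside qux...bar is now protected as well.
print replace_outside('blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo'), "\n";
# bl123 and quxthisfoothatbar and barsofooquxhim and123som 123

# Assumed edge case: regions that share a boundary word. The first match
# consumes "bar foo qux", so the following foo is no longer seen as enclosed.
print replace_outside('bar foo qux foo bar'), "\n";
# bar foo qux 123 bar
```

Whether the second result counts as correct depends on how "surrounded" is meant to apply when regions overlap.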