Re^2: Match a pattern only if it is not within another pattern
by punkish (Priest) on Aug 16, 2005 at 19:44 UTC
|
Why, I didn't even think of going the evaluation route. Deconstructing:
$str =~ s[
(bar.+?qux)|(foo) # capture anything with 'bar' and 'qux' as
# bookends in group 1 OR
# all other 'foo' in group 2
][
defined $2 ? '123' : $1 # if $2 is defined, replace it with 123
# otherwise put $1 back into
# the string
]xge # extended syntax, eval, globally
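For anyone who wants to run the deconstruction, here is a minimal sketch. The helper name replace_outside and the sample string are made up for illustration and are not from the thread; the pattern itself is the one being discussed, with /x added so the comments are legal.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace 'foo' with '123'
# except where it sits inside a bar ... qux span.
sub replace_outside {
    my ($str) = @_;
    $str =~ s[
        (bar.+?qux)   # group 1: a whole bar ... qux span, copied back as is
      | (foo)         # group 2: any 'foo' that is not inside such a span
    ][
        defined $2 ? '123' : $1
    ]xge;
    return $str;
}

print replace_outside('foo bar foo qux foo'), "\n";   # 123 bar foo qux 123
```

The trick is that the first alternative consumes the whole protected span, so the engine never gets a chance to try (foo) inside it.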
Thanks. This was clearly out of my league, and I would not have got there without your help.
Update: BrowserUK, how on earth do you even begin to think this twisted? I can't fathom how to "practice" regexp matching other than answering questions from novices such as myself. I have been scanning Friedl's book, but I guess nothing substitutes for practice at ever increasing levels of complexity, much like a video game. Well, thanks for getting me over this particular hump for now.
--
when small people start casting long shadows, it is time to go to bed
how on earth do you even begin to think this twisted?
Trying to answer other people's questions is a very powerful technique for learning a subject more deeply yourself. In our normal lives, work (or play) tends to present us with a relatively static selection of problems to solve, and internal ("nope, too ugly") and external ("the in-house style guide") forces constrain our approaches to solving them. Dealing with someone else's problem, expressed in their own words and subject to their own constraints, can shake us from the shackles of habit upon our thoughts.
Another way to leap out of that rut is to create artificial constraints of our own. The disciplines of writing obfuscations or playing perl golf are examples of such constraints, but they are easy to create - yes, I know I could do that with a regexp in a loop, but can I do it with just a regexp and no loop? Or in one regexp instead of two? Ok, now I've done that - ugly though it is - can I think of input text that would break it? Learning stuff from books has its place, but I have always felt that something you've discovered for yourself is worth twice as much. So experiment.
I believe there is a very close relationship between the study of pattern (which is what regular expressions are all about) and the study of mathematics. A common mantra in mathematics is: so, you have this thing to prove, and you don't know how to prove it; so first, try proving something more specific - often that is easier, and maybe it'll give you a clue how to tackle the larger task. If that doesn't work (or even if it does), try proving something more general - paradoxically, sometimes that too turns out to be easier. I think BrowserUK's solution of matching more than you asked for is conceptually quite close to "proving something more general".
Hugo
Just remove the ? from .+? to make it greedy and it will work.
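The post this replies to is not shown here, so the following is a guess at the case it addresses: nested bar ... qux spans, using the sample string that appears elsewhere in this thread. A small sketch comparing the lazy and greedy forms:

```perl
use strict;
use warnings;

my $input = 'foo bar foo bar foo qux foo qux foo';

# Lazy .+? stops at the first 'qux', so the protected span ends early.
(my $lazy = $input) =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;

# Greedy .+ runs to the last 'qux', so the nested span stays protected.
(my $greedy = $input) =~ s[(bar.+qux)|(foo)][defined $2 ? '123' : $1]ge;

print "lazy:   $lazy\n";     # lazy:   123 bar foo bar foo qux 123 qux 123
print "greedy: $greedy\n";   # greedy: 123 bar foo bar foo qux foo qux 123
```

The greedy form has its own cost: two separate spans get merged, so in 'bar a qux foo bar b qux' the middle foo is left alone.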
|
use Regexp::Common;
my $orig = my $str = 'foo bar foo bar foo qux foo qux foo';
$str =~ s{ ( $RE{balanced}{-begin => "bar"}{-end => "qux"} ) | (foo) }
{ defined $2 ? 123 : $1 }xge;
print "$orig\n";
print "$str\n";
Result:
foo bar foo bar foo qux foo qux foo
123 bar foo bar foo qux foo qux 123
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Re^2: Match a pattern only if it is not within another pattern
by Codon (Friar) on Aug 16, 2005 at 19:52 UTC
|
Should the .+ be a \w+ so as to not jump words?
$str = 'bart is a fool qux';
will not replace 'foo'.
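A quick sketch of the difference, assuming the same substitution as in the parent post with only the quantified part changed:

```perl
use strict;
use warnings;

my $input = 'bart is a fool qux';

# .+? crosses spaces, so 'bart ... qux' counts as one protected span.
(my $any_char = $input) =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;

# \w+? stays inside a single word, so the span never forms here.
(my $word_only = $input) =~ s[(bar\w+?qux)|(foo)][defined $2 ? '123' : $1]ge;

print "$any_char\n";    # bart is a fool qux
print "$word_only\n";   # bart is a 123l qux
```

The flip side is that \w+? no longer protects 'bar foo qux' when the parts are separated by spaces, so which one is right depends on what "within" means for the data at hand.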
Ivan Heffner
Sr. Software Engineer, DAS Lead
WhitePages.com, Inc.
Re^2: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 12:57 UTC
|
Nice, but it only works with bar then foo then qux, not qux then foo then bar. (Following passes first test, fails second test.)
use strict;
use warnings;
use Test::More qw(no_plan);
my $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
is($str,$expected);
# switch qux and bar in the second phrase
$str = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';
$expected = 'bl123 and quxthisfoothatbar and barsofooquxhim and123som 123';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
is($str,$expected);
I tried to solve the more "general" problem with Parse::RecDescent, further on in the thread, but I gave up before finding a solution.
|
If you want to learn to solve the general problem, the book "Mastering Regular Expressions" is highly recommended. If you want a solution to the general problem, Regexp::Common::balanced does it already.
# note, this matches "qux foo bar" and "bar foo qux", but not "bar foo bar"
# see Regexp::Common::balanced documentation for details
qr/$RE{balanced}{-begin => "qux|bar"}{-end => "bar|qux"}/
-xdg
|
Well, technically the OP does say "surrounded", which could mean the boundaries are switched. Your usage of parens in the example is misleading, because parens are inherently related to internal grouping.
But I imagine (not tested) a simple extension of your original regexp could be in order.
$str =~ s[(bar.+?qux)|(qux.+?bar)|(foo)][defined $3 ? '123' : (defined($2) ? $2 : $1)]ge;
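A small sketch that runs the three-alternative substitution against the "switched" string from the failing test earlier in the thread. It is only a check of this one input, not proof that the general problem is solved.

```perl
use strict;
use warnings;

my $str = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';

# Groups 1 and 2 are the protected spans (either order of the bookends);
# group 3 is a 'foo' that falls outside both.
$str =~ s[(bar.+?qux)|(qux.+?bar)|(foo)]
         [defined $3 ? '123' : (defined $2 ? $2 : $1)]ge;

print "$str\n";   # bl123 and quxthisfoothatbar and barsofooquxhim and123som 123
```

One limit to keep in mind: a bookend can only belong to one span. In 'bar foo qux foo bar' the first alternative uses up the qux, so the second foo gets replaced even though it sits between qux and bar.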