Skipping special tags in regexes

fletcher_the_dog has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Skipping special tags in regexes by Abigail-II (Bishop) on Dec 04, 2003 at 16:09 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Zero or more tags such as <5b> or <foo>
    my $tag = '(?:<[^>]*>)*';

    sub word {
        my $word = shift;
        qr/$tag$word/;
    }

    while (<DATA>) {
        s/((??{ word 'tacos' }))/yummy $1/g;
        s/((??{ word 'salad' }))/green $1/g;
        print;
    }

    __DATA__
    I like tacos.
    <5b>I <5c>like <5d>tacos.
    I like a salad.
    <foo>I <foo>like <foo>a <bar><baz><foo>salad.
    <foo>I <foo>like <foo>a <bar><baz><foo>salad <bup>with <bobob>tacos.

Output:

    I like yummy tacos.
    <5b>I <5c>like yummy <5d>tacos.
    I like a green salad.
    <foo>I <foo>like <foo>a green <bar><baz><foo>salad.
    <foo>I <foo>like <foo>a green <bar><baz><foo>salad <bup>with yummy <bobob>tacos.

Abigail
Re: Re: Skipping special tags in regexes by flounder99 (Friar) on Dec 04, 2003 at 18:40 UTC
Is there a reason you are using the `/(??{ })/` code block other than just to make a more general solution? This seems to work just fine.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A single tag such as <5b> or <foo>; the * is applied at the point of use
    my $tag = '(?:<[^>]*>)';

    while (<DATA>) {
        s/($tag*tacos)/yummy $1/g;
        s/($tag*salad)/green $1/g;
        print;
    }

    __DATA__
    I like tacos.
    <5b>I <5c>like <5d>tacos.
    I like a salad.
    <foo>I <foo>like <foo>a <bar><baz><foo>salad.
    <foo>I <foo>like <foo>a <bar><baz><foo>salad <bup>with <bobob>tacos.

Output:

    I like yummy tacos.
    <5b>I <5c>like yummy <5d>tacos.
    I like a green salad.
    <foo>I <foo>like <foo>a green <bar><baz><foo>salad.
    <foo>I <foo>like <foo>a green <bar><baz><foo>salad <bup>with yummy <bobob>tacos.

-- flounder
Re: Skipping special tags in regexes by Abigail-II (Bishop) on Dec 04, 2003 at 22:23 UTC
"Is there a reason you are using the /(??{ })/ code block other than just to make a more general solution?"

Because I was expecting the OP to refine his question and come up with a slightly different definition of a "word" (perhaps it needed trailing tags as well, or no more than two tags, whatever). Then I only need to change the sub, and not every regex using it. I think my solution is more general than yours, but the effect is the same.

Abigail
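As a rough illustration of that point, here is a minimal sketch in which only the `word` sub changes. The "at most two leading tags" rule and the `decorate` helper are assumptions made up for this example, not something the OP asked for, and the patterns are compiled with `qr//` up front rather than through `(??{ })` to keep the sketch short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One tag, e.g. <5b> or <foo>
my $tag = '(?:<[^>]*>)';

# Hypothetical refinement: a "word" carries at most two leading tags.
# Only this sub changes; the substitutions that call it stay the same.
sub word {
    my $word = shift;
    return qr/(?:$tag){0,2}$word/;
}

# Hypothetical helper wrapping the two substitutions from the thread.
sub decorate {
    my $line = shift;
    my ($tacos, $salad) = (word('tacos'), word('salad'));
    $line =~ s/($tacos)/yummy $1/g;
    $line =~ s/($salad)/green $1/g;
    return $line;
}

print decorate("<a><b><c>tacos and <d>salad\n");
# prints: <a>yummy <b><c>tacos and green <d>salad
```

With three tags in front of "tacos", only the last two are treated as part of the word, so the prefix lands after `<a>`.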
Re^3: Skipping special tags in regexes by flounder99 (Friar) on Dec 05, 2003 at 14:15 UTC
Re: Skipping special tags in regexes by BrowserUk (Patriarch) on Dec 04, 2003 at 17:28 UTC
You might consider creating your own custom tag. This example is probably not well thought through; it's basically just a tweak of the example given at the end of perlre, but I'd never tried this before, so I did what came easy :)

    #! perl -slw
    use strict;

    package CustomReTag;
    use overload;

    sub import {
        shift;
        die "No argument to CustomReTag::import allowed" if @_;
        overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    # \Y| stands for a two-character tag such as <5d>
    my %rules = ( '\\' => '\\', 'Y|' => qr/<\w\w>/ );

    sub convert {
        my $re = shift;
        $re =~ s{
            \\ ( \\ | Y . )
        }
        { $rules{$1} or invalid($re,$1) }sgex;
        return $re;
    }

    package main;

    my $s = '<5b>I <5c>like <5d>tacos';
    my $re = CustomReTag::convert '(\Y|tacos)';
    $s =~ s[$re][yummy $1]g;
    print $s;

    __END__
    P:\test>junk
    <5b>I <5c>like yummy <5d>tacos

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Skipping special tags in regexes by thospel (Hermit) on Dec 04, 2003 at 16:07 UTC
The wanted solution is not well specified though. In your example, it would seem equally valid to return:

    <5b>I <5c>like <5d>yummy tacos

(corresponding to a different place to put the `(<\w\w>)?`). You'll have to state that this doesn't matter or state a way to resolve this ambiguity.
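To make the ambiguity concrete, here is a small sketch of the two placements. The sub names are invented for this illustration, and the patterns assume two-character tags like those in the example string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Optional tag inside the captured "word": the new text goes before the tag.
sub prefix_before_tag {
    my $s = shift;
    $s =~ s/((?:<\w\w>)?tacos)/yummy $1/;
    return $s;
}

# Optional tag captured on its own: the new text goes after the tag.
sub prefix_after_tag {
    my $s = shift;
    $s =~ s/((?:<\w\w>)?)tacos/${1}yummy tacos/;
    return $s;
}

my $s = '<5b>I <5c>like <5d>tacos';
print prefix_before_tag($s), "\n";   # <5b>I <5c>like yummy <5d>tacos
print prefix_after_tag($s),  "\n";   # <5b>I <5c>like <5d>yummy tacos
```

Both results are consistent with "put yummy before tacos"; which one is wanted depends on whether the tag belongs to the word or to the position.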
Re: Re: Skipping special tags in regexes by fletcher_the_dog (Friar) on Dec 04, 2003 at 16:30 UTC
That is a good point. The tags are associated with the word immediately following them, so it would be nice if I could associate them with the first character of that word.
Re: Skipping special tags in regexes by dragonchild (Archbishop) on Dec 04, 2003 at 16:01 UTC
Build dynamic regexes. Maybe something like:

    my $regex = 's/(\w+)\s+tacos/$1 yummy tacos/g';
    $regex =~ s/(\\s\+)(\w)/$1(<\\w\\w>)$2/g;   # Handle first part of substitution
    my $match = $2;
    $regex =~ s/ $match/\$2$match/g;            # Handle second part of substitution
    eval "$regex";

That would handle the transform from the first to the second. Ideally, you would re-evaluate your regexes and build them using some regex builder. The builder would handle the optional tags and make sure they stayed in after the substitutions.

------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
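One possible shape for such a builder, as a sketch only: instead of rewriting a whole `s///` string and passing it to `eval`, it rewrites just the pattern source so that every capture group may begin with a run of tags. The name `allow_leading_tags` is hypothetical, and the sketch assumes two-character tags and a pattern source with no escaped or character-class parentheses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical builder: take the source of a pattern written for untagged
# text and let every capture group start with zero or more <xx> tags.
# Assumption: each "(" not followed by "?" opens a capture group.
sub allow_leading_tags {
    my $src = shift;
    $src =~ s/\((?!\?)/((?:<\\w\\w>)*/g;
    return qr/$src/;
}

# '(\w+\s+)(tacos)' becomes ((?:<\w\w>)*\w+\s+)((?:<\w\w>)*tacos)
my $re = allow_leading_tags('(\w+\s+)(tacos)');

my $s = '<5b>I <5c>like <5d>tacos';
$s =~ s/$re/${1}yummy $2/g;
print "$s\n";   # <5b>I <5c>like yummy <5d>tacos
```

Because the tags are matched inside the caller's own capture groups, they are carried along by `$1` and `$2` and the replacement side does not need to be rewritten.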
Re: Skipping special tags in regexes by delirium (Chaplain) on Dec 04, 2003 at 18:25 UTC
Another possibility is to not rely on regexes. I'm not sure what your specs for searching and replacing are, but if they are narrowed down to prefixing words with other words, as in your example, you could get away with building a custom function and passing it the line to change, the word to look for, and what to prefix it with, e.g.:

    #!/usr/bin/perl -w
    use strict;

    sub prefix {
        my ($line, $look_for, $prefix, $flip) = @_;
        return $line unless $look_for && $line =~ /$look_for/;
        my @arr = split /(<[^>]*>)/, $line;
        for my $cnt (0..$#arr) {
            if ($arr[$cnt] =~ /$look_for/) {
                $arr[$cnt-2] .= $prefix if $cnt > 1 && !$flip;
                $arr[$cnt+2] = $prefix . ' ' . $arr[$cnt+2] if $cnt < $#arr-1 && $flip;
            }
        }
        return join '', @arr;
    }

    my $line = "<5b>I <5c>like <5d>tacos\n";
    print prefix ($line, 'tacos', 'yummy');
    print prefix ($line, 'like', 'yummy', 1);

Output:

    <5b>I <5c>like yummy<5d>tacos
    <5b>I <5c>like <5d>yummy tacos
Re: Re: Skipping special tags in regexes by fletcher_the_dog (Friar) on Dec 04, 2003 at 21:04 UTC
Unfortunately, I have many regexes that are not fixed strings and that need the full power of regexes.
Re: Skipping special tags in regexes by BUU (Prior) on Dec 04, 2003 at 22:03 UTC
My best guess would be to actually parse it somehow and store the whole thing in some sort of data structure. The simplest would be a hash of the form tag => string; then you could just iterate over the values of the hash to ignore the tags, and vice versa. How you would actually parse this string is a bit beyond me. If the tags are truly as simple as you depict here, it should be fairly simple to just use a regex such as `/<5\w>\w+/` or something, but beyond that you would have to look at some of the parsers on CPAN.
Re: Re: Skipping special tags in regexes by CountZero (Bishop) on Dec 04, 2003 at 22:48 UTC
If the tags are associated with the word directly following them (without any intervening whitespace), then you could split the sentence on whitespace and then split off each tag from the word following it. It would then be trivial to build a data structure you could use as a basis to put the tags back in after the regex has done its thing with the "untagged" sentence.

CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
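A minimal sketch of that splitting step, assuming tags only ever appear directly in front of a word. The `tokenize` helper and its `tags`/`word` keys are names invented for this example. It keeps the tokens in an ordered list rather than a hash, so repeated words stay distinct; it does not attempt the harder part, which is deciding where the tags go once the untagged sentence has been changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a line on whitespace and peel the leading
# tags off each chunk. Returns a list of { tags => ..., word => ... }.
sub tokenize {
    my $line = shift;
    my @tokens;
    for my $chunk (split ' ', $line) {
        my ($tags, $word) = $chunk =~ /\A((?:<[^>]*>)*)(.*)\z/s;
        push @tokens, { tags => $tags, word => $word };
    }
    return @tokens;
}

my @tokens   = tokenize('<5b>I <5c>like <5d>tacos');
my $untagged = join ' ', map { $_->{word} } @tokens;
my $retagged = join ' ', map { $_->{tags} . $_->{word} } @tokens;

print "$untagged\n";   # I like tacos
print "$retagged\n";   # <5b>I <5c>like <5d>tacos
```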
Re: Re: Re: Skipping special tags in regexes by fletcher_the_dog (Friar) on Dec 04, 2003 at 23:19 UTC
The only problem is that the words in a string may not be unique, so you couldn't just plop things in a hash. Also, the regexes might introduce new words someplace in the string that already existed somewhere else in the string. That is why I have tried using diffing, but it was just too slow.
Re: Re: Re: Re: Skipping special tags in regexes by CountZero (Bishop) on Dec 05, 2003 at 06:31 UTC

