RegEx ignoring intervening characters?

mdunnbass has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: RegEx ignoring intervening characters? by ikegami (Patriarch) on Jan 19, 2007 at 20:53 UTC
`$s =~ /A[^A-Z]*B[^A-Z]*C[^A-Z]*D/;` or, the same thing built dynamically: `my $re = join '[^A-Z]*', split //, 'ABCD'; $s =~ /$re/;` or `my $temp = $s; $temp =~ s/[^A-Z]//g; $temp =~ /ABCD/;`
Re^2: RegEx ignoring intervening characters? by mdunnbass (Monk) on Jan 19, 2007 at 21:05 UTC
Re: `(my $x = $s) =~ s/[^A-Z]//g;` If I understand your code right, that would delete anything matching `[^A-Z]` first and then match the pattern, right? I guess I should have explained better, but I don't want to modify anything interspersed within the matching ABCD characters. In fact, I very emphatically want them to remain unmolested. So, while your approach looks like it would work, it's not quite what I was looking for. As for `$s =~ /A[^A-Z]*B[^A-Z]*C[^A-Z]*D/;`: if the `$x` pattern I am looking for comes from `uc(chomp($x = <STDIN>))`, would I just need to use split, inserting the `[^A-Z]*` after every character? Would that work? Thanks, Matt
Re^3: RegEx ignoring intervening characters? by imp (Priest) on Jan 19, 2007 at 21:12 UTC
"if the $x pattern I am looking for is from `uc(chomp($x = <STDIN>))`, would I just need to use split, inserting the `[^A-Z]` after every character? would that work?" Yes, but note that `chomp` returns the number of characters it removed, not the chomped string, so it cannot be nested inside `uc` like that. You would do this: `my $x = <STDIN>; chomp $x; $x = uc($x); my $regex = join('[^A-Z]*', split //, $x);`
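To make the join/split idea above concrete, here is a minimal sketch. The helper name `build_gap_regex` and the sample strings are made up for illustration, and the `quotemeta` call is an added assumption (it keeps characters such as `.` or `+` in the user's input from being treated as regex syntax); neither appears in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn a string of letters into
# a regex that allows any run of non-uppercase characters between them.
sub build_gap_regex {
    my ($letters) = @_;
    my $gap = '[^A-Z]*';
    # quotemeta keeps regex metacharacters in the input literal.
    my $pattern = join $gap, map { quotemeta($_) } split //, uc $letters;
    return qr/$pattern/;
}

my $text  = 'xx AeBwaffleCfrenchtoastD yy';
my $regex = build_gap_regex('abcd');

# The match only reads $text; nothing in it is modified.
if ($text =~ $regex) {
    print "Matched '$&' inside '$text'\n";
}
```

Because the pattern is only matched against the string, never substituted into it, the intervening characters stay exactly as they were.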
Re^3: RegEx ignoring intervening characters? by webfiend (Vicar) on Jan 19, 2007 at 21:59 UTC
`(my $x = $s) =~ s/[^A-Z]//g;` That assigns the value of `$s` to `$x`, then performs the substitution on `$x`. `$s` is left unmolested. `$ perl -e 'chomp($orig = <STDIN>); ($tmp = $orig) =~ s/[^A-Z]//g; print "Copy: $tmp\nOriginal: $orig\n";'` With the input `WeDfT`, this prints `Copy: WDT` and `Original: WeDfT`. So it's a perfectly valid element of your solution. Update: Hey, how about I actually show it in action? `use strict; use warnings; chomp(my $input = <STDIN>); (my $test = $input) =~ s/[^A-Z]//g; if ($test =~ /ABCD/) { print "'$input' matches!\n"; } else { print "'$input' does not match.\n"; }` Running `perl test.pl` with the input `WaFFe` prints `'WaFFe' does not match.`; with the input `AeBwaffleCfrenchtoastD` it prints `'AeBwaffleCfrenchtoastD' matches!`
Re: RegEx ignoring intervening characters? by gaal (Parson) on Jan 19, 2007 at 20:55 UTC
The class of non-uppercase English characters is `[^A-Z]`. So you need `$str =~ /^A[^A-Z]*B[^A-Z]*C[^A-Z]*D$/;` You can write this a little more clearly as: `my $nu = qr/[^A-Z]*/; $str =~ /^ A ${nu} B ${nu} C ${nu} D $/x;` (Use `[[:^upper:]]` or the Unicode `\P{IsUpper}` for non-English text.)
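For the non-English case, a rough sketch of the Unicode variant might look like the following. It assumes that `\P{Lu}` (anything that is not an uppercase letter by Unicode general category) is an acceptable stand-in for the `\P{IsUpper}` property mentioned above, drops the `^`/`$` anchors for brevity, and uses sample strings invented for illustration.

```perl
use strict;
use warnings;
use utf8;                              # the source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# \P{Lu} matches any character that is not an uppercase letter.
my $gap   = qr/\P{Lu}*/;
my $regex = qr/A $gap B $gap C $gap D/x;

# Sample strings are invented for illustration.
for my $str ('AéBüCñD', 'AÉBCD') {
    printf "%s: %s\n", $str, ($str =~ $regex ? 'match' : 'no match');
}
```

The first string matches because é, ü and ñ are lowercase; the second does not, because É is uppercase and `[^A-Z]` would have let it through.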
Re: RegEx ignoring intervening characters? by imp (Priest) on Jan 19, 2007 at 20:59 UTC
You could do something like this: `use strict; use warnings; my $stuff = qr/[^A-Z]*/; my $regex = qr{ A $stuff B $stuff C $stuff D }x; while (my $line = <DATA>) { if ($line =~ $regex) { print "Matched: $line"; } }` with these three lines after `__DATA__`: `ABhere is intervening textC D`, `A B C, lots of text and numbers and equals and slashes DEFG`, and `ABhere IS intervening TEXTC D`. And an alternate way of forming the pattern: `my $stuff = qr/[^A-Z]*/; my $regex = join($stuff, split '','ABCD');`
Re^2: RegEx ignoring intervening characters? by ww (Archbishop) on Jan 19, 2007 at 22:17 UTC
Niggle: edge case? specs? To illustrate, note the use of three distinct regexen, `$stuff1`, `$stuff2`, and `$stuff3` (in some ways more precise, I think, but I am ~~inviting~~ begging for other views, please?), the last line of `__DATA__`, and the LAST line of output: `use strict; use warnings; my $stuff1 = qr/[^A]\|[^C-Z]/; my $stuff2 = qr/[^A-B]\|[^D-Z]/; my $stuff3 = qr/[^A-C]\|[^E-Z]/; my $regex = qr{ A $stuff1 B $stuff2 C $stuff2 D }x; while (my $line = <DATA>) { chomp($line); if ($line =~ $regex) { print "Matched: \" $line \"\n"; } else { print "Did NOT match \"$line\"\n"; } }` with these lines after `__DATA__`: `ABhere is intervening textC D`, `A B C, lots of text and numbers and equals and slashes DEFG`, `ABhere IS intervening TEXTC D`, `A BIG foo is B intervening text CD`, and `A Big Cat interjects itself into text before CD`. OUTPUT: `Matched: " ABhere is intervening textC D "`, `Matched: " A B C, lots of text and numbers and equals and slashes DEFG "`, `Did NOT match "ABhere IS intervening TEXTC D"`, `Did NOT match "A BIG foo is B intervening text CD"`, `Matched: " A Big Cat interjects itself into text before CD "`. In the last line of `__DATA__` an uppercase "C" precedes another uppercase "C" (the penultimate character), yet the regex does not object (i.e., says there's a match). Update: Someone upvoted this as I was updating it to fix mental and typographic glitches; the updating may have removed what the ++er thought was meritorious. Sorry.
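For comparison, here is a small sketch that runs the same five data lines through the single-class pattern suggested elsewhere in this thread. This is not ww's code: it swaps the three alternation patterns for one `[^A-Z]*` gap, on the assumption that "no uppercase letters at all between the wanted letters" is the intended spec.

```perl
use strict;
use warnings;

# One gap pattern for every position: any run of non-uppercase characters.
my $gap   = qr/[^A-Z]*/;
my $regex = qr/A $gap B $gap C $gap D/x;

my @lines = (
    'ABhere is intervening textC D',
    'A B C, lots of text and numbers and equals and slashes DEFG',
    'ABhere IS intervening TEXTC D',
    'A BIG foo is B intervening text CD',
    'A Big Cat interjects itself into text before CD',
);

for my $line (@lines) {
    my $verdict = $line =~ $regex ? 'Matched' : 'Did NOT match';
    print qq{$verdict: "$line"\n};
}
```

Under that assumption only the first two lines match. The "Big Cat" line is rejected because, after the first uppercase "C", the next uppercase letter is another "C" rather than "D".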
Re: RegEx ignoring intervening characters? by diotalevi (Canon) on Jan 19, 2007 at 20:53 UTC
You said you wanted "zero or more characters as long as they aren't uppercase" which in code is `[^A-Z]*` provided you think A-Z is all your uppercase characters. In POSIX that'd be `[^[:upper:]]*` and in Unicode it'd be `\P{IsUpper}*`. ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
Re: RegEx ignoring intervening characters? by Cody Pendant (Prior) on Jan 21, 2007 at 02:04 UTC
Am I crazy or would it be sensible just to do something like `$str =~ tr/A-Z//cd` and then examine what's left? I don't know how to do benchmarking, but rather than do a complex regex with lots of stars in it, why not turn the problem inside out? ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss') =~y~b-v~a-z~s; print
Re^2: RegEx ignoring intervening characters? by mdunnbass (Monk) on Feb 01, 2007 at 15:39 UTC
It would be sensible, except for the fact that I need to keep the original text intact.
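One way to reconcile the two positions is to run `tr` on a copy, so the original string is never touched. A minimal sketch, with a hypothetical helper name and using `index` instead of a regex for the final check:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the uppercase letters of $text, taken alone,
# contain $wanted as a contiguous run. Only a copy is modified.
sub uppercase_skeleton_contains {
    my ($text, $wanted) = @_;
    (my $skeleton = $text) =~ tr/A-Z//cd;   # delete everything except A-Z in the copy
    return index($skeleton, $wanted) >= 0;
}

my $original = 'AeBwaffleCfrenchtoastD';
print "match, original is still '$original'\n"
    if uppercase_skeleton_contains($original, 'ABCD');
```

This answers the yes/no question only; if the position of the match within the original text matters, the in-place regex is still the better fit.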