Maybe ActiveState will get a hold of me...

I've just crafted some... err, crafty... Perl code, using Perl 5.6's regex abilities. It highlights the portion of a string that is matched by a selected fragment of a regex. This has two practical applications:
Here's the code (if you're too lazy to click the above link):
$text = "1981 born; 1993 7th grade; 1995 9th grade; 1999 RPI; 2001 TPC +"; @section = ('\d{4}', '([^;]+)'); $text =~ m{ (?{ @s = map [], 1 .. @section }) (?: (?{ local $s[0] = [ @{ $s[0] }, [ length "$`$&" ] ] }) \d{4} (?{ $s[0][-1][1] = length "$`$&" }) \s+ (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] }) [^;]+ (?{ $s[1][-1][1] = length "$`$&" }) (?: ;\s+ | $ ) )+ (?{ @watch = @s }) }x; for (0 .. $#watch) { print "$section[$_] matched in the following places:\n"; while (my $find = shift @{ $watch[$_] }) { my ($s, $e) = @$find; print substr($text, 0, $s), "<\e[1m", substr($text, $s, $e-$s), "\e[m>", substr($text, $e), "\n"; } }
What's it do? Well, right now, it's all manually done, but I'll be subclassing YAPE::Regex to make it automatic. But this is what it does: it keeps track, with an array, of the position in the string BEFORE and AFTER a specific chunk of the regex. It has all those local()s to make sure that on a failed attempt, whatever was just done is gotten rid of. Anyway, it builds up a list of array references, holding start/end position pairs. Then, it goes through and highlights (puts < and > around the selection, and tries ANSI reversal) the sections matched by the selected regex chunks.

This was fun to write. It'll be more fun to automate.

Now, what was that about /(pat)+/? Well, as I've said before, doing "abc" =~ /(.)+/ puts "c" in $1. How can we get the repeated sense we were looking for? I'm so glad you asked... ;)
# extracts the attributes from an HTML tag # and displays them, separately, with one regex $text = q{<img src="foo.jpg" ismap border=0>}; @section = ('attributes'); $text =~ m{ (?{ @s = map [], 1 .. @section }) < \w+ (?: \s+ (?{ local $s[0] = [ @{ $s[0] }, [ length "$`$&" ] ] }) \w+ (?{ $s[0][-1][1] = length "$`$&" }) (?: \s* = \s* (?: " (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] }) [^"]* (?{ $s[1][-1][1] = length "$`$&" }) " | ' (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] }) [^']* (?{ $s[1][-1][1] = length "$`$&" }) ' | (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] }) [^\s>]+ (?{ $s[1][-1][1] = length "$`$&" }) ) | (?{ local $s[1] = [ @{ $s[1] }, [ -1 ] ] }) ) )* \s* > (?{ @watch = @s }) }x; print "The following attributes were found:\n"; for (@watch) { my $i = 0; while (my $find = shift @{ $watch[$i++] }) { my ($s, $e) = @$find; print("\n"), next if $s == -1; print "=" if $i == 2; print substr($text, $s, $e-$s); print "\n" if $i == 2; } } __END__ output: The following attributes were found: src=foo.jpg ismap border=0
(Note to self: damn, I'm good.)

japhy -- Perl and Regex Hacker

Replies are listed 'Best First'.
Re: Komodo... watch out!
by tachyon (Chancellor) on Jul 05, 2001 at 01:41 UTC
    "abc" =~ /(.+)/; print "$1\n"; "aaabcabccc" =~ /((?:abc)+)/; print "$1\n";

    tachyon

    s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Surely you can't have it both ways? Either you want to match one lot of "pat" or several lots of "pat". You match what you want and do what you want with it, no further parsing required. You use a while (/(pat)/g) construct to get all the occurences of "pat" into $1 in sequence if that is what you want.

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n\w+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print