Maybe ActiveState will get a hold of me...

I've just crafted some... err, crafty... Perl code, using Perl 5.6's regex abilities. It highlights the portion of a string that is matched by a selected fragment of a regex. This has two practical applications:
Here's the code (if you're too lazy to click the above link):
$text = "1981 born; 1993 7th grade; 1995 9th grade; 1999 RPI; 2001 TPC +";
@section = ('\d{4}', '([^;]+)');

$text =~ m{
  (?{ @s = map [], 1 .. @section })
  (?:
    (?{ local $s[0] = [ @{ $s[0] }, [ length "$`$&" ] ] })
    \d{4}
    (?{ $s[0][-1][1] = length "$`$&" })
    \s+
    (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] })
    [^;]+
    (?{ $s[1][-1][1] = length "$`$&" })
    (?: ;\s+ | $ )
  )+
  (?{ @watch = @s })
}x;

for (0 .. $#watch) {
  print "$section[$_] matched in the following places:\n";
  while (my $find = shift @{ $watch[$_] }) {
    my ($s, $e) = @$find;
    print
      substr($text, 0, $s),
      "<\e[1m", substr($text, $s, $e-$s), "\e[m>",
      substr($text, $e), "\n";
  }
}
What's it do? Well, right now it's all done by hand, but I'll be subclassing YAPE::Regex to make it automatic. Here's the idea: for each chunk of the regex I care about, it records (in an array) the position in the string just BEFORE and just AFTER that chunk matches. All those local()s are there so that when an attempt fails and the engine backtracks, whatever was just recorded gets undone. What you end up with is a list of array references, one per chunk, each holding start/end position pairs. Then it goes through those pairs and highlights the sections matched by the selected regex chunks: it puts < and > around each one and switches on ANSI bold (\e[1m) inside.
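If the local() business seems like magic, here's a minimal sketch of just that piece, adapted from the sort of counter example the perlre docs use. The variable names ($cnt, $res, $str) are mine and have nothing to do with the code above, and I'm assuming a reasonably modern perl where `our` variables are visible inside (?{ }) blocks.

```perl
use strict;
use warnings;

# local() only works on package variables, hence "our".
our ($cnt, $res);

my $str = 'a' x 8;

$str =~ m{
    (?{ $cnt = 0 })                       # reset the counter at the start
    (?: a (?{ local $cnt = $cnt + 1 }) )* # count each 'a' the star consumes
    aaaa                                  # forces the star to give four back
    (?{ $res = $cnt })                    # copy the surviving count out
}x;

print "the star kept $res of the a's\n";  # the star kept 4 of the a's
```

The star greedily eats all eight a's, then has to backtrack four times so that aaaa can match. Each backtrack unwinds one local(), so the count that reaches $res reflects only the iterations that survived. A plain assignment wouldn't be unwound, so it would still carry whatever the failed attempts left behind.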

This was fun to write. It'll be more fun to automate.

Now, what was that about /(pat)+/? Well, as I've said before, doing "abc" =~ /(.)+/ puts "c" in $1. How can we get the repeated sense we were looking for? I'm so glad you asked... ;)
# extracts the attributes from an HTML tag
# and displays them, separately, with one regex

$text = q{<img src="foo.jpg" ismap border=0>};

# two tracked chunks: $s[0] holds attribute names, $s[1] holds their values
@section = ('name', 'value');

$text =~ m{
  (?{ @s = map [], 1 .. @section })
  < \w+
  (?:
    \s+
    (?{ local $s[0] = [ @{ $s[0] }, [ length "$`$&" ] ] })
    \w+
    (?{ $s[0][-1][1] = length "$`$&" })
    (?:
      \s* = \s*
      (?:
        "
        (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] })
        [^"]*
        (?{ $s[1][-1][1] = length "$`$&" })
        "
        |
        '
        (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] })
        [^']*
        (?{ $s[1][-1][1] = length "$`$&" })
        '
        |
        (?{ local $s[1] = [ @{ $s[1] }, [ length "$`$&" ] ] })
        [^\s>]+
        (?{ $s[1][-1][1] = length "$`$&" })
      )
      |
      (?{ local $s[1] = [ @{ $s[1] }, [ -1 ] ] })
    )
  )*
  \s* >
  (?{ @watch = @s })
}x;

print "The following attributes were found:\n";

# every attribute pushes exactly one name pair and one value pair
# (a start of -1 means "no value"), so the two lists line up by index
my ($names, $values) = @watch;
for my $i (0 .. $#$names) {
  my ($s, $e) = @{ $names->[$i] };
  print substr($text, $s, $e-$s);
  my ($vs, $ve) = @{ $values->[$i] };
  print "=", substr($text, $vs, $ve-$vs) unless $vs == -1;
  print "\n";
}

__END__

output:

The following attributes were found:
src=foo.jpg
ismap
border=0
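For anyone who wants to check the /(pat)+/ claim directly, here's a small sketch. It's not part of the code above. The list-context /g match is a different, simpler technique that I'm swapping in just for comparison; it only works when each repetition can be matched on its own, which is not the case for the attribute regex.

```perl
use strict;
use warnings;

my $str = "abc";

# A quantified group keeps only its final iteration in $1.
my $last = $str =~ /(.)+/ ? $1 : undef;

# Comparison only: in list context, /g returns every capture in turn.
my @each = $str =~ /(.)/g;

print "last: $last\n";   # last: c
print "each: @each\n";   # each: a b c
```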
(Note to self: damn, I'm good.)

japhy -- Perl and Regex Hacker

In reply to Komodo... watch out! by japhy
