I plan on having a draft of my regex article ready for review by the end of June. Hopefully, by early July, Regexp::Parser will be on CPAN. Once that's ready to use, I'm going to make a couple of sub-modules (like Regexp::Explain), and then I'm going to work on subclassing it to match Perl 6 regexes.

I have since rescinded what follows. I won't delete the text, but it's here in a small red font to let you know it's (already) outdated.

That being said, I'm also going to release (if I can figure out how to do it safely) re::capture, which will introduce a new assertion: (?N=pat), where N is a capture group number. It will allow you to specify which capture group you're assigning to. Here's the kind of code it's meant to improve:

# parses text like:
# name = japhy  age = "22"  lang = 'Perl'
# into a hash... but it retains those pesky quotes :/
my %data = $text =~ m{
  ([^=\s]+) \s* = \s* 
  (
    ' [^']* ' |
    " [^"]* " |
    \S+
  )
}xg;
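With stock Perl, the quotes captured by the pattern above have to be stripped in a second pass. What follows is only a minimal sketch of that post-processing: the sample `$text` is taken from the comment in the example, and the choice to strip a single pair of matching quotes is my assumption about what the clean-up should do.

```perl
use strict;
use warnings;

# Sample input, copied from the comment in the example above.
my $text = q{name = japhy  age = "22"  lang = 'Perl'};

# Same pattern as above: the value capture still includes any quotes.
my %data = $text =~ m{
  ([^=\s]+) \s* = \s*
  (
    ' [^']* ' |
    " [^"]* " |
    \S+
  )
}xg;

# Second pass (assumed clean-up): strip one pair of matching quotes.
# values() returns aliases, so this edits the hash in place.
for my $value (values %data) {
  $value =~ s/\A(['"])(.*)\1\z/$2/s;
}

print "$_ = $data{$_}\n" for sort keys %data;
```

The extra loop is the part the proposed assertion is meant to make unnecessary.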

That's pesky because then you have to post-process the quotes out of the values. re::capture (isn't that a witty name?) will allow you to say:

# parses text like:
# name = japhy  age = "22"  lang = 'Perl'
# into a hash... but doesn't capture the quotes!
my %data = $text =~ m{
  ([^=\s]+) \s* = \s* 
  (?:
    ' (?2= [^']* ) ' |
    " (?2= [^"]* ) " |
    ( \S+ )
  )
}xg;
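For comparison, Perl 5.10 and later can get quote-free captures for this particular input with the branch-reset group `(?|...)`. This is a different mechanism from the proposed `(?2=...)` assertion, not an implementation of it; the sketch below assumes the same sample `$text` as in the comments above.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs Perl 5.10 or later

# Sample input, copied from the comment in the example above.
my $text = q{name = japhy  age = "22"  lang = 'Perl'};

# Inside (?|...), each alternative restarts group numbering,
# so whichever branch matches, the value lands in $2.
my %data = $text =~ m{
  ([^=\s]+) \s* = \s*
  (?|
    ' ( [^']* ) ' |
    " ( [^"]* ) " |
    ( \S+ )
  )
}xg;

say "$_ = $data{$_}" for sort keys %data;
```

Branch reset only renumbers groups per alternative; it cannot assign to an arbitrary group number the way `(?N=pat)` is described as doing.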

This case might be resolved in other ways, but it's a good demonstration of what the module does. The other thing I think I'll have it implement is captures that exist only inside the regex and are ignored (that is, not returned) afterwards. That means you can write:

# parses text like:
# name = japhy  age = "22"  lang = 'Perl'
# into a hash... but doesn't capture the quotes!
my %data = $text =~ m{
  ([^=\s]+) \s* = \s* 
  (?:
    (?*3= ['"] ) (?2= .*? ) \3 |
    ( \S+ )
  )
}xg;

and the regex will only return ($1, $2) each time it matches.
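Without such hidden captures, a list-context match returns every group, including the one that only exists to hold the opening quote. A rough stock-Perl equivalent, sketched here under the assumption that a `while` loop is acceptable in place of the one-line hash assignment, picks out the wanted groups by hand:

```perl
use strict;
use warnings;

# Sample input, copied from the comment in the example above.
my $text = q{name = japhy  age = "22"  lang = 'Perl'};

my %data;
while ($text =~ m{
  ([^=\s]+) \s* = \s*
  (?:
    (['"]) (.*?) \2 |
    (\S+)
  )
}xg) {
  # $2 holds the opening quote and is needed only for the \2
  # backreference, so it is skipped here. The value is in $3
  # (quoted branch) or $4 (bare-word branch).
  $data{$1} = defined $3 ? $3 : $4;
}

print "$_ = $data{$_}\n" for sort keys %data;
```

The `defined` test rather than a plain truth test keeps an empty quoted value such as `x = ""` from falling through to the bare-word group.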

This is not going to be a source filter; rather, it will work like the re pragma and redefine the functions Perl uses to do its compiling and matching. It won't change much, but it will add support for this new assertion.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

In reply to Regex Report by japhy