Explanation for Reg Expr

decoder has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Explanation for Reg Expr by bobf (Monsignor) on Mar 01, 2007 at 16:47 UTC
YAPE::Regex::Explain creates a human-readable explanation of regexen.

    use strict;
    use warnings;
    use YAPE::Regex::Explain;

    my $re = qr/(?=(\d\d\d))/;
    print YAPE::Regex::Explain->new($re)->explain;

Output:

    The regular expression:

    (?-imsx:(?=(\d\d\d)))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
        (                        group and capture to \1:
    ----------------------------------------------------------------------
          \d                       digits (0-9)
    ----------------------------------------------------------------------
          \d                       digits (0-9)
    ----------------------------------------------------------------------
          \d                       digits (0-9)
    ----------------------------------------------------------------------
        )                        end of \1
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

See perlre for more information on pattern extensions (like '?=') and modifiers (such as 'g').
Re: Explanation for Reg Expr by reasonablekeith (Deacon) on Mar 01, 2007 at 16:34 UTC
Have a look at the "look-ahead assertion" in "perldoc perlre".

Your code is just an example to show that normally (when the g switch is set) a regex engine will start looking for its next match _after_ the last character of the previous match. However, characters matched in a look-ahead assertion don't 'count' as far as the regex engine is concerned, and it will not automatically skip over them when trying to find the next match.

It's an advanced topic, and I really think you'd need to understand how regex engines work to understand this, to which end I'd highly recommend Jeffrey Friedl's Mastering Regular Expressions.

UPDATE: changed wording in second paragraph

---
my name's not Keith, and I'm not reasonable.
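A minimal sketch of the difference described above. The input string is hypothetical, since the original poster's code is not shown in this thread; only the `(?=(\d\d\d))` pattern comes from the replies.

```perl
use strict;
use warnings;

# Hypothetical input; not taken from the original question.
my $str = '123456';

# Ordinary match: each match consumes three digits, so the next
# attempt starts after the last character of the previous match.
my @consumed = $str =~ /(\d\d\d)/g;

# Look-ahead: the digits are captured but not consumed, so the
# engine only steps forward one character between matches.
my @overlapping = $str =~ /(?=(\d\d\d))/g;

print "consumed:    @consumed\n";     # 123 456
print "overlapping: @overlapping\n";  # 123 234 345 456
```

The only change between the two patterns is the `(?=...)` wrapper, which is what makes the matches overlap.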
Re: Explanation for Reg Expr by Moron (Curate) on Mar 01, 2007 at 16:35 UTC
I have to say that trying to understand the relevant parts of perlre can feel like doing a structural analysis of the propositions of Wittgenstein, but I'll give it a try.

I would say that the behaviour has to be explained by a combination of three provisions of this documentation:

1) "/g": match globally
2) "?=" (update: "lookahead and ..."): match best (see #3 below) and chain global matches
3) "match best": "the earliest match is always the best."

More update: oops, that's four provisions, er, amongst our provisions are, er, "I'll come in again"

-M
Free your mind
Re^2: Explanation for Reg Expr by mdunnbass (Monk) on Mar 01, 2007 at 18:45 UTC
"I have to say that trying to understand the relevant parts of perlre can feel like doing a structural analysis of the propositions of Wittgenstein, but I'll give it a try:-"

Wait, you mean it wasn't just me? =) Glad to know other ppl have trouble wading through that page as well.

Matt
Re^3: Explanation for Reg Expr by ysth (Canon) on Mar 01, 2007 at 19:19 UTC
It says so right on the page. From http://perldoc.perl.org/perlre.html#BUGS:

"This document varies from difficult to understand to completely and utterly opaque. The wandering prose riddled with jargon is hard to fathom in several places. This document needs a rewrite that separates the tutorial content from the reference content."

I'm looking for a job.
Re^4: Explanation for Reg Expr by GrandFather (Saint) on Mar 01, 2007 at 19:48 UTC
Re: Explanation for Reg Expr by GrandFather (Saint) on Mar 02, 2007 at 01:53 UTC
/g tells the regex engine to find all the matching sequences it can in the target string. In list context, anything captured during the process is returned as a list.

But there is some subtle stuff going on in your sample. When a match is zero width (like `/()/g`), an endless loop could result, because the engine would keep matching at the same position. The regex engine chooses a "second best match" one character further along to avoid the problem.

That situation applies in your sample code, where the (?=...) is a zero width assertion (it just shoves a stake in the ground and says "match from here"). Because there are capture brackets inside the assertion expression (`(\d\d\d)`), three digits are captured. Because the assertion is zero width, the regex engine moves the match position along one character for each iteration.

DWIM is Perl's answer to Gödel
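To watch the match position move, one can run the same pattern in scalar context and print `pos()` on each iteration. This is only a sketch with a hypothetical input string, not the original poster's sample.

```perl
use strict;
use warnings;

# Hypothetical input; not taken from the original question.
my $str = '12345';
my @seen;

# In scalar context, /g resumes from pos($str) on every iteration.
while ($str =~ /(?=(\d\d\d))/g) {
    # The match is zero width, so pos() is both where it started
    # and where it ended. Record that position with the capture.
    push @seen, pos($str) . ":$1";
}

print "@seen\n";  # 0:123 1:234 2:345
```

The position goes up by one each time even though the pattern itself consumes nothing, which is the engine refusing to accept a second zero-width match at the same spot.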
Re: Explanation for Reg Expr by fenLisesi (Priest) on Mar 01, 2007 at 16:37 UTC
It seems to be self-explanatory, with the not-so-badly chosen variable names and all. You can google for "perldoc regex", click on the first item and search within your browser page for '?='. Cheers.

Update: Sorry about the less than helpful note above. As Moron says, the relevant section of perlre is not very long-winded. Here is some rant that is: Read more... (5 kB)