Explain a regex

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Explain a regex by Util (Priest) on Jan 22, 2005 at 19:28 UTC
`$RE{num}{real}` is created by the `real_creator` subroutine in Regexp/Common/number.pm; this highly flexible subroutine builds a custom RE from parameters `($base, $places, $radix, $sep, $group, $expon)`. For example, the code that creates `$RE{num}{real}` is: `pattern name => [qw (num real -base=10), '-places=0,', qw (-radix=[.] -sep= -group=3 -expon=E)], create => \&real_creator, ;` Although I see a few places where special cases of sub-expressions could be recognized and automatically replaced with their simpler forms (e.g. `{0,}` becomes `*`, and base-16 `[0123456789ABCDEF]` becomes `[0-9A-F]`), I do not disagree with the module author's choice to leave those cases in their general form; the module code is clearer, and is less likely to produce incorrect REs, than if it included code to "tighten up" the RE. In short, the RE is optimized for clarity and correctness in the generating code, rather than for clarity or conciseness in the RE itself.
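A minimal sketch of why those tightened forms would be safe substitutions: it checks that the general and the tightened sub-expressions accept and reject the same sample strings. The helper `agree_on` and the sample list are assumptions made up for illustration, not part of Regexp::Common, and agreement on a handful of samples is evidence rather than proof.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both patterns accept/reject every sample alike.
sub agree_on {
    my ($re_a, $re_b, @samples) = @_;
    for my $s (@samples) {
        my $a_hit = $s =~ $re_a ? 1 : 0;
        my $b_hit = $s =~ $re_b ? 1 : 0;
        return 0 if $a_hit != $b_hit;
    }
    return 1;
}

# Illustrative inputs only; extend as needed.
my @samples = ('', '0', '42', '007', 'DEAD', 'beef', '12x', 'G1', ' 5');

# {0,} versus *
my $digits_general = qr/\A[0123456789]{0,}\z/;
my $digits_tight   = qr/\A[0-9]*\z/;

# Base-16 digit class, spelled out versus written as ranges
my $hex_general = qr/\A[0123456789ABCDEF]+\z/;
my $hex_tight   = qr/\A[0-9A-F]+\z/;

print "digits agree: ", agree_on($digits_general, $digits_tight, @samples), "\n";
print "hex agree:    ", agree_on($hex_general, $hex_tight, @samples), "\n";
```

The point of the sketch is only that the two spellings are interchangeable in the output; it says nothing about whether the generator would stay as simple if it had to detect these cases.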
Re^2: Explain a regex by BrowserUk (Patriarch) on Jan 22, 2005 at 23:58 UTC
Good explanation. Thanks. Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.
Re: Explain a regex by merlyn (Sage) on Jan 22, 2005 at 15:57 UTC
I don't understand why \d (or even the range 0-9) wasn't used instead of the large character class, or \. wasn't used instead of the dot character class, or why * was used in one place but {0,} was used in another. I'd say this was the work of someone who wasn't completely clued. I'd hate to see the rest of their code. -- Randal L. Schwartz, Perl hacker. Be sure to read my standard disclaimer if this is a reply.
Re^2: Explain a regex by BrowserUk (Patriarch) on Jan 22, 2005 at 16:29 UTC
!? `use Regexp::Common qw[number];; print $RE{num}{real};; (?:(?i)(?:[+-]?)(?:(?=[0123456789]|[.])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))` Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.
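For readers following along, here is the same pattern laid out under `/x` with one comment per group. The layout and the comments are one reading of the printed RE, not something the module emits; the variable name `$real` and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

my $real = qr/
    (?:
        (?i)                          # case-insensitive, so e and E both work
        (?:[+-]?)                     # optional sign
        (?:
            (?=[0123456789]|[.])      # lookahead: a digit or the radix point must come next
            (?:[0123456789]*)         # integer part, may be empty
            (?:
                (?:[.])               # radix point
                (?:[0123456789]{0,})  # fractional digits, may be empty
            )?                        # the whole fraction is optional
        )
        (?:
            (?:[E])                   # exponent marker
            (?:
                (?:[+-]?)             # optional exponent sign
                (?:[0123456789]+)     # exponent digits, at least one
            )
          |                           # ... or no exponent at all
        )
    )
/x;

# Anchor the pattern so the whole string has to be a number.
for my $s ('3.14', '-2', '.5', '6.02E23', '1e-9', '7.', 'abc', '1e') {
    printf "%-8s %s\n", $s, ($s =~ /\A$real\z/ ? 'match' : 'no match');
}
```

Read this way, the empty alternative after `|` is what makes the exponent optional, and the lookahead is what stops a bare sign or an empty string from matching.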
Re^3: Explain a regex by Tanktalus (Canon) on Jan 22, 2005 at 16:51 UTC
I think that Randal's point is that just because something is on CPAN, or even in the main perl distribution (which I don't think is the case for Regexp::Common), doesn't mean that it's the most optimal, clue-filled way to do it. ;-) For example, my modules on CPAN probably would not meet with Randal's full approval either ;-)
Re^4: Explain a regex by merlyn (Sage) on Jan 22, 2005 at 20:26 UTC
Re^3: Explain a regex by erix (Prior) on Jan 22, 2005 at 18:00 UTC
The first line of your sig can be safely reversed here: Examine who speaks, not what is said.
Re: Explain a regex by hv (Prior) on Jan 22, 2005 at 15:56 UTC
I think most of the verbiage is there to keep the elements looking as similar to each other as possible, which is of dubious benefit. I'd guess that it's also trying to make it easy to modify it to extract any part of the number being matched by replacing the relevant `(?:` with `(`, which saves the programmer from having to find the place to insert the matching `)`. I can see no reason for the use of `{0,}` instead of `*` in one place. Hugo
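A sketch of that idea, assuming the four parts of interest are the sign, the integer digits, the fractional digits, and the exponent: each relevant `(?:` in the printed RE is changed to `(` and nothing else is touched. `split_real` and `$real_cap` are hypothetical names, not anything the module provides.

```perl
use strict;
use warnings;

# The printed RE with four "(?:" openers turned into capturing "(":
#   $1 sign, $2 integer digits, $3 fractional digits, $4 signed exponent
my $real_cap = qr/(?:(?i)([+-]?)(?:(?=[0123456789]|[.])([0123456789]*)(?:(?:[.])([0123456789]{0,}))?)(?:(?:[E])((?:[+-]?)(?:[0123456789]+))|))/;

# Hypothetical helper: returns (sign, int, frac, exp), or nothing if
# the whole string is not a number. Parts that are absent come back as ''.
sub split_real {
    my ($text) = @_;
    return unless $text =~ /\A$real_cap\z/;
    return map { defined $_ ? $_ : '' } ($1, $2, $3, $4);
}

printf "sign=%s int=%s frac=%s exp=%s\n", split_real('-12.5e+3');
# prints: sign=- int=12 frac=5 exp=+3
```

No closing parenthesis had to be located or added, which is the convenience being guessed at; whether the author intended it is not something the RE itself can confirm.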