G'day Hauke,

++ This looks like a very good start: it seems reasonably complete and covers most of the points I might have made. I have comments on two areas, as follows.

With any sort of tutorial, those reading it — to learn about the subject, rather than for reviewing, proof-reading, etc. — probably start with limited knowledge. Accordingly, any terms used should be unambiguous; unfortunately, you've used $regex to mean two different things:

    $regex = join ...
    $regex = qr/...

I'm familiar with both the subject matter and the technique, so this posed no problem for me; however, for someone learning this, it may do. While it's reasonably obvious in the short code example, half a page later, in the middle of descriptive text, the meaning of $regex might not be as obvious to the student as it is to you or me. Consider renaming those; purely as a suggestion:

    $regex_base_str = join ...
    $regex_compiled = qr/...
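
To make the distinction concrete, here is a minimal sketch of how the two names might read in a complete script. The @words list and the sample text are hypothetical, and the quotemeta step is my assumption about how the strings would be escaped, not something taken from your code:

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my @words = qw(apple banana cherry);

# Plain string: the alternatives joined with "|".
# quotemeta (an assumption here) escapes any regex metacharacters.
my $regex_base_str = join '|', map { quotemeta($_) } @words;

# Compiled regex object built from that string.
my $regex_compiled = qr/$regex_base_str/;

# Collect every alternative that occurs in the sample text.
my @found = 'banana and cherry pie' =~ /($regex_compiled)/g;

print "base string: $regex_base_str\n";
print "found:       @found\n";
```

With distinct names, later prose can say "the string" or "the compiled regex" without the reader having to work out which assignment is meant.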

In points (4) & (5), in the first list, you show grouping. To resolve the same issue in both, you use explicit capture grouping in (4), and implicit non-capture grouping in (5).
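
For readers who haven't met the difference before, a small sketch may help. It is not taken from your points (4) and (5); the $alt string, the surrounding pattern and the sample input "a42" are all hypothetical. It only shows how each kind of grouping affects capture numbering:

```perl
use strict;
use warnings;

# Hypothetical alternation string, for illustration only.
my $alt = 'a|b';

# Explicit capture grouping: the alternation takes $1, the digits land in $2.
my $re_capture = qr/^($alt)(\d+)$/;

# Explicit non-capture grouping: the digits are the only capture.
my $re_non_capture = qr/^(?:$alt)(\d+)$/;

# Implicit non-capture grouping: interpolating a qr// object wraps it in (?^:...).
my $inner       = qr/$alt/;
my $re_implicit = qr/^$inner(\d+)$/;

# In list context a successful match returns the captures.
my @caps_capture     = 'a42' =~ $re_capture;
my @caps_non_capture = 'a42' =~ $re_non_capture;
my @caps_implicit    = 'a42' =~ $re_implicit;

print "capture:     @caps_capture\n";
print "non-capture: @caps_non_capture\n";
print "implicit:    @caps_implicit\n";
```

All three match "a42"; only the first returns the alternation itself as a capture.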

Regex pieces used for alternation often occur as part of a larger regex; in fact, I suspect that's the more usual case. This may be as simple as the anchor assertions you show in (4), or could be a lot more complex. I'd suggest adding explicit non-capturing grouping to $regex_base_str (or whatever you call it) as part of the normal technique. To demonstrate:

    # Simple case: OK - matches "a" or "b"
    $ perl -E 'my $re = "a|b"; $re = qr{$re}; say $re'
    (?^u:a|b)

    # Complex case: NOT OK - matches "Xa" or "bY"
    $ perl -E 'my $re = "a|b"; $re = qr{X${re}Y}; say $re'
    (?^u:Xa|bY)

    # Complex case: OK - matches "XaY" or "XbY" [fixed with "(?:...)"]
    $ perl -E 'my $re = "(?:a|b)"; $re = qr{X${re}Y}; say $re'
    (?^u:X(?:a|b)Y)
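
The same point can be checked by matching rather than by printing the regex. This is only a sketch: the variable names and the four test strings are my own choices for illustration.

```perl
use strict;
use warnings;

my $bare    = 'a|b';        # alternation without grouping
my $grouped = "(?:$bare)";  # same alternation, explicitly grouped

my $re_bare    = qr{X${bare}Y};     # parsed as "Xa" OR "bY"
my $re_grouped = qr{X${grouped}Y};  # parsed as "X", then "a" or "b", then "Y"

# "Xa" and "bY" should only be accepted by the ungrouped version.
for my $str (qw(XaY XbY Xa bY)) {
    printf "%-3s  bare: %-5s  grouped: %s\n", $str,
        ($str =~ $re_bare    ? 'match' : 'no'),
        ($str =~ $re_grouped ? 'match' : 'no');
}
```

The ungrouped regex accepts all four strings, including the incomplete "Xa" and "bY"; the grouped one accepts only "XaY" and "XbY".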

— Ken


In reply to Re: [RFC] Building Regex Alternations Dynamically by kcott
in thread Building Regex Alternations Dynamically by haukex
