I was wondering if there was any reason other than "that's the way it is"?
I can give you a few, hopefully informed, guesses.

In your example, a backreference in a character class would seem to make sense, because you just matched one character. But what about longer strings? If your first group matched the string "a-z", would a character class with a backreference, [\1], then have to match all lower case letters? Normal backreferences aren't interpreted as a regex. Instead, they're substrings, and they try to literally match the text they're overlaid against.
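To make the "literal substring" point concrete, here is a minimal sketch. It is in Python rather than Perl, and the variable names are made up for illustration, but it shows the behaviour described above: a group that captured "a-z" only backreferences those three characters.

```python
import re

# The group captures the three characters "a-z". The backreference \1
# must then match those same three characters literally; it is not
# re-read as the range "any lower case letter".
pattern = re.compile(r"(a-z)\1")

literal_repeat = pattern.fullmatch("a-za-z")  # captured text, repeated
not_a_range = pattern.fullmatch("a-zq")       # "q" is not the text "a-z"
```

The first call yields a match object, the second yields None.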

What if your group matched just a single backslash? Surely you'd end up with an invalid regex. Or would you instead prefer that, in the "a-z" case, the class match only "a", "-" and "z"?

In any case, you'd clearly need to compile part of the regex on the fly, for every attempt at a match. That isn't very fast. But it gets worse.
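For what it's worth, the "only these characters" reading can be emulated by hand, and doing so shows where the per-attempt compilation comes from. This is only a sketch in Python, not how any regex engine works internally; match_then_class is a hypothetical helper, and escaping the captured text is my assumption about the desired semantics.

```python
import re

def match_then_class(first, text):
    # Hypothetical emulation of "(first)[\1]+": match `first` at the start
    # of `text`, then build a character class from whatever was captured.
    m = re.match(first, text)
    if m is None or m.group(0) == "":
        return None  # no match, or nothing to build a class from
    # re.escape makes every captured character literal, so "a-z" becomes
    # the three characters a, -, z and a lone backslash stays valid.
    # This compile step runs again on every call, i.e. per match attempt.
    char_class = re.compile("[" + re.escape(m.group(0)) + "]+")
    return char_class.match(text, m.end())

hit = match_then_class("a-z", "a-zzz-a")        # rest uses only a, -, z
miss = match_then_class("a-z", "a-zq")          # "q" is not in the class
slashes = match_then_class(r"\\", "\\" * 3)     # group is one backslash
```

Without the escaping step, the single-backslash case would produce the class `[\]`, which does not compile.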

A character class can typically be implemented using a bitmap (or bit array). With single byte characters, that's 256 bits. To compile a character class, you just mark all the characters that are acceptable. To match using such a character class, you just check whether the current character's bit is set in the class's bit array.
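The bitmap idea can be sketched in a few lines. Again this is Python and purely illustrative: compile_class and class_matches are hypothetical names, and a Python integer stands in for the 256-bit array a real engine would use.

```python
def compile_class(chars):
    # "Compile" a class: set one bit per acceptable byte value (0..255).
    bitmap = 0
    for byte in chars:
        bitmap |= 1 << byte
    return bitmap

def class_matches(bitmap, byte):
    # "Match" a character: test whether its bit is set in the bitmap.
    return bool((bitmap >> byte) & 1)

# Build the class once, marking every byte from "a" through "z".
lowercase = compile_class(bytes(range(ord("a"), ord("z") + 1)))

in_class = class_matches(lowercase, ord("m"))
out_of_class = class_matches(lowercase, ord("M"))
```

Compiling loops over every member of the class, while a test is a single shift and mask.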

This would also seem to indicate that compiling a character class likely won't be the fastest part of a regex compiler: every acceptable character has to be marked, one by one. A test against such a character class, being a single bit lookup, would be a lot faster than the compilation. Just a tip to compare apples and oranges.


In reply to Re: Re: Re: Regex backreference problem. by bart
in thread Regex backreference problem. by BrowserUk
