in reply to Why aren't perl regular expressions really regular expressions?

To expound on the answers you've already received -- the word "regular" in "regular expression" comes from regular languages, a certain kind of formal language (as described by formal grammars).

There are a number of different kinds of formal languages that arise fairly naturally; the most important ones are collected in the so-called Chomsky hierarchy. They're characterized by increasingly strict conditions imposed on their grammars' production rules: the most general kind of language (recursively enumerable) has no restrictions, context-sensitive and context-free languages have progressively more, and regular languages have the most stringent ones. (The most stringent in this hierarchy, that is: there are classes of languages that rank below regular languages still, e.g. star-free languages.)

Interestingly, each of these classes of languages is recognized by a particular kind of formal machine: Turing machines (or equivalent notions of computation, e.g. μ-recursive functions) for recursively enumerable languages, linear-bounded automata for context-sensitive languages, pushdown automata for context-free languages, and finite automata (deterministic or non-deterministic, it makes no difference) for regular languages.
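To make "it makes no difference" concrete: the standard subset construction turns any NFA into a DFA whose states are sets of NFA states. Below is a minimal Python sketch of that construction, assuming a small hypothetical NFA (binary strings whose second-to-last symbol is 1); the names NFA_DELTA, subset_construction and so on are illustrative only, not from any library.

```python
from itertools import product

# Hypothetical NFA over {0, 1}: accepts strings whose second-to-last symbol is 1.
# q0 loops on everything and, on a 1, may also "guess" that this 1 is the one.
NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},   # the nondeterministic choice
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
NFA_START = "q0"
NFA_ACCEPT = {"q2"}


def nfa_accepts(word):
    """Run the NFA by tracking every state it could currently be in."""
    current = {NFA_START}
    for symbol in word:
        current = set().union(*(NFA_DELTA.get((s, symbol), set()) for s in current))
    return bool(current & NFA_ACCEPT)


def subset_construction(alphabet="01"):
    """Build an equivalent DFA whose states are frozensets of NFA states."""
    start = frozenset({NFA_START})
    delta, seen, todo = {}, {start}, [start]
    while todo:
        state = todo.pop()
        for symbol in alphabet:
            target = frozenset(
                t for s in state for t in NFA_DELTA.get((s, symbol), set())
            )
            delta[(state, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {state for state in seen if state & NFA_ACCEPT}
    return start, delta, accepting


def dfa_accepts(dfa, word):
    """Run the DFA: exactly one current state, one table lookup per symbol."""
    start, delta, accepting = dfa
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


if __name__ == "__main__":
    dfa = subset_construction()
    words = ["".join(bits) for n in range(9) for bits in product("01", repeat=n)]
    assert all(nfa_accepts(w) == dfa_accepts(dfa, w) for w in words)
    print(len({state for state, _ in dfa[1]}), "DFA states")  # 4 DFA states
```

"No difference" refers to which languages can be recognized, not to size: in the worst case the DFA produced this way has exponentially more states than the NFA it came from.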

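Regular expressions, in the theoretical sense (rather than Perl's), correspond directly to finite automata: every such expression can be turned into a finite automaton, and vice versa. Besides literal symbols, the only things you need there are concatenation, alternation (written (...|...) in Perl) and the Kleene star (written (...)* in Perl).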
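Those three operators really are enough to build an automaton mechanically. A minimal Python sketch of Thompson's construction (a standard textbook technique, not a claim about how Perl's engine works internally): it parses patterns made only of literal characters, concatenation, |, * and parentheses, builds an NFA, and simulates it by tracking the set of current states, so it never backtracks. All class and function names are hypothetical.

```python
class NFA:
    """Thompson-style NFA: literal edges plus epsilon (empty) edges."""

    def __init__(self):
        self.eps = []    # eps[s]: states reachable from s without consuming input
        self.edges = []  # edges[s]: list of (char, target) pairs

    def state(self):
        self.eps.append([])
        self.edges.append([])
        return len(self.eps) - 1

    # Each builder returns a fragment (start, accept).
    def literal(self, ch):
        s, e = self.state(), self.state()
        self.edges[s].append((ch, e))
        return s, e

    def empty(self):
        s, e = self.state(), self.state()
        self.eps[s].append(e)
        return s, e

    def concat(self, a, b):
        self.eps[a[1]].append(b[0])
        return a[0], b[1]

    def alt(self, a, b):
        s, e = self.state(), self.state()
        self.eps[s] += [a[0], b[0]]
        self.eps[a[1]].append(e)
        self.eps[b[1]].append(e)
        return s, e

    def star(self, a):
        s, e = self.state(), self.state()
        self.eps[s] += [a[0], e]        # enter the loop, or skip it entirely
        self.eps[a[1]] += [a[0], e]     # repeat, or leave
        return s, e


class Parser:
    """alt := concat ('|' concat)*; concat := starred*; starred := atom '*'*;
    atom := char | '(' alt ')'."""

    def __init__(self, pattern):
        self.pattern, self.pos, self.nfa = pattern, 0, NFA()

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def parse_alt(self):
        frag = self.parse_concat()
        while self.peek() == "|":
            self.pos += 1
            frag = self.nfa.alt(frag, self.parse_concat())
        return frag

    def parse_concat(self):
        frag = self.nfa.empty()
        while self.peek() not in (None, "|", ")"):
            frag = self.nfa.concat(frag, self.parse_starred())
        return frag

    def parse_starred(self):
        frag = self.parse_atom()
        while self.peek() == "*":
            self.pos += 1
            frag = self.nfa.star(frag)
        return frag

    def parse_atom(self):
        ch = self.peek()
        self.pos += 1
        if ch == "(":
            frag = self.parse_alt()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.pos += 1
            return frag
        if ch == "*":
            raise ValueError("'*' with nothing to repeat")
        return self.nfa.literal(ch)


def full_match(pattern, text):
    """True if the whole of text is in the language described by pattern."""
    parser = Parser(pattern)
    start, accept = parser.parse_alt()
    if parser.pos != len(pattern):
        raise ValueError("unbalanced ')'")
    nfa = parser.nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in nfa.eps[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({t for s in current for c, t in nfa.edges[s] if c == ch})
    return accept in current


if __name__ == "__main__":
    print(full_match("(a|b)*abb", "babb"))  # True
    print(full_match("(a|b)*abb", "abab"))  # False
```

Because the simulation keeps a set of states instead of backtracking, its running time is linear in the length of the input for a fixed pattern; that guarantee is exactly what the extra features discussed next give up.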

This is what Perl's regular expressions are based on, but in practice, when you're more interested in solving problems than in researching formal languages (as fascinating a subject as they are), you'll need more tools in your toolbox. Perl gained a lot of them to make your job easier, from backreferences to look-around assertions to non-backtracking subpatterns to pattern interpolation to match-time code evaluation to... well, you get the idea.
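Backreferences are the clearest example of a feature that takes Perl-style patterns outside the regular languages. A well-known trick from Perl folklore, /^(11+?)\1+$/, matches a string of n ones exactly when n is composite; since the set of primes written in unary is not a regular language, no finite automaton can do this. The sketch below runs the same pattern through Python's re module (another backtracking engine that supports backreferences) rather than Perl itself; the function names are hypothetical.

```python
import re

# (11+?) captures a block of two or more 1s; \1+ demands that the same block
# repeat at least once more. So the whole string is k copies of a block of
# length d with k >= 2 and d >= 2, i.e. its length n = k * d is composite.
COMPOSITE_UNARY = re.compile(r"(11+?)\1+")


def is_prime(n: int) -> bool:
    """Primality test by regex on the unary string '1' * n."""
    return n >= 2 and COMPOSITE_UNARY.fullmatch("1" * n) is None


def is_prime_by_division(n: int) -> bool:
    """Plain trial division, used only to cross-check the regex."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


if __name__ == "__main__":
    print([n for n in range(30) if is_prime(n)])
    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The price is backtracking: the engine tries every block length in turn, so the work grows with n, whereas the set-of-states simulation of a true regular expression stays linear.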

That's the sense in which Perl's regular expressions are, in fact, irregular -- or, perhaps equivalently, "Perl-compatible". They go far beyond what "regular expressions" originally were, and as a result they're vastly more useful in practice.

(On that note, BTW, it's also worth adding that when you encounter "Perl-compatible" regular expressions in any other language, it's pretty much a safe bet that they aren't, in fact, fully Perl-compatible.)

Re^2: Why aren't perl regular expressions really regular expressions?
by LanX (Saint) on Mar 17, 2015 at 15:28 UTC
    > BTW, it's also worth adding that when you encounter "Perl-compatible" regular expressions in any other language, it's pretty much a safe bet that they aren't in fact.

    AFAIK, tools and languages claiming "Perl-compatible" regular expressions are based on the same PCRE library and are hence compatible with each other.

    The problem is rather that PCRE isn't fully compatible with Perl: it mimics the ways in which Perl's syntax differs from POSIX regexes and GNU basic regexes, plus some of the features Larry added to Perl.

    I.o.W., the syntax looks compatible, but the results may not be in edge cases.

    Edit:

    Recommended read: Friedl's "Mastering Regular Expressions" from O'Reilly.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)

    PS: Je suis Charlie!

      I.o.W., the syntax looks compatible, but the results may not be in edge cases.

      Exactly. The basic features are probably in place (I say "probably" because I haven't actually looked at the PCRE library), but not all of the advanced features will be. At the very least, I would expect match-time code evaluation not to be; if a regular expression in your Perl program uses it, I'd bet you won't be able to copy it unmodified into a script written in another language and have it Just Work™.

      It's not gonna matter most of the time (famous last words, those), but one should still keep in mind that "Perl-compatible regular expressions" are really "almost-but-not-quite-Perl-compatible regular expressions".

      (The fact that different languages all using the PCRE library are gonna have regular expressions compatible with each other is an added irony.)

        Hm ... what I meant is that even with the same syntax, i.e. no compilation error, you might get different results.

        IIRC there are differences in backtracking with nested groups and in the notion of what an atomic match is (no guarantee, PCRE isn't really my domain).

        JS claims to copy the Perl 4 standard, but even in this limited case you'll be confronted with incompatibilities.

        Cheers Rolf

        It's worth bearing in mind, for evangelism purposes, how far other languages borrowing PCRE are from Perl's regex power. Not only is Perl's syntax for running regexen without equal in its simplicity; the true Perl regex language and toolbox itself is beyond compare with other languages, and it is not (of course! though I had previously thought so) a sort of removable engine that has been taken out and bolted, more clumsily, into other languages.

        Regexen were one of the ma(in|ny) reasons why I decided to become a Perl programmer, after I found out about them in Learning Perl.

        I wonder if the Perl 6 regex engine will be any easier to borrow?

      recommended read Friedl's "Mastering Regular Expressions" from O'Reilly.

      Yes, I'm reading the first edition of Friedl's famous book (as well as about ten other Perl-related books all at once).

      I specifically opened Mastering Regular Expressions looking for the answer to my above question, but couldn't find it.

        I just checked the history part in chapter 3 of the third edition.

        This information is hidden at best: there is a reference to a publication titled "The role of finite automata...", but that's all I could find.

        But I don't know any better source explaining the differences between dialects.

        Cheers Rolf