in reply to Re^2: Regex libraries
in thread Regex libraries

According to the Boost.Regex Regular Expression Syntax, all the constructs you checked are supported. From Boost.Regex Standards Conformance, we learn that the unsupported Perl features are:
  1. \N{name} - but one can use [:name:]
  2. \pP and \PP
  3. (?imsx-imsx)
  4. Lookbehind
  5. (?{ }) and (??{ })
  6. (?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)
That, in my opinion, isn't too bad. For point 1 there is an alternative, so no functionality is lost. Also note that one reason \N isn't supported is that \N isn't a regex construct - it's a string interpolation thing. PCRE doesn't support \N either.

I've written a lot of regexes, and seen even more, but I've never needed the constructs of point 2, and I've never seen them used either. In PCRE, support for \p and \P is limited, and only available if PCRE is specially built with Unicode character property support. (PCRE does not have full UTF-8 support.)

Point 3 might be a nuisance, but personally I've never used them to set flags for parts of an expression - I only use them implicitly when interpolating a qr construct, a feature that cannot be handled by a library. And with Boost, you can set many flags when matching, even more flags than with Perl. Specifically, one can set flags that mimic /m and /s. (Or rather, one needs to set flags to turn the /m and /s behaviours off, if I understand the page correctly.) /i can be approximated by first lowercasing what you are matching against, provided the pattern itself contains only lowercase literals. There doesn't seem to be an equivalent to /x, but we were happy without it for years in Perl as well, and /x doesn't add functionality - just readability.
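To make the "/i by lowercasing" idea concrete, here is a minimal sketch. It uses Python's re module purely as a stand-in (not Boost or PCRE), and match_ignoring_case is a hypothetical helper name invented for this illustration. It also shows where the trick breaks down: an uppercase literal in the pattern can never match a lowercased subject.

```python
import re

def match_ignoring_case(pattern, subject):
    """Emulate /i by lowercasing the subject (hypothetical helper).

    Assumes `pattern` contains only lowercase literals; an uppercase
    literal in the pattern can never match the lowercased subject.
    """
    return re.search(pattern, subject.lower()) is not None

# Lowercase pattern: the emulation works.
found_lower = match_ignoring_case(r"regex\s+libraries", "Regex   LIBRARIES")

# Uppercase literal in the pattern: the emulation silently fails.
found_upper = match_ignoring_case(r"Regex", "Regex libraries")

# For comparison, a real case-insensitive flag handles both sides.
found_flag = re.search(r"Regex", "regex libraries", re.IGNORECASE) is not None
```

Note that the pattern cannot simply be lowercased as well: that would turn an escape such as \S into \s and change its meaning.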

Lookbehind in Perl regexes is fairly limited anyway, as the lookbehind pattern has to be fixed width. Sure, it's a miss, but it's a miss of a limited thing.
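The fixed-width restriction, and one common way to live without lookbehind, can be sketched as follows. This uses Python's re module as a stand-in because it happens to enforce the same kind of restriction; it says nothing about how Boost or PCRE behave. The sample strings are made up for the illustration.

```python
import re

# Fixed-width lookbehind: accepted.
price = re.search(r"(?<=USD )\d+", "total: USD 42")

# Variable-width lookbehind: rejected at compile time by Python's re.
try:
    re.compile(r"(?<=USD\s+)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround without any lookbehind: match the prefix and capture
# only the part you are interested in.
captured = re.search(r"USD\s+(\d+)", "total: USD   42")
```

The capture-group version matches more text than the lookbehind would, so it is not a drop-in replacement in substitutions or when matches may overlap, but for plain extraction it is usually enough.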

Not being able to execute code isn't a limitation of the library - it's a limitation because it's a library. Only if regexes are an integral part of the language is such a thing possible, as it requires access to the variables. PCRE doesn't support (?{...}) and (??{...}) either, although it does have some other features that might do what you want to do with those coding constructs.

The last point is a miss because you can't use (?(?{...})yes|no), but then you are using code again, and that wouldn't be possible anyway due to the previous argument.
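For readers who haven't met the conditional construct from point 6, here is a small sketch of the group-reference form, (?(1)yes-pattern). It uses Python's re module as a stand-in, which supports this form but not the code-condition form (?(?{...})yes|no) mentioned above. The pattern and test strings are invented for the illustration.

```python
import re

# Group 1 optionally matches "<". The conditional (?(1)>) requires a
# closing ">" only if group 1 took part in the match. No no-pattern is
# given, so nothing extra is required when group 1 is unset.
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")

results = {s: bool(bracketed.match(s)) for s in ["<tag>", "tag", "<tag", "tag>"]}
```

Both the balanced "<tag>" and the bare "tag" are accepted, while the two unbalanced forms are rejected.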

Note also that the last two points are still marked as highly experimental, and p5p reserves the right to remove or change them without any notice.

Note that I base this purely on what Boost and PCRE say about themselves on their web and manual pages. I've never used any of the libraries myself.

Re^4: Regex libraries
by Schuk (Pilgrim) on Dec 30, 2004 at 17:26 UTC

    Thanks

    That is very interesting to know. It's good to know that it presumably won't affect me. Reading the documentation I couldn't get much out of it, though, as I didn't understand the regexes themselves.
    Thanks for translating these for me into understandable language :-)

    While testing our software I ran into the mentioned bugs and falsely blamed Boost for that.

    It's also interesting to see that Boost and PCRE have quite a lot in common. The author of Boost also made a statement about why he didn't stick to Perl5 regexes. He based his position on an article by Larry Wall, in which Larry states that Perl6 will probably bring quite large changes to the regex syntax, so Perl5 doesn't in fact have a real standard. Sounds to me like Perl6 is reinventing regexes!?

    I admit I haven't read the article completely yet, but do you guys know if there /will be|is/ a book about Perl6 regex syntax?

      The author of Boost is quite right. Perl doesn't have a standard, and regexes in Perl6 will be different (although there's a promise Perl5 style regexes will work as well).

      Read about the Perl6 regexes in Apocalypse 5: Pattern Matching.

      There's no book about Perl6 regexes yet - but no doubt O'Reilly or some other publisher is willing to publish one as soon as someone can convince them they could write a good book about them. (Perl publishers will be very happy with Perl6 - it gives them the opportunity to sell their books again).