in reply to The Dream of a New, Faster Regular Expression Engine

I've no idea what you mean by "to match hundreds of regular expressions against one text in O(lg(x))". What is x? It surely isn't the length of the text, or the length or number of matches you want to perform, as you will need at least linear time to read in the text and the matches. The abstract you link to doesn't mention this time either.

From what I understand of the abstract, this is a highly specialized technique (and hence, probably not useful as a replacement for Perl regular expressions). Also, the abstract suggests fast searches will be able to be done after doing (a lot of) preprocessing work. That is, you are repeatedly searching the same text. This is not typical for Perl programs. (It's also what "study" was intended to be for).

Seems to me that maybe its time to start thinking about a new Regular Expression Engine for Perl, maybe as an add-on extension, so we don't need to break support to old applications.

Are you volunteering? I think p5p badly needs another person to understand the regexp-engine. The current number of people who have a deep knowledge of the engine can be counted on the fingers of one hand. Patches welcome.

  • Comment on Re: The Dream of a New, Faster Regular Expression Engine

Replies are listed 'Best First'.
Re^2: The Dream of a New, Faster Regular Expression Engine
by monsieur_champs (Curate) on Oct 05, 2004 at 15:39 UTC

    In pieces:

    1. "X" is the lenght of the text (when matching one text against more than one regular expression). Read the full paper and you'll understand.
    2. This specific engine still very specialized. But the technique used on it is more general than the Perl's current implementation and innovative, and deserves a deeper study, and maybe even a implementation.
    3. I'm thinking seriously on volunteering. I believe that such a work could help me achieve a MSc degree (and this is very desirable) while still contributing to the Perl Community. I'm looking for someone with time and patience enough to guide me. Maybe you're interested.
    4. Send a patch to the Perl source code stills on my computer dreams list. :-) I'll send a good one someday.

      Don't think about volunteering. Do it! I'll make my first request: make the regex engine re-entrant. It was supposed to be, but it's not. This is a serious limitation that admittedly doesn't hit many people, but when it does, it's a killer. I've toyed with the idea of using Inline::Ruby for an application because this is one of the things that Ruby's regex engine can do that Perl's cannot.

      We desperately need volunteers who have the skills to pull this off. I certainly don't, otherwise I would have done it :)

      Cheers,
      Ovid

      New address of my CGI Course.

        Dear ovid
        I'm just a poor mortal that thinks big. And this doesn't qualify me (at least not in a near future) with the skills needed to make this change. I'll need a powerful Perl Master that is interested on this problem, and a big chunk of time to study and develop this.

        I have the interest, but lack the skills and/or a master to guide me in this journey.

        I'm really sorry about this.

      I'd like to read the full paper, but I didn't see a link to the full paper from the link you provided. Perhaps I missed it. Could you post a link to the full paper?

      As for doing the match in 'O(lg(X))', that surely is after some preprocessing - as otherwise it would be impossible, you'd need O(X) time to read in the text to match against. The abstract doesn't mention the preprocessing time, and for typical matches in a Perl program, this is significant, as most texts (strings) will be matched against only once.

Re^2: The Dream of a New, Faster Regular Expression Engine
by DrHyde (Prior) on Oct 08, 2004 at 09:08 UTC
    On the fingers of one hand? That's better than I thought then, I thought it was the fingers of one foot.