Fellows
A friend of mine pointed me to a paper published by the ACM that describes a new kind of regular expression engine, that, as far as I understand, is able to match (by the aproximate match technique) hundreds of regular expressions against one text in O( ln( x ) ) (!!!!!). If this is true, I guess we will need to reimplement (or, at least, review) our regular expressions engine.
I'm learning about this and don't know a lot about Approximated Matching, but it seems to be a generalization of the Exact Pattern Match problem (that's Perl's current regular expression engine technique).
I guess this kind of pattern matching is a subproduct from the biology efforts on the Genoma Project and other genetic sequencing projects arround the world, and that this have enough maturity to guarantee a new implementation of a regular expression engine to Perl.
I also googled for and found a handful of papers on the subject of multiple string matching by Gonzalo Navarro, a researcher at the Computer Science Department, University of Chile.
Seems to me that maybe its time to start thinking about a new Regular Expression Engine for Perl, maybe as an add-on extension, so we don't need to break support to old applications. This can help Perl to solve problems faster and simpler, allowing us to solve more with even less code.
Re: The Dream of a New, Faster Regular Expression Engine
by diotalevi (Canon) on Oct 05, 2004 at 15:42 UTC
|
You may want to take a look at Regexp::Approx - Use fuzzy regular expressions. It embeds String::Approx function calls into the standard regular expression engine. It would be valuable to have a proper approximate matching engine. So please, don't let this code lull you into thinking that the more general solution wouldn't be immensely preferred.
| [reply] |
Re: The Dream of a New, Faster Regular Expression Engine
by hardburn (Abbot) on Oct 05, 2004 at 14:46 UTC
|
If you want to build this into a module, you might want to read MJD's How Regexes Work. I hope you have some pennies around.
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
| [reply] |
|
I'm thinking seriously about this, for my MSc degree. But this stills under evaluation. I'm talking about this with an old friend and graduation teacher. Maybe this ends on a nice and good project.
| [reply] |
Re: The Dream of a New, Faster Regular Expression Engine
by Anonymous Monk on Oct 05, 2004 at 14:51 UTC
|
I've no idea what you mean by "to match hundreds of regular expressions against one text in O(lg(x))". What is x? It surely isn't the length of the text, or the length or number of matches you want to perform, as you will need at least linear time to read in the text and the matches. The abstract you link to doesn't mention this time either.
From what I understand of the abstract, this is a highly specialized technique (and hence, probably not useful as a replacement for Perl regular expressions). Also, the abstract suggests fast searches will be able to be done after doing (a lot of) preprocessing work. That is, you are repeatedly searching the same text. This is not typical for Perl programs. (It's also what "study" was intended to be for).
Seems to me that maybe its time to start thinking about a new Regular Expression Engine for Perl, maybe as an add-on extension, so we don't need to break support to old applications.
Are you volunteering? I think p5p badly needs another person to understand the regexp-engine. The current number of people who have a deep knowledge of the engine can be counted on the fingers of one hand. Patches welcome. | [reply] |
|
In pieces:
- "X" is the lenght of the text (when matching one text against more than one regular expression). Read the full paper and you'll understand.
- This specific engine still very specialized. But the technique used on it is more general than the Perl's current implementation and innovative, and deserves a deeper study, and maybe even a implementation.
- I'm thinking seriously on volunteering. I believe that such a work could help me achieve a MSc degree (and this is very desirable) while still contributing to the Perl Community. I'm looking for someone with time and patience enough to guide me. Maybe you're interested.
- Send a patch to the Perl source code stills on my computer dreams list. :-) I'll send a good one someday.
| [reply] |
|
Don't think about volunteering. Do it! I'll make my first request: make the regex engine re-entrant. It was supposed to be, but it's not. This is a serious limitation that admittedly doesn't hit many people, but when it does, it's a killer. I've toyed with the idea of using Inline::Ruby for an application because this is one of the things that Ruby's regex engine can do that Perl's cannot.
We desperately need volunteers who have the skills to pull this off. I certainly don't, otherwise I would have done it :)
| [reply] |
|
|
I'd like to read the full paper, but I didn't see a link to the full paper from the link you provided. Perhaps I missed it. Could you post a link to the full paper?
As for doing the match in 'O(lg(X))', that surely is after some preprocessing - as otherwise it would be impossible, you'd need O(X) time to read in the text to match against. The abstract doesn't mention the preprocessing time, and for typical matches in a Perl program, this is significant, as most texts (strings) will be matched against only once.
| [reply] |
|
|
|
|
On the fingers of one hand? That's better than I thought then, I thought it was the fingers of one foot.
| [reply] |
Re: A New, Faster Regular Expression Engine? the devil is in the details.
by wufnik (Friar) on Oct 11, 2004 at 23:38 UTC
|
| [reply] |
|
|