I was looking for insight yesterday and found myself beckoned to the Pattern Matching section of Chapter 2 of the good book (Camel 2E). It was not that I needed a refresher on regexes, but simply that sometimes you take solace in something very familiar to prepare yourself for a daunting task.

It was then that the words hit me, "The Perl Engine uses a nondeterministic finite state automaton to find a match. ... If you're cagey, you can write efficient patterns that don't do a lot of silly backtracking."

At that moment I had a "Eureka!" I remembered from my own studies that finite state automata obey mathematical rules that let you work with them in a somewhat algebraic manner. During those studies, we also discussed software whose purpose is to reduce these automata to their optimal (minimal) forms. I think you can see where this is going.
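To make the "reduce these automata" idea concrete, here is a minimal sketch of state minimization by partition refinement (Moore's algorithm), written in Python only for brevity. The function name `minimize_dfa` and the toy automaton are my own illustration, not anything from the Camel book, and this is not a claim about what the Perl engine does internally. It also assumes a complete deterministic automaton, which is a stricter model than Perl regexes (backreferences, for one, fall outside it).

```python
def minimize_dfa(alphabet, delta, start, accepting):
    """Group equivalent states of a complete DFA (Moore-style refinement).

    delta maps (state, symbol) -> state. Returns a dict mapping each
    reachable state to a block id; states sharing an id are equivalent.
    """
    # Drop states that can never be reached from the start state.
    reachable, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # Initial split: accepting states versus everything else.
    block = {s: (s in accepting) for s in reachable}
    while True:
        # A state's signature is its own block plus its successors' blocks.
        sig = {
            s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            for s in reachable
        }
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in sorted(reachable)}
        # Refinement only ever splits blocks, so an unchanged count means stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


# Toy DFA (hypothetical): strings over {a, b} that end in "a".
# States 0 and 2 behave identically, so one of them is redundant.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
}
blocks = minimize_dfa("ab", delta, start=0, accepting={1})
print(blocks)
```

States that land in the same block can be merged, which is the automaton-level analogue of collapsing a clumsy pattern into a tighter one. Turning the minimized automaton back into a readable regex is a separate problem that this sketch does not attempt.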

What I am wondering is simple: Does a regular expression optimizer currently exist? If it doesn't, there is no doubt in my mind that it is entirely possible to write one. But that is not important to me at this moment. If none exists, then I may embark on that journey in the future.

What should we expect of a device that spits out optimized regexes? Obviously, I would like to get the best regex possible, whether I put in a horrible regex or a better version of the same horrible regex (i.e., r1 => ro and r2 => ro if r1 ~ r2, where ~ means the two regexes match the same strings). I don't think time constraints are all that important. Of course, I don't want to wait five years for an optimal regex, but if I give it a crappy regex, I expect to wait longer than if I gave it a better version of that regex. In other words, how long it takes depends on how far from optimal you start.
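The r1 ~ r2 requirement can at least be sanity-checked by brute force. The sketch below, again in Python and again purely illustrative, compares two patterns on every string up to a small length over a small alphabet. The name `agree_up_to`, the alphabet, and the length bound are all assumptions of mine. Agreement here is evidence, not proof: a real optimizer would need an actual equivalence argument on the automata.

```python
import re
from itertools import product


def agree_up_to(r1, r2, alphabet="ab", max_len=6):
    """Return the first string on which r1 and r2 disagree, or None.

    Only whole-string matches are compared, and only strings of at most
    max_len symbols from alphabet are tried, so None is not a proof.
    """
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return s
    return None


# A clumsy pattern and a tidier one that should accept the same strings.
same = agree_up_to(r"(a|b)*a", r"[ab]*a")
# Two patterns that differ on the empty string.
diff = agree_up_to(r"a*", r"a+")
print(same, repr(diff))
```

If an optimizer ever handed back an ro that failed a check like this against its input, you would know immediately that it had changed the meaning of the regex rather than just its cost.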

I guess that covers the two basic requirements I would have for a regex optimizer.

ALL HAIL BRAK!!!


In reply to Regular Expression Optimizer by PsychoSpunk
