
In the tools I now work with, which have been developed over many years, there are regexes that can be rather abstruse. Going through them with /x can help. I have run across some regexes that are either recursive or part of a circular definition. I am looking for insight on how to refactor them into non-self-referential forms so that they can work, and I am hoping to become more enlightened about regexes. I never took courses on these kinds of topics, so I have a huge void to fill. I have a lot of the practical side down. Now I would like to get a bit of the theory.
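
To illustrate what I mean by /x helping, here is a minimal sketch. The pattern and the names $octet and $ipv4 are made up for this post, not taken from any of the tools I mentioned. It builds a dense regex out of a commented qr// piece rather than one long string, and the larger pattern only interpolates the smaller one, so nothing refers back to itself:

```perl
use strict;
use warnings;

# Hypothetical example: a dotted-quad matcher built from a named piece.
# One octet, 0..255, spread out with /x so each alternative can be commented.
my $octet = qr/
    (?: 25[0-5]        # 250-255
      | 2[0-4][0-9]    # 200-249
      | 1[0-9][0-9]    # 100-199
      | [1-9]?[0-9]    # 0-99
    )
/x;

# The full pattern interpolates the piece four times; no self-reference needed.
my $ipv4 = qr/ \A $octet (?: \. $octet ){3} \z /x;

for my $candidate (qw(192.168.0.1 256.1.1.1 10.0.0)) {
    printf "%-12s %s\n", $candidate, $candidate =~ $ipv4 ? 'matches' : 'no match';
}
```

One caveat I am fairly sure of: variables are still interpolated inside /x comments, so a stray $ or @ in a comment can bite.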

Re^3: Simplifying regexes
by BrowserUk (Patriarch) on Oct 26, 2015 at 16:17 UTC

    Warning: I'm also very light on the theory of NFAs and DFAs.

    Perl was the first language I used (of the ~10 I had used in earnest by then, 13 years ago) that provided "regex". I found them really hard to get to grips with.

    By about 8 or 9 years ago I'd played with them enough that I could do most things I wanted to do with them; though I often found myself needing to experiment with them to arrive at a "solution" rather than being able to plan that solution.

    It was around that time that I read Friedl's book. And, at the risk of incurring even more wrath than normal, I'm going to say that it bored me to tears. Immense, thorough, an opus of extraordinary accomplishment though it is, the one word I cannot use in praise of it is 'enlightening'.

    I tried applying what I thought I had learnt from reading it, to my subsequent attempts at regex and found that I still had to 'suck it and see', when it came to solving real world problems.

    So then I went off to Wikipedia and read the stuff on NFAs and DFAs; and followed the links from those pages; and read (scanned) a bunch of theses, papers and academic journals on the subjects and ...

    I was still none the wiser.

    What I did learn was that Perl (compatible) regexes do not comply with the academic descriptions or rules for either type of automaton. They shortcut the rules for DFAs willy-nilly and defy categorisation amongst the (formal) NFA descriptions. In short: Perl's regular expressions are anything but regular; and (IMO) formal regex theory has little or nothing to teach you if you are already accomplished at using them practically.
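
    A small sketch of the "anything but regular" point, with a pattern and test strings invented for illustration. The backreference \1 insists that the second run of a's is exactly as long as the first, and matching two unbounded runs against each other is something a finite automaton cannot do:

    ```perl
    use strict;
    use warnings;

    # Illustrative pattern: some a's, a 'b', then exactly the same run of a's.
    # The backreference \1 is what takes this beyond a formally regular language.
    my $same_on_both_sides = qr/\A(a+)b\1\z/;

    for my $s (qw(aba aabaa aaba)) {
        print "$s: ", ($s =~ $same_on_both_sides ? "match" : "no match"), "\n";
    }
    ```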

    (My) bottom line is that when it comes to optimising Perl's regex, there is only one way to proceed: Benchmark! If you can conceive of two (or more) ways to achieve a particular goal, time 'em and opt for the quicker; because all the reading in the world of formal regular expression theory will not help one iota.
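
    By way of example, a minimal sketch using the core Benchmark module. The test data and the two candidate patterns are made up; substitute your own regexes and, more importantly, your own representative input:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Made-up test data and patterns, purely for illustration.
    my @lines = map { "key$_ = value$_" } 1 .. 1000;

    # Two ways to capture the text before the '=' sign.
    my $lazy    = qr/\A(.*?)\s*=/;      # lazy quantifier
    my $negated = qr/\A([^=\s]+)\s*=/;  # negated character class

    # Run each candidate for at least 1 CPU second, then print a comparison table.
    cmpthese(-1, {
        lazy    => sub { for (@lines) { my ($k) = /$lazy/ } },
        negated => sub { for (@lines) { my ($k) = /$negated/ } },
    });
    ```

    Which one wins can vary with the data and the perl version, which is rather the point: measure, don't guess.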

    Good luck with your research.



      I have not attained the wisdom to say yea or nay with certainty, but these sound like words of wisdom learned by following a path similar to mine. They resonate with me, and I suspect I will shortly arrive at the same conclusion. I was hoping that I was not the only one who found Friedl difficult. These are truly the kind of words I hoped to hear. Thank you.