Hi folks. I'd like to ask for your help with something. While requests for help are normally posted to SOPW, I'm not looking for wisdom, I'm looking for your regexes.

I need to build some test cases for a patch I'm working on for the core, so I need to gather a body of regexes that use alternations in various ways. When I say alternations I mean things like /a|list|of|words/, not so much things like /(mad)*|[patterns]/. (BTW, this isn't to say that /(mad)*|[patterns]|a|list|of|words/ doesn't qualify; it does for sure.) So if you have regexes that might be suitable, I'd really appreciate it if you could reply to this node with them. Real-life regexes that might be built with things like join('|',@words) would be interesting, especially if they are complicated.
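To make the kind of pattern concrete, here is a minimal sketch of an alternation built from a word list with join. The word list and variable names are hypothetical, chosen only for illustration; quotemeta is applied so that metacharacters inside the words match literally, which real-life code may or may not want.

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my @words = qw(foo foobar bar b.z);

# Escape each word so regex metacharacters match literally,
# then join the words into a single alternation.
my $alternation = join '|', map { quotemeta($_) } @words;

# Anchor the alternation so the whole string must equal one of the words.
my $re = qr/^(?:$alternation)$/;

for my $candidate (qw(foo foobar baz b.z)) {
    printf "%-6s %s\n", $candidate,
        $candidate =~ $re ? 'matches' : 'does not match';
}
```

The non-capturing group matters here: without it the anchors would bind only to the first and last alternatives.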

Thanks a lot.

---
demerphq

Re: Need regexes with alternations in them for testing perl...
by Anonymous Monk on Mar 11, 2005 at 11:06 UTC
    use Regexp::Common; print $RE{URI}, $/;

      I'm just curious: did you look at that regex before you suggested it, or did you just guess that it might be appropriate?

      ---
      demerphq

        I saw it in a talk by Abigail during OSDC. Here's the slide. But I had to look again to make sure it had alternations.
Re: Need regexes with alternations in them for testing perl...
by olivierp (Hermit) on Mar 11, 2005 at 10:07 UTC
    I use this one daily, amongst a list of filters that are pulled in and qr/^$pattern/'ed
    (?!.*(?:customer relationship|enterprise resource planning|e-business|int(?:ernationa)?l data management|systems, architecture|ww|is projects))
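As a rough sketch of how such a filter might be applied: the pattern string is compiled with qr/^$pattern/ and only lines that pass the negative lookahead are kept. This assumes the " +" sequences in the posted pattern are line-wrap markers rather than part of the regex, and the input lines are invented for the example.

```perl
use strict;
use warnings;

# The filter as a plain string (assumption: " +" in the post was a wrap marker).
my $pattern = '(?!.*(?:customer relationship|enterprise resource planning'
            . '|e-business|int(?:ernationa)?l data management'
            . '|systems, architecture|ww|is projects))';

# Compile it anchored at the start, as described in the post.
my $filter = qr/^$pattern/;

# Hypothetical input lines, for illustration only.
my @lines = (
    'senior perl developer',
    'e-business consultant',
    'intl data management lead',
);

# A line is kept only if none of the alternatives occurs anywhere in it.
my @kept = grep { $_ =~ $filter } @lines;
print "$_\n" for @kept;
```

Because the lookahead starts with .*, each alternative is effectively searched for anywhere in the line, and the match is case-sensitive as written.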
    HTH
    --
    Olivier
Re: Need regexes with alternations in them for testing perl...
by kvale (Monsignor) on Mar 11, 2005 at 17:02 UTC
    Although this might not satisfy the real-world criterion, I would mine Regex::PreSuf for gnarly alternations. Before your work, it was the standard for regex alternation optimizing goodness. Here is fun code from word.t:

    For a more real-world flavor, substitute real, depunctuated text for /usr/dict/words. This sort of test has the advantage of not only giving you a mean measure of speedup, but also allowing you to easily compute a variance.

    -Mark

      Thanks. This was interesting. I ended up playing around with words.t and discovered a slight bug that isn't exposed by this test. The regex /foobar|foo/ is equivalent to /foo(?:bar)?/, but the regex /foo|foobar/ is equivalent to /foo(?:|bar)/. Regex::PreSuf, along with similar modules like Regexp::Assemble and Regexp::List, will generate /foo(?:bar)?/ for both. E.g.:

      G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo|foobar/"
      foo
      G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo(?:|bar)/"
      foo
      G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foobar|foo/"
      foobar
      G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo(?:bar)?/"
      foobar

      Unfortunately this makes them not entirely suitable for my needs. But it has proved interesting using them. :-)
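The difference comes from Perl trying alternatives left to right and taking the first one that lets the overall match succeed. A small script along these lines checks all four patterns in one go; matched_text is a helper name made up for this sketch, not something from any of the modules mentioned.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $string,
# or undef if there is no match. Uses @- and @+ instead of $&.
sub matched_text {
    my ($re, $string) = @_;
    return $string =~ $re
        ? substr($string, $-[0], $+[0] - $-[0])
        : undef;
}

my @patterns = (
    qr/foo|foobar/,      # shorter alternative first
    qr/foo(?:|bar)/,     # same behaviour, common prefix factored out
    qr/foobar|foo/,      # longer alternative first
    qr/foo(?:bar)?/,     # what the optimizing modules generate
);

for my $re (@patterns) {
    printf "%-24s %s\n", $re, matched_text($re, 'foobar');
}
```

An optimizer that wants to preserve semantics therefore has to keep the original order of alternatives, not just the set of words.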

      ---
      demerphq

Re: Need regexes with alternations in them for testing perl...
by ZlR (Chaplain) on Mar 17, 2005 at 10:34 UTC
    A request for real-life regexps with alternations, well that's fun!

    Here's one that I use to do some matching on HP-UX iostat output:

    m/^
      ( NANQ|INF|NANS|       # Possible non numerics
        \d+                  # a number ...
        (?:[,\.]\d+)?        # ... with an optional decimal part
      )                      # memorise the value of : tin .
      \s+
      ( NANQ|INF|NANS|       # The same one
        \d+
        (?:[,\.]\d+)?
      )                      # memorise the value of : tout .
      \s+
      (?:[\d\s\.,]|NANQ|INF|NANS)+   # Rest of the line .
    $/x
    The non-numeric values were added afterwards, after they showed up in some iostat output where I did not expect them. The regexp thus allowed me to fix an annoying behaviour with minimal impact.

    Another one:

    s/^\d\d:\d\d:\d\d\s*(?:CPU|INTR)\s*//
    This is for a line of Linux sar output.

    Now some vmstat with SunOS:

    s/de\s+sr\s+(\w\d|\w\w|\w\w\d|--|\s)+in\s+sy/de sr in sy/g

    Cheers,
    zlr.

Re: Need regexes with alternations in them for testing perl...
by ysth (Canon) on Mar 14, 2005 at 06:43 UTC
    Not Perl, but from the procmail manpage:
    If the regular expression contains ‘^TO_’ it will be substituted by ‘(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)’, which should catch all destination specifications containing a specific address.

    If the regular expression contains ‘^TO’ it will be substituted by ‘(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^a-zA-Z])?)’, which should catch all destination specifications containing a specific word.

    If the regular expression contains ‘^FROM_DAEMON’ it will be substituted by ‘(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$)))’, which should catch mails coming from most daemons (how’s that for a regular expression :-).

    If the regular expression contains ‘^FROM_MAILER’ it will be substituted by ‘(^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma(st(er)?|n)|office)|(send)?Mail(er)?|daemon|mmdf|n?uucp|ops|r(esponse|oot)|(bbs\.)?smtp(error)?|s(erv(ices?|er)|ystem)|A(dmin(istrator)?|MMGR))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$))’ (a stripped down version of ‘^FROM_DAEMON’), which should catch mails coming from most mailer-daemons.