Need regexes with alternations in them for testing perl...

Re: Need regexes with alternations in them for testing perl... by Anonymous Monk on Mar 11, 2005 at 11:06 UTC
`use Regexp::Common; print $RE{URI}, $\;`
Re^2: Need regexes with alternations in them for testing perl... by demerphq (Chancellor) on Mar 11, 2005 at 12:14 UTC
I'm just curious: did you look at that regex before you suggested it, or did you just guess that it might be appropriate? --- demerphq
Re^3: Need regexes with alternations in them for testing perl... by Anonymous Monk on Mar 11, 2005 at 12:48 UTC
I saw it in a talk by Abigail during OSDC. Here's the slide. But I had to look again to make sure it had alternations.
Re: Need regexes with alternations in them for testing perl... by olivierp (Hermit) on Mar 11, 2005 at 10:07 UTC
I use this one daily, amongst a list of filters that are pulled in and `qr/^$pattern/`'ed: `(?!.*(?:customer relationship|enterprise resource planning|e-business|int(?:ernationa)?l data management|systems, architecture|ww|is projects))` HTH -- Olivier
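A minimal sketch of how such a filter list might be applied. The post does not show the surrounding code, so the names `@filters`, `@compiled` and `passes_filters`, the sample strings, and the rule that a line is kept only when every filter matches are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical filter list; only this one pattern comes from the post.
my @filters = (
    '(?!.*(?:customer relationship|enterprise resource planning|e-business|int(?:ernationa)?l data management|systems, architecture|ww|is projects))',
);

# Compile each pattern anchored at the start of the string, as the post describes.
my @compiled = map { qr/^$_/ } @filters;

# Assumption: a line is kept only if every compiled filter matches it.
sub passes_filters {
    my ($line) = @_;
    for my $re (@compiled) {
        return 0 unless $line =~ $re;
    }
    return 1;
}

print passes_filters('Senior Perl developer')       ? "keep\n" : "drop\n";
print passes_filters('Head of e-business strategy') ? "keep\n" : "drop\n";
```

Because the whole pattern is a negative lookahead, it matches only when none of the alternatives occurs anywhere on the line, so the first sample is kept and the second is dropped.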
Re: Need regexes with alternations in them for testing perl... by kvale (Monsignor) on Mar 11, 2005 at 17:02 UTC
Although this might not satisfy the real-world criterion, I would mine Regex::PreSuf for gnarly alternations. Before your work, it was the standard for regex alternation optimizing goodness. There is some fun code in word.t. For a more real-world flavor, substitute real, depunctuated text for /usr/dict/words. This sort of test has the advantage of not only giving you a mean measure of speedup, but also letting you easily compute a variance. -Mark
Re^2: Need regexes with alternations in them for testing perl... by demerphq (Chancellor) on Mar 12, 2005 at 16:18 UTC
Thanks. This was interesting. I ended up playing around with words.t and discovered a slight bug that isn't exposed by this test. The regex /foobar|foo/ is equivalent to /foo(?:bar)?/, whereas the regex /foo|foobar/ is equivalent to /foo(?:|bar)/. Regex::PreSuf, along with similar modules like Regexp::Assemble and Regexp::List, will generate /foo(?:bar)?/ for both. E.g.:
`G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo|foobar/"` prints `foo`
`G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo(?:|bar)/"` prints `foo`
`G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foobar|foo/"` prints `foobar`
`G:\cblead2\win32>perl -e "print $& if 'foobar'=~/foo(?:bar)?/"` prints `foobar`
Unfortunately this makes them not entirely suitable for my needs. But it has proved interesting using them. :-) --- demerphq
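The same comparison as a small script, for anyone who wants to rerun it without shell quoting. This is only a sketch; the helper name `matched_text` is made up for the example.

```perl
use strict;
use warnings;

# Return the text matched by $re in $str, or undef if there is no match.
sub matched_text {
    my ($re, $str) = @_;
    return $str =~ $re ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

# Perl tries alternatives left to right and stops at the first that succeeds,
# so the order of the branches decides which text is matched.
for my $pat ('foo|foobar', 'foo(?:|bar)', 'foobar|foo', 'foo(?:bar)?') {
    printf "%-12s => %s\n", $pat, matched_text(qr/$pat/, 'foobar');
}
```

The first two patterns report `foo` and the last two report `foobar`, which is why rewriting /foo|foobar/ as /foo(?:bar)?/ changes what is matched.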
Re: Need regexes with alternations in them for testing perl... by ZlR (Chaplain) on Mar 17, 2005 at 10:34 UTC
A request for real-life regexes with alternations, well that's fun! Here's one that I use to do some matching on HP-UX iostat output: `m/^ ( NANQ|INF|NANS| # Possible non numerics \d+ # a number ... (?:[,\.]\d+)? # ... with an optional decimal part ) # memorise the value of : tin . \s+ ( NANQ|INF|NANS| # The same one \d+ (?:[,\.]\d+)? ) # memorise the value of : tout . \s+ (?:[\d\s\.,]|NANQ|INF|NANS)+ # Rest of the line . $/x` The non-numeric values were added later, after they showed up in some iostat output where I did not expect them. The regex thus allowed me to fix an annoying behaviour with minimal impact. Another one: `s/^\d\d:\d\d:\d\d\s(?:CPU|INTR)\s//` This is for a line of sar output on Linux. Now some vmstat on SunOS: `s/de\s+sr\s+(\w\d|\w\w|\w\w\d|--|\s)+in\s+sy/de sr in sy/g` Cheers, zlr.
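For readability, here is the iostat pattern laid out one piece per line, so that each /x comment ends at a line break, wrapped in a small helper. This is a sketch only: the name `parse_tin_tout` and the sample line are invented for the example and are not real iostat output.

```perl
use strict;
use warnings;

my $tin_tout = qr{
    ^
    ( NANQ|INF|NANS|        # possible non-numerics
      \d+                   # a number ...
      (?:[,.]\d+)?          # ... with an optional decimal part
    )                       # capture tin
    \s+
    ( NANQ|INF|NANS|        # the same again
      \d+ (?:[,.]\d+)?
    )                       # capture tout
    \s+
    (?:[\d\s.,]|NANQ|INF|NANS)+   # rest of the line
    $
}x;

# Return (tin, tout) for a matching line, or an empty list otherwise.
sub parse_tin_tout {
    my ($line) = @_;
    return $line =~ $tin_tout ? ($1, $2) : ();
}

# Hypothetical sample line.
my ($tin, $tout) = parse_tin_tout('12 NANQ 3.5 0,25 INF');
print "tin=$tin tout=$tout\n";
```

With the sample line above this prints `tin=12 tout=NANQ`, so the non-numeric tokens are captured through the same alternation as ordinary numbers.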
Re: Need regexes with alternations in them for testing perl... by ysth (Canon) on Mar 14, 2005 at 06:43 UTC
Not Perl, but from the procmail manpage: If the regular expression contains `‘^TO_’` it will be substituted by `‘(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)’`, which should catch all destination specifications containing a specific address. If the regular expression contains `‘^TO’` it will be substituted by `‘(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^a-zA-Z])?)’`, which should catch all destination specifications containing a specific word. If the regular expression contains `‘^FROM_DAEMON’` it will be substituted by `‘(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$)))’`, which should catch mails coming from most daemons (how’s that for a regular expression :-). If the regular expression contains `‘^FROM_MAILER’` it will be substituted by `‘(^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma(st(er)?|n)|office)|(send)?Mail(er)?|daemon|mmdf|n?uucp|ops|r(esponse|oot)|(bbs\.)?smtp(error)?|s(erv(ices?|er)|ystem)|A(dmin(istrator)?|MMGR))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$))’` (a stripped down version of `‘^FROM_DAEMON’`), which should catch mails coming from most mailer-daemons.