Re: Regular Expression Question

Replies are listed 'Best First'.
Re: Re: Regular Expression Question by nevyn (Monk) on Dec 04, 2003 at 20:48 UTC
You should test your regexp on ",foo,", which will match and shouldn't, as will "!!foo!!,". So for a working single regexp you want (assuming `\w` is good enough):

```perl
/^\w+           # Starts with a "word"
 (?:,\w+)*$/x;  # Followed by many "comma and word" atoms
```

Which IMO is uglier than:

```perl
/^[\w,]+$/    &&  # Comma word atoms
! /^,|,,|,$/;     # With constraints on commas
```

It's probably faster to use 2 regexps too.

-- James Antill
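A minimal sketch for comparing the two styles on the strings mentioned above. The sub names `valid_single` and `valid_double` are made up for illustration, and `$s` stands in for whatever variable holds the input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single anchored regex: a word, then zero or more ",word" groups.
sub valid_single {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Two regexes: only word characters and commas are allowed, and there
# must be no leading, doubled, or trailing comma.
sub valid_double {
    my ($s) = @_;
    return ($s =~ /^[\w,]+$/ && $s !~ /^,|,,|,$/) ? 1 : 0;
}

for my $s ('foo', 'foo,bar,baz', ',foo,', '!!foo!!,', 'foo,,bar') {
    printf "%-12s single=%d double=%d\n",
        $s, valid_single($s), valid_double($s);
}
```

Both subs should agree on every string in the list: the first two are accepted and the last three rejected.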
Re: Re: Re: Regular Expression Question by simonm (Vicar) on Dec 04, 2003 at 20:59 UTC
> you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".

It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched. The `/(\w+(?:\,\w+))/` regex will successfully extract the matching "foo" substring from your two sample cases into `$1`. If you want to check the entire string, then yes, leave out the parentheses and use `^...$` anchors.

Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident regex gurus can explain why this is?) However, if you want to extract matching substrings, I think the single regex is a sensible approach.
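A rough sketch of the extraction use case. It assumes the non-capturing group is meant to repeat, i.e. it carries a trailing `*` that does not appear in the regex as quoted in this post; without a quantifier the `,word` part is mandatory and a lone "foo" cannot match. The sub name `first_word_list` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored match: returns the first run of comma-separated words
# found anywhere in the string, or undef if there is none.
# Assumption: the ",word" group is optional and repeatable ("*").
sub first_word_list {
    my ($s) = @_;
    return $s =~ /(\w+(?:,\w+)*)/ ? $1 : undef;
}

for my $s (',foo,', '!!foo!!,', 'foo,bar baz') {
    my $m = first_word_list($s);
    print "'$s' -> ", (defined $m ? "'$m'" : 'no match'), "\n";
}
```

Under that assumption the first two strings yield "foo" and the third yields "foo,bar".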
Re: Re: Re: Re: Regular Expression Question by nevyn (Monk) on Dec 04, 2003 at 21:45 UTC
> Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

Generally anything that looks like "(AB)" is bad for the backtracking.

-- James Antill
Re: Regular Expression Question (show me the can^H^H^Hbenchmark) by Abigail-II (Bishop) on Dec 05, 2003 at 15:09 UTC
> Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion. The benchmark below shows the one-regex solution to be somewhat faster, though the data sample is tiny.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /timethese cmpthese/;

chomp (our @lines = <DATA>);

our (@r1, @r2);
cmpthese -10 => {
    one  =>  '@r1 = map {/^\w+(?:,\w+)*$/ ? 1 : 0} @lines',
    two  =>  '@r2 = map {/^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0} @lines',
};

die "Unequal" unless "@r1" eq "@r2";

__DATA__
one,two,three,four,five
,one,two,three,four,five
one,two,three,four,five,
one,two,three,,four,five
one,two,three four,five
```

```
       Rate  two  one
two 25436/s   -- -26%
one 34417/s  35%   --
```

Abigail
Re: Re: Regular Expression Question (show me the can^H^H^Hbenchmark) by simonm (Vicar) on Dec 05, 2003 at 21:56 UTC
Re: Re: Regular Expression Question by hardburn (Abbot) on Dec 04, 2003 at 21:40 UTC
`\w` will also match the underscore, which the original poster does not specifically require, and so should not be included.

----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer

`: () { :|:& };:`

Note: All code is untested, unless otherwise stated
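One way to leave the underscore out is to spell out the character class instead of using `\w`. This sketch assumes that plain ASCII letters and digits are what the original poster wants, which the thread does not actually state; the sub name `valid_alnum_list` is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the anchored single regex, but with an explicit
# ASCII alphanumeric class so that "_" is rejected.
sub valid_alnum_list {
    my ($s) = @_;
    return $s =~ /^[A-Za-z0-9]+(?:,[A-Za-z0-9]+)*$/ ? 1 : 0;
}

for my $s ('foo,bar1', 'foo_bar,baz', ',foo') {
    printf "%-12s %s\n", $s, valid_alnum_list($s) ? 'ok' : 'rejected';
}
```

Only the first string is accepted; the second fails on the underscore and the third on the leading comma.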
Re: Re: Regular Expression Question by TASdvlper (Monk) on Dec 05, 2003 at 14:56 UTC
Maybe I'm missing something here... is the "?:" actually necessary? I read about it in the Camel book, and from their example I can't see its relevance for this regexp. Can someone explain this to me?
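A small sketch of what the `?:` changes, assuming the question is about the `(?:,\w+)*` group from the regexes earlier in the thread. Both forms accept the same strings; the difference is only in what gets captured. The sub names are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain parentheses: the repeated ",word" group is also a capture
# group, so it takes up $2 (holding its last repetition).
sub with_capture {
    my ($s) = @_;
    my @groups = $s =~ /^(\w+)(,\w+)*$/;
    return @groups;
}

# (?:...) groups for the "*" quantifier without capturing anything.
sub without_capture {
    my ($s) = @_;
    my @groups = $s =~ /^(\w+)(?:,\w+)*$/;
    return @groups;
}

my $s = 'one,two,three';
print "capturing:     @{[ with_capture($s) ]}\n";
print "non-capturing: @{[ without_capture($s) ]}\n";
```

With plain parentheses the match returns two groups, "one" and ",three"; with `?:` it returns only "one". So the `?:` is not needed for the match to succeed, it just keeps the grouping from creating a capture nobody asked for.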