Regular Expression Question

TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regular Expression Question by !1 (Hermit) on Dec 04, 2003 at 19:33 UTC
It would probably be far easier to first verify that the string contains only legitimate characters, and then check whether any of the conditions that would make it illegal exist. `#!/usr/bin/perl -wl use strict; for (<DATA>) { chomp; if (! /[^a-zA-Z0-9,]/ and ! /^,|,{2}|,$/) { print "Yep: $_"; } else { print "Nope: $_"; } } __DATA__ this,should,work,11 ,this,should,not,work this,,too,should,not,work nor,this,one,` Please please please read perldoc perlretut and perldoc perlre for a deeper understanding of regular expressions.
Re: Re: Regular Expression Question by TASdvlper (Monk) on Dec 04, 2003 at 20:12 UTC
Thanks for your help. Why is there a "!" in front of the first regexp? I understand the second.
Re: Re: Re: Regular Expression Question by nevyn (Monk) on Dec 04, 2003 at 20:25 UTC
The character class is negated; note where the ^ is... `! /[^a-zA-Z0-9,]/` ...however, I'm not sure the double negative is enough faster to warrant not just doing the more obvious... `/^[a-zA-Z0-9,]+$/` ...it's also not stated whether the empty string (`/^$/`) is valid or not, and the two forms above differ on that: the negated form accepts it, the positive form rejects it. -- James Antill
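To make the empty-string difference concrete, here is a minimal sketch comparing the two forms. The sub names `ok_negated` and `ok_positive` are illustrative only and do not appear in the thread.

```perl
use strict;
use warnings;

# Double-negative form: true when the string contains no illegal character.
sub ok_negated {
    my ($s) = @_;
    return $s !~ /[^a-zA-Z0-9,]/ ? 1 : 0;
}

# Positive form: true when the string is one or more legal characters.
sub ok_positive {
    my ($s) = @_;
    return $s =~ /^[a-zA-Z0-9,]+$/ ? 1 : 0;
}

# Compare both checks on an empty, a legal, and an illegal string.
for my $s ('', 'abc,1', 'a b') {
    printf "%-8s negated=%d positive=%d\n", "'$s'", ok_negated($s), ok_positive($s);
}
```

The two subs agree on `'abc,1'` and `'a b'` and disagree only on `''`, so the choice between them comes down to whether an empty string should count as valid.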
Re: Regular Expression Question by BrowserUk (Patriarch) on Dec 04, 2003 at 19:35 UTC
It's probably simpler and clearer (as well as more efficient) to use two regexes for this type of thing. `for ('fred,bill', 'fred,,bill', ',fred', 'bill,') { print "Bad:'$_'" unless $_ =~ m[^[a-zA-Z0-9,]+$] and $_ !~ m[,{2}|,$|^,]; }` Output is: `Bad:'fred,,bill' Bad:',fred' Bad:'bill,'` Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Hooray! Wanted!
Re: Regular Expression Question by kesterkester (Hermit) on Dec 04, 2003 at 19:36 UTC
If multiple regexes are acceptable, this'll get you a little closer to what you want, I think. The first regex in the if statement matches the alphanumerics and commas, but not leading/trailing commas; the second regex excludes consecutive commas. `use warnings; use strict; while ( <DATA> ) { print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/; } __DATA__ !@#$as3dfa ,sdfas3df, asd3fsa,,a3sdf as3df,asdf3,3asdf,asd3f sad3fasdjasdfkasdfklas3jf 3sad3fasdjasdfkasdfklas3jf 3sad3fasdjasdfkasdfklas3jf3` Output is: `as3dfa sdfas3df as3df,asdf3,3asdf,asd3f sad3fasdjasdfkasdfklas3jf 3sad3fasdjasdfkasdfklas3jf 3sad3fasdjasdfkasdfklas3jf3`
Re: Re: Regular Expression Question by TASdvlper (Monk) on Dec 04, 2003 at 20:06 UTC
`print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/;` So, if I wanted to check for more than 2 consecutive commas (I should have mentioned that in the original post), would I do the following? `print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,{2,}/;`
Re: Re: Re: Regular Expression Question by kesterkester (Hermit) on Dec 04, 2003 at 21:35 UTC
Either would work: `!/,,/` excludes strings containing 2 consecutive commas, and `!/,{2,}/` excludes strings containing 2 or more consecutive commas. Any run of three or more commas already contains two in a row, so the two tests amount to the same thing for what you want, if I've understood you correctly.
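A quick way to check the equivalence is to run both patterns over a few strings. This is a minimal sketch; the sub names `has_double` and `has_run` are made up for the illustration.

```perl
use strict;
use warnings;

# True when the string contains two commas in a row.
sub has_double {
    my ($s) = @_;
    return $s =~ /,,/ ? 1 : 0;
}

# True when the string contains a run of two or more commas.
sub has_run {
    my ($s) = @_;
    return $s =~ /,{2,}/ ? 1 : 0;
}

# As boolean tests the two always agree; only the matched text can differ.
for my $s ('a,b', 'a,,b', 'a,,,b', ',,,,') {
    printf "%-6s  ,,=%d  ,{2,}=%d\n", $s, has_double($s), has_run($s);
}
```

The difference only matters if the matched text itself is used afterwards, since `,{2,}` consumes the whole run while `,,` consumes just two characters.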
Re: Re: Re: Re: Regular Expression Question by kesterkester (Hermit) on Dec 04, 2003 at 21:41 UTC
Re: Re: Regular Expression Question by Not_a_Number (Prior) on Dec 04, 2003 at 21:40 UTC
Problem is, this doesn't work for very short strings (2 or fewer characters), because `\w+[\w,]+\w+` needs at least three characters to match. Add these: `__DATA__ a bc` dave
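The short-string problem can be reproduced with a small sketch. The second sub uses the `^\w+(?:,\w+)*$` form suggested elsewhere in this thread as one possible alternative; both sub names are hypothetical.

```perl
use strict;
use warnings;

# The two-regex check from the post above: needs at least three characters.
sub ok_three_part {
    my ($s) = @_;
    return ($s =~ /^\w+[\w,]+\w+$/ && $s !~ /,,/) ? 1 : 0;
}

# Alternative: one word, then zero or more ",word" groups.
sub ok_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# One- and two-character strings separate the two checks.
for my $s ('a', 'bc', 'abc', 'a,b', ',a') {
    printf "%-4s three_part=%d list=%d\n", $s, ok_three_part($s), ok_list($s);
}
```

The first check rejects `a` and `bc` while the second accepts them; both accept `abc` and `a,b` and reject `,a`.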
Re: Regular Expression Question by simonm (Vicar) on Dec 04, 2003 at 20:20 UTC
I'm not sure why people are using more than a single regular expression for this. The OP is asking for a regex that matches text beginning with word characters, ending with word characters, and optionally containing one or more comma-word sequences in between. This expression should do it: `/(\w+(?:\,\w+)*)/` Update: the OP adds above "if I wanted to check for more than 2 consecutive commas", which requires a minor change: `/(\w+(?:\,{1,2}\w+)*)/`
Re: Re: Regular Expression Question by nevyn (Monk) on Dec 04, 2003 at 20:48 UTC
You should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,". So for a working single regexp you want (assuming \w is good enough)... `/^\w+ # Starts with a "word" (?:,\w+)*$/x; # Followed by many "comma and word" atoms` Which IMO is uglier than... `/^[\w,]+$/ && # Comma word atoms ! /^,|,,|,$/; # With constraints on commas` It's probably faster to use 2 regexps too. -- James Antill
Re: Re: Re: Regular Expression Question by simonm (Vicar) on Dec 04, 2003 at 20:59 UTC
*you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".* It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched. The `/(\w+(?:\,\w+)*)/` regex will successfully extract the matching "foo" substring from your two sample cases into `$1`. If you want to check the entire string, then yes, leave out the parentheses and use `^...$` anchors. Update: with regard to *It's probably faster to use 2 regexps too*: Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?) However, if you want to extract matching substrings, I think the single regex is a sensible approach.
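The distinction between extracting and validating can be sketched as follows. This assumes the non-capturing group is repeated with `*`, and the sub names `extract_list` and `is_list` are illustrative rather than taken from the thread.

```perl
use strict;
use warnings;

# Unanchored: pull the first word-comma-word run out of a larger string.
sub extract_list {
    my ($s) = @_;
    return $s =~ /(\w+(?:,\w+)*)/ ? $1 : undef;
}

# Anchored: confirm the whole string is a comma-separated list of words.
sub is_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Show what each approach reports for strings with surrounding junk.
for my $s (',foo,', '!!foo!!,', 'foo,bar') {
    my $found = extract_list($s);
    printf "%-9s extracted=%-8s valid=%d\n",
        $s, (defined $found ? $found : '(none)'), is_list($s);
}
```

For `,foo,` and `!!foo!!,` the unanchored pattern extracts `foo` while the anchored one reports the string as invalid, which is the disagreement discussed above.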
Re: Re: Re: Re: Regular Expression Question by nevyn (Monk) on Dec 04, 2003 at 21:45 UTC
Re: Regular Expression Question (show me the can^H^H^Hbenchmark) by Abigail-II (Bishop) on Dec 05, 2003 at 15:09 UTC
Re: Re: Regular Expression Question (show me the can^H^H^Hbenchmark) by simonm (Vicar) on Dec 05, 2003 at 21:56 UTC
Re: Re: Regular Expression Question by hardburn (Abbot) on Dec 04, 2003 at 21:40 UTC
`\w` will also match the underscore, which the original poster does not specifically require, and so should not be included. ---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident. -- Schemer `: () { :|:& };:` Note: All code is untested, unless otherwise stated
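A small sketch of the underscore point, contrasting `\w` with an explicit ASCII alphanumeric class. The sub names are hypothetical, and the list pattern is the anchored form used elsewhere in this thread.

```perl
use strict;
use warnings;

# List check built on \w, which includes the underscore.
sub is_word_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Same shape, but restricted to ASCII letters and digits.
sub is_alnum_list {
    my ($s) = @_;
    return $s =~ /^[A-Za-z0-9]+(?:,[A-Za-z0-9]+)*$/ ? 1 : 0;
}

# An underscore inside a field is where the two checks part ways.
for my $s ('foo,bar', 'foo_bar,baz') {
    printf "%-12s word=%d alnum=%d\n", $s, is_word_list($s), is_alnum_list($s);
}
```

Both accept `foo,bar`; only the `\w` version accepts `foo_bar,baz`.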
Re: Re: Regular Expression Question by TASdvlper (Monk) on Dec 05, 2003 at 14:56 UTC
Maybe I'm missing something here... is the "?:" actually necessary? I read about it in the Camel book, and from their example I can't see its relevance for this regexp. Can someone explain this to me?
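For reference, `(?:...)` groups without capturing, so it changes which capture variables get set but not what the pattern matches. A minimal sketch, with hypothetical sub names, that returns the captures from each variant in list context:

```perl
use strict;
use warnings;

# Inner group is non-capturing: only the outer parentheses capture.
sub captures_noncapturing {
    my ($s) = @_;
    my @groups = $s =~ /^(\w+(?:,\w+)*)$/;
    return @groups;
}

# Inner group captures too: a second value appears, holding the
# text from the last repetition of the inner group.
sub captures_capturing {
    my ($s) = @_;
    my @groups = $s =~ /^(\w+(,\w+)*)$/;
    return @groups;
}

my @a = captures_noncapturing('foo,bar,baz');
my @b = captures_capturing('foo,bar,baz');
print scalar(@a), " capture(s): @a\n";
print scalar(@b), " capture(s): @b\n";
```

Both variants match the same strings. The `?:` simply avoids filling `$2` with a value nobody uses, so it is a tidiness and minor efficiency choice rather than a requirement here.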
Re: Regular Expression Question by Abigail-II (Bishop) on Dec 04, 2003 at 23:03 UTC
`use Regexp::Common; /^$RE{list}{-sep => ','}{-pat => '\w+'}$/` Abigail
Re: Re: Regular Expression Question by TASdvlper (Monk) on Dec 05, 2003 at 14:47 UTC
I didn't realize there was a Regexp module. For someone just learning how to use regexes, would you recommend understanding the basics first, prior to using the module?
Re: Regular Expression Question by Abigail-II (Bishop) on Dec 05, 2003 at 14:57 UTC
I don't have an answer to that. It's like asking, "I have a car, and I want to hire a driver. Should I first learn to drive myself?" It all depends. If you want to drive yourself as well, then it's worthwhile to learn to drive. If you're not going to drive, no need to learn it. But if you are going to drive, I don't know whether you should delay hiring a driver until you've mastered driving yourself. I don't even know whether it matters what you do first. Abigail
Re: Regular Expression Question by ysth (Canon) on Dec 05, 2003 at 00:03 UTC
Just for fun, a different way to approach the problem: `/^(?:[A-Za-z0-9],[A-Za-z0-9]|[A-Za-z0-9])+\z/`
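One thing worth checking before relying on this pattern: each comma consumes the alphanumeric on both sides of it, so a single-character field sitting between two commas cannot be matched. The sketch below exercises the pattern as posted. The sub name `ok_pairwise` is made up for the illustration.

```perl
use strict;
use warnings;

# The pattern as posted: each step matches either "X,X" or a lone "X".
sub ok_pairwise {
    my ($s) = @_;
    return $s =~ /^(?:[A-Za-z0-9],[A-Za-z0-9]|[A-Za-z0-9])+\z/ ? 1 : 0;
}

# Typical inputs, then inputs with a one-character field between commas.
for my $s ('ab,cd', 'a,b', ',ab', 'ab,', 'ab,,cd', 'a,b,c', 'ab,c,de') {
    printf "%-8s ok=%d\n", $s, ok_pairwise($s);
}
```

Leading, trailing, and doubled commas are rejected as intended, and `ab,cd` and `a,b` are accepted. However `a,b,c` and `ab,c,de` are also rejected, because the middle character would have to serve as the right neighbor of one comma and the left neighbor of the next. If single-character fields must be allowed, the `^\w+(?:,\w+)*$` style shown earlier in the thread avoids the problem.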