Re: Regular Expression Question
by !1 (Hermit) on Dec 04, 2003 at 19:33 UTC
It would probably be far easier to first verify that the string contains only legitimate characters, and then check whether any of the conditions that would make it illegal are present.
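#!/usr/bin/perl -wl
use strict;
# -l appends a newline to every print; chomp still strips the input newline
while (<DATA>) {
    chomp;
    # first test: no character outside letters, digits and comma
    # second test: no leading comma, no doubled comma, no trailing comma
    if (! /[^a-zA-Z0-9,]/ and ! /^,|,{2}|,$/) {
        print "Yep: $_";
    } else {
        print "Nope: $_";
    }
}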
__DATA__
this,should,work,11
,this,should,not,work
this,,too,should,not,work
nor,this,one,
Please please please read perldoc perlretut and perldoc perlre for a deeper understanding of regular expressions.
Thanks for your help. Why is there a "!" in front of the 1st regexp? I understand the 2nd.
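The character class is negated: with the ^ just inside the opening bracket, /[^a-zA-Z0-9,]/ matches any single character that is not a letter, digit or comma, so the leading ! means "the string contains no illegal characters"...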
! /[^a-zA-Z0-9,]/
...however, I'm not sure the double negative is enough faster to justify not just doing the more obvious...
/^[a-zA-Z0-9,]+$/
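...it's also not stated whether an empty string (/^$/) is valid, and the two versions above disagree on it: the negated class accepts it (it contains no illegal characters), while the anchored + version rejects it.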
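A small sketch comparing the two tests side by side, to make the empty-string difference concrete. The subroutine names are mine, for illustration only. A related difference the sketch also exercises: `$` can match just before a final newline, so the anchored version accepts "abc\n" while the negated class rejects the newline as an illegal character; `\z` instead of `$` would close that gap.

```perl
use strict;
use warnings;

# Double negative: true when no character falls outside [a-zA-Z0-9,].
sub no_bad_chars   { my ($s) = @_; return $s !~ /[^a-zA-Z0-9,]/ ? 1 : 0 }

# Anchored positive test: true when the string is one or more allowed characters.
sub all_good_chars { my ($s) = @_; return $s =~ /^[a-zA-Z0-9,]+$/ ? 1 : 0 }

for my $s ('abc,123', 'ab-c', '') {
    printf "%-10s negated: %-4s anchored: %s\n", "'$s'",
        no_bad_chars($s)   ? 'yes' : 'no',
        all_good_chars($s) ? 'yes' : 'no';
}
```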
Re: Regular Expression Question
by BrowserUk (Patriarch) on Dec 04, 2003 at 19:35 UTC
It's probably simpler and clearer (as well as more efficient) to use two regexes for this type of thing.
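for ('fred,bill', 'fred,,bill', ',fred', 'bill,') {
    # bad unless it holds only letters, digits and commas,
    # and has no doubled, trailing or leading comma
    print "Bad:'$_'\n" unless
        $_ =~ m[^[a-zA-Z0-9,]+$] and
        $_ !~ m[,{2}|,$|^,];
}
Output:
Bad:'fred,,bill'
Bad:',fred'
Bad:'bill,'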
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Regular Expression Question
by kesterkester (Hermit) on Dec 04, 2003 at 19:36 UTC
If multiple regexes are acceptable, this'll get you a little closer to what you want, I think. The first regex in the if statement matches word characters and commas (\w is letters, digits and underscore), but not leading or trailing commas; the second regex excludes consecutive commas.
use warnings;
use strict;
while ( <DATA> ) {
print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/;
}
__DATA__
!@#$as3dfa
,sdfas3df,
asd3fsa,,a3sdf
as3df,asdf3,3asdf,asd3f
sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf3
Output is:
as3df,asdf3,3asdf,asd3f
sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf3
print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/;
So, if I wanted to check for more than 2 consecutive commas (I should have mentioned that in the original post), I would do the following:
print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,{2,}/;
Either would work: !/,,/ excludes strings containing 2 consecutive commas, and !/,{2,}/ excludes strings containing 2 or more consecutive commas, which amounts to the same thing for what you want (any run of three or more commas also contains two in a row), if I've understood you correctly.
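If "more than 2 consecutive commas" instead means that up to two commas in a row are fine and three or more are not, neither of those tests does it; that would need something like !/,{3}/. A sketch comparing the three checks, assuming that reading of the requirement (the `passes` helper is hypothetical, for illustration):

```perl
use strict;
use warnings;

# True when $s does NOT contain the "bad" pattern, i.e. it passes that check.
sub passes { my ($s, $bad) = @_; return $s !~ $bad ? 1 : 0 }

my %checks = (
    '!/,,/'    => qr/,,/,      # two commas in a row anywhere
    '!/,{2,}/' => qr/,{2,}/,   # two or more commas in a row
    '!/,{3}/'  => qr/,{3}/,    # three commas in a row (so at most two allowed)
);

for my $s ('a,b', 'a,,b', 'a,,,b') {
    for my $name (sort keys %checks) {
        printf "%-6s %-9s %s\n", $s, $name,
            passes($s, $checks{$name}) ? 'ok' : 'rejected';
    }
}
```

The first two columns always agree; only the !/,{3}/ check lets "a,,b" through.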
__DATA__
a
bc
dave
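Presumably the point of this data: /^(\w+[\w,]+\w+)$/ needs at least one character for each of its three parts, so single-character and two-character strings such as "a" and "bc" are rejected even though they contain no commas at all. A sketch contrasting it with a "word, then any number of comma-word pairs" pattern (subroutine names are mine, for illustration):

```perl
use strict;
use warnings;

# kesterkester's test: \w+, then [\w,]+, then \w+ (so at least three
# characters), plus a separate check for doubled commas.
sub three_plus_ok {
    my ($s) = @_;
    return ($s =~ /^(\w+[\w,]+\w+)$/ && $s !~ /,,/) ? 1 : 0;
}

# Alternative: one word, then any number of ",word" pairs.
sub list_ok {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

for my $s ('a', 'bc', 'dave', 'a,b', 'a,,b') {
    printf "%-5s original: %-3s alternative: %s\n", $s,
        three_plus_ok($s) ? 'yes' : 'no',
        list_ok($s)       ? 'yes' : 'no';
}
```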
Re: Regular Expression Question
by simonm (Vicar) on Dec 04, 2003 at 20:20 UTC
/(\w+(?:\,\w+)*)/
Update: the OP adds above "if I wanted to check for more than 2 consecutive commas", which requires a minor change:
/(\w+(?:\,{1,2}\w+)*)/
You should test your regexp on ",foo,", which will match and shouldn't, as will "!!foo!!,".
So for a working single regexp you want (assuming \w is good enough)...
/^\w+ # Starts with a "word"
(?:,\w+)*$/x; # Followed by many "comma and word" atoms
Which IMO is uglier than...
/^[\w,]+$/ &&     # Comma word atoms
! /^,|,,|,$/;     # With constraints on commas
It's probably faster to use 2 regexps, too.
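One way to check both claims is to confirm that the single regex and the two-regex pair accept exactly the same strings, and then time them with the core Benchmark module's `cmpthese`. A sketch; the subroutine names and sample inputs are mine, and timings will vary by machine and perl version, so treat any numbers it prints as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Single anchored regex: a word, then zero or more ",word" pairs.
sub single_re { my ($s) = @_; return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0 }

# Two regexes: allowed characters only, then rule out bad comma placement.
sub two_re {
    my ($s) = @_;
    return ($s =~ /^[\w,]+$/ && $s !~ /^,|,,|,$/) ? 1 : 0;
}

my @inputs = ('fred,bill', 'fred,,bill', ',fred', 'bill,', 'foo', '!!foo!!,');

# Make sure both versions agree on every sample before timing them.
for my $s (@inputs) {
    die "disagree on '$s'\n" unless single_re($s) == two_re($s);
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    single => sub { single_re($_) for @inputs },
    two    => sub { two_re($_)    for @inputs },
});
```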
you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".
It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched.
The /(\w+(?:\,\w+)*)/ regex will successfully extract the matching "foo" substring from both of your sample cases into $1. If you want to check the entire string, then yes, leave out the parentheses and use ^...$ anchors.
Update: with regard to "It's probably faster to use 2 regexps too": yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident regex gurus can explain why this is?)
However, if you want to extract matching substrings, I think the single regex is a sensible approach.
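A sketch of the two uses side by side: the unanchored pattern with //g in list context pulls out each comma-separated run of words, while the anchored version without capturing parentheses only answers whether the whole string is such a list. Subroutine names and sample strings are mine, for illustration.

```perl
use strict;
use warnings;

# Unanchored, //g in list context: returns each "word(,word)*" run found.
sub extract_all { my ($s) = @_; return $s =~ /(\w+(?:,\w+)*)/g }

# Anchored, no capture: the whole string must be such a list.
sub whole_ok { my ($s) = @_; return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0 }

for my $s (',foo,', '!!foo!!,', 'a,b;c,d') {
    my @runs = extract_all($s);
    printf "%-10s extracted: %-10s whole string: %s\n",
        $s, join(' ', @runs), whole_ok($s) ? 'valid' : 'invalid';
}
```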
Re: Regular Expression Question
by Abigail-II (Bishop) on Dec 04, 2003 at 23:03 UTC
use Regexp::Common;
/^$RE{list}{-sep => ','}{-pat => '\w+'}$/
Abigail
I didn't realize there was a Regexp module. For someone just learning how to use regexes, would you recommend understanding the basics first, before using the module?
I don't have an answer to that. It's like asking "I have a car, and I want to hire a driver. Should I first learn to drive myself?" It all depends. If you want to drive yourself as well, then it's worthwhile to learn to drive. If you're not going to drive, there's no need to learn it. But if you are going to drive, I don't know whether you should delay hiring a driver until you've mastered driving yourself. I don't even know whether it matters what you do first.
Abigail
Re: Regular Expression Question
by ysth (Canon) on Dec 05, 2003 at 00:03 UTC
Just for fun, a different way to approach the problem:
/^(?:[A-Za-z0-9],[A-Za-z0-9]|[A-Za-z0-9])+\z/
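One thing worth checking with this approach: because each comma is consumed together with the character on each side of it, a one-character item between two commas cannot be matched, so "a,b,c" is rejected while "ab,cd" is accepted. A sketch of a lookaround variant (my own, not from the post) that checks the neighbours of each comma without consuming them:

```perl
use strict;
use warnings;

# ysth's pattern: each comma is consumed along with one character on each side.
sub pairs_ok {
    my ($s) = @_;
    return $s =~ /^(?:[A-Za-z0-9],[A-Za-z0-9]|[A-Za-z0-9])+\z/ ? 1 : 0;
}

# Lookaround variant: a comma is allowed only with an alphanumeric on both
# sides, but those neighbours are checked, not consumed.
sub lookaround_ok {
    my ($s) = @_;
    return $s =~ /^(?:[A-Za-z0-9]|(?<=[A-Za-z0-9]),(?=[A-Za-z0-9]))+\z/ ? 1 : 0;
}

for my $s ('ab,cd', 'a,b,c', 'a,,b', ',a', 'a,') {
    printf "%-6s pairs: %-4s lookaround: %s\n", $s,
        pairs_ok($s)      ? 'yes' : 'no',
        lookaround_ok($s) ? 'yes' : 'no';
}
```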