Regex to capture every readable character except COMMA

nehavin has asked for the wisdom of the Perl Monks concerning the following question:


Re: Regex to capture every readable character except COMMA
by davido (Cardinal) on Jan 09, 2014 at 06:35 UTC

Inside a character class, parens are literal characters; they have no other semantics. Also, if the ^ is intended to negate the comma, it doesn't. ^ only works as the character class negation construct when it immediately follows the opening [.
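
To make both points concrete, here is a minimal sketch. The three patterns and the show() helper are made up for illustration and are not taken from the original question:

```perl
use strict;
use warnings;

# Hypothetical patterns, chosen only to illustrate the two points above.
my $parens_literal = qr/[(a)]/;   # matches '(', 'a' or ')'
my $caret_literal  = qr/[a^,]/;   # ^ is not first: matches 'a', '^' or ','
my $negated        = qr/[^,]/;    # ^ is first: matches any one character except ','

# Report whether a string matches a pattern (helper name is an assumption).
sub show {
    my ($str, $rx) = @_;
    return $str =~ $rx ? 'match' : 'no match';
}

print "paren against [(a)]: ", show('(', $parens_literal), "\n";
print "comma against [a^,]: ", show(',', $caret_literal),  "\n";
print "caret against [a^,]: ", show('^', $caret_literal),  "\n";
print "comma against [^,]:  ", show(',', $negated),        "\n";
print "x against [^,]:      ", show('x', $negated),        "\n";
```

Only the comma tested against [^,] should report no match; in the other classes the parens and the caret are just ordinary members.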

Dave

Re: Regex to capture every readable character except COMMA
by Athanasius (Archbishop) on Jan 09, 2014 at 06:44 UTC

Following on from davido’s answer: if you want to match only when the string contains no commas, you have to specify that every character is a non-comma; therefore, you need to anchor the regex at the start and end of the string. For example:

#! perl
use strict;
use warnings;

my $regex = qr/ \A [^,]+ \Z /x;
my $var   = "abc,d";
my $flag  = 1 if $var =~ /\G ($regex) /gcx;

print $flag ? '' : 'not ', "matches\n";

Output:

16:38 >perl 827_SoPW.pl
not matches

16:42 >

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Regex to capture every readable character except COMMA

by AnomalousMonk (Archbishop) on Jan 09, 2014 at 07:19 UTC

my $regex = qr/ \A [^,]+ \Z /x;
...
my $flag = 1 if $var =~ /\G ($regex) /gcx;

Because qr/ \A [^,]+ \Z /x can only match an entire string, the /g regex modifier has no meaning here (and the match is used in a boolean context anyway). The same goes for the /c modifier, the \G anchor, and the capture group around the $regex regex object, although in the original code a capture may be pertinent.

>perl -wMstrict -le
"my $rx = qr/ \A [^,]* \z /x;
 ;;
 for my $s (',abcd', 'abc,d', 'abcd,', ',,,', 'abcd', '.;$%&', '') {
   print qq{'$s' }, $s =~ $rx ? '' : 'NO', ' match';
   }
"
',abcd' NO match
'abc,d' NO match
'abcd,' NO match
',,,' NO match
'abcd'  match
'.;$%&'  match
''  match
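
One further consequence of the stray /g, offered as a sketch rather than as part of the reply above: in scalar context /g remembers pos() on the variable, so testing the same variable twice with the quoted match can give two different answers. The variable names below are my own:

```perl
use strict;
use warnings;

my $regex = qr/ \A [^,]+ \Z /x;
my $var   = 'abcd';

# Match as in the quoted script: /g in scalar context remembers pos($var).
my $first_g  = $var =~ /\G ($regex) /gcx ? 1 : 0;
my $second_g = $var =~ /\G ($regex) /gcx ? 1 : 0;   # resumes at pos($var), where \A cannot match

# Reset the position, then match without \G, /g, /c or the extra capture.
pos($var) = undef;
my $first_plain  = $var =~ $regex ? 1 : 0;
my $second_plain = $var =~ $regex ? 1 : 0;

print "with /gc:    first=$first_g second=$second_g\n";
print "without /gc: first=$first_plain second=$second_plain\n";
```

The plain $var =~ $regex form always starts at the beginning of the string, so it is repeatable.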

Re^3: Regex to capture every readable character except COMMA

by Athanasius (Archbishop) on Jan 09, 2014 at 08:36 UTC

Good points! So maybe a regex like this, using a positive lookahead, would be more useful here:

use strict;
use warnings;

my $regex = qr/ ([^,]+) (?=\Z|,) /x;
my $var   = 'abcd,ef,,ghijkl,mnop';

print "$1\n" while $var =~ /$regex/g;

Output:

18:27 >perl 827_SoPW.pl
abcd
ef
ghijkl
mnop

18:27 >

But then, the same result can be obtained by discarding the regex and using split instead:

use strict;
use warnings;

my $var     = 'abcd,ef,,ghijkl,mnop';
my @matches = split /,/, $var;

print "$_\n" for grep { length } @matches;   # skip empty fields, but keep a field such as "0"

;-)

Re^4: Regex to capture every readable character except COMMA

by AnomalousMonk (Archbishop) on Jan 09, 2014 at 18:27 UTC

Re: Regex to capture every readable character except COMMA
by kcott (Archbishop) on Jan 09, 2014 at 07:15 UTC

G'day nehavin,

Your first problem is your character class: those extra characters (e.g. parentheses) aren't doing what it looks like you're expecting them to do. See "perlretut: Using character classes" for details.

Beyond this, you've overcomplicated what's needed. You can use the regex itself as the condition: $var =~ /,/ is true if $var contains a comma. Using !~ instead of =~ negates the result. Perhaps a review of "perlretut - Perl regular expressions tutorial" would prove useful.

You might also consider using transliteration (either y/// or tr/// — they're synonymous) which can provide a speed benefit if that's important. See "perlop: Quote-Like Operators" for details.

Here are some examples:

#!/usr/bin/env perl -l

use strict;
use warnings;

my @vars = ('abc,d', 'abcd', 'a,b,c,d');

for my $var (@vars) {
    print "var = '$var'";
    print 'm/,/: ', $var =~ /,/ ? 'not matches' : 'matches';
    print 'y/,/,/: ', $var =~ y/,/,/ ? 'not matches' : 'matches';
    print '! m/,/: ', $var !~ /,/ ? 'matches' : 'not matches';
    print '! y/,/,/: ', $var !~ y/,/,/ ? 'matches' : 'not matches';
}

Output:

var = 'abc,d'
m/,/: not matches
y/,/,/: not matches
! m/,/: not matches
! y/,/,/: not matches
var = 'abcd'
m/,/: matches
y/,/,/: matches
! m/,/: matches
! y/,/,/: matches
var = 'a,b,c,d'
m/,/: not matches
y/,/,/: not matches
! m/,/: not matches
! y/,/,/: not matches
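
A related sketch, in case the number of commas is also of interest: tr/// returns the count of characters it matched, and with an empty replacement list it leaves the string unchanged. The comma_count() helper is a made-up name for illustration:

```perl
use strict;
use warnings;

# Count commas with tr///; the empty replacement list means nothing is changed.
sub comma_count {
    my ($str) = @_;
    return $str =~ tr/,//;
}

for my $var ('abc,d', 'abcd', 'a,b,c,d') {
    my $n = comma_count($var);
    printf "'%s' has %d comma(s): %s\n", $var, $n, $n ? 'not matches' : 'matches';
}
```

A count of zero is false, so the same call doubles as the "contains no comma" test used in the examples above.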

-- Ken

Re: Regex to capture every readable character except COMMA
by Anonymous Monk on Jan 09, 2014 at 14:10 UTC

XY problem check:

Are you sure you don't want to just use Text::CSV and not have to worry about commas in the first place?

Re: Regex to capture every readable character except COMMA
by Digioso (Sexton) on Jan 09, 2014 at 17:02 UTC

#!/usr/bin/perl -w

use warnings;
use strict;

my $var = "abc,d";

if($var =~ /,/)
{
    print "not matches \n";
}
elsif($var =~ /(.*)/)
{
    print "matches \n Captured: $1\n";
}

$var = "abcd";

if($var =~ /,/)
{
    print "not matches \n";
}
elsif($var =~ /(.*)/)
{
    print "matches \n Captured: $1\n";
}

exit 0;

not matches
matches
 Captured: abcd

Re^2: Regex to capture every readable character except COMMA

by AnomalousMonk (Archbishop) on Jan 09, 2014 at 19:02 UTC

if($var =~ /,/)
{
    print "not matches \n";
}
elsif($var =~ /(.*)/)
{
    print "matches \n Captured: $1\n";
}

The intent here seems to be to capture the entire string if it contains no comma. But $var already holds the entire string! Why 'capture' it again?

Furthermore, a confounding subtlety lurks at the heart of the seemingly simple, innocent /(.*)/ regex: by default the . (dot) metacharacter matches everything except a newline, so it will only capture up to the first newline, if present. You have just built a bug into your code. (This subtlety is the rationale for the Perl Best Practices (PBP) injunction to always use the /s "dot matches all" modifier with every regex.)
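
A short sketch of the difference, using a made-up two-line string as input:

```perl
use strict;
use warnings;

my $var = "first line\nsecond line";   # hypothetical input containing a newline

my ($without_s) = $var =~ /(.*)/;    # . stops at the newline
my ($with_s)    = $var =~ /(.*)/s;   # /s lets . match the newline too

printf "without /s: %d characters captured\n", length $without_s;
printf "with /s:    %d characters captured\n", length $with_s;
print  "with /s the capture equals the whole string\n" if $with_s eq $var;
```

Without /s only the text before the newline is captured; with /s the capture is the whole of $var.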

Re: Regex to capture every readable character except COMMA
by soonix (Chancellor) on Jan 09, 2014 at 08:08 UTC

~~abbreviations (\d \s \w),~~

^[^,]$
