nehavin has asked for the wisdom of the Perl Monks concerning the following question:

Hi, please look at my code below. Here I am trying to capture any string made up of any characters except the comma. It's simple code, but I am not able to understand what is wrong with the regex :(

my $regex = qr/(?:[\S(^,)]+)/;
my $var = "abc,d";
$flag=1 if $var =~ /\G ($regex) /gcx;
print "matches \n" if $flag==1;
print "not matches \n" if $flag==0;

Here the result I want is "not matches", but it's "matches" every time.

Re: Regex to capture every readable character except COMMA
by davido (Cardinal) on Jan 09, 2014 at 06:35 UTC

    Character classes treat parens as literal characters; they have no other semantics there. Also, if the ^ is intended to negate the comma, it doesn't: ^ negates a character class only if it immediately follows the opening [. And because a comma is itself a non-whitespace character, \S alone already lets it match.
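
To make the point about character classes concrete, here is a minimal sketch that tests a few single characters against the class from the question and against a properly negated class. The variable names and the list of test characters are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

my $original = qr/[\S(^,)]/;   # the class from the question: \S plus literal ( ^ , )
my $negated  = qr/[^,]/;       # ^ directly after [ negates: anything but a comma

for my $char (',', '(', '^', 'a', ' ') {
    printf "'%s': original=%s negated=%s\n",
        $char,
        ($char =~ $original ? 'yes' : 'no'),
        ($char =~ $negated  ? 'yes' : 'no');
}
```

The comma matches the original class twice over (through \S and as a literal member), which is why the question's regex always reports a match.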


    Dave

Re: Regex to capture every readable character except COMMA
by Athanasius (Archbishop) on Jan 09, 2014 at 06:44 UTC

    Following on from davido’s answer: if you want to match only when the string contains no commas, you have to specify that every character is a non-comma; therefore, you need to anchor the regex at the start and end of the string. For example:

    #! perl
    use strict;
    use warnings;

    my $regex = qr/ \A [^,]+ \Z /x;
    my $var   = "abc,d";
    my $flag  = 1 if $var =~ /\G ($regex) /gcx;

    print $flag ? '' : 'not ', "matches\n";

    Output:

    16:38 >perl 827_SoPW.pl
    not matches

    16:42 >

    Hope that helps,

    Athanasius <°(((>< contra mundum    Iustus alius egestas vitae, eros Piratica,

      my $regex = qr/ \A [^,]+ \Z /x;
      ...
      my $flag  = 1 if $var =~ /\G ($regex) /gcx;

      Because qr/ \A [^,]+ \Z /x matches an entire string, the /g regex modifier has no meaning (and it's also used in a boolean context). Likewise the /c modifier. Likewise the \G anchor. Likewise the capture group around the $regex regex object, although in the original code a capture may be pertinent.

      >perl -wMstrict -le
      "my $rx = qr/ \A [^,]* \z /x;
       ;;
       for my $s (',abcd', 'abc,d', 'abcd,', ',,,', 'abcd', '.;$%&', '') {
         print qq{'$s' }, $s =~ $rx ? '' : 'NO', ' match';
         }
      "
      ',abcd' NO match
      'abc,d' NO match
      'abcd,' NO match
      ',,,' NO match
      'abcd'  match
      '.;$%&'  match
      ''  match

        Good points! So maybe a regex like this, using a positive lookahead, would be more useful here:

        use strict;
        use warnings;

        my $regex = qr/ ([^,]+) (?=\Z|,) /x;
        my $var   = 'abcd,ef,,ghijkl,mnop';

        print "$1\n" while $var =~ /$regex/g;

        Output:

        18:27 >perl 827_SoPW.pl
        abcd
        ef
        ghijkl
        mnop

        18:27 >

        But then, the same result can be obtained by discarding the regex and using split instead:

        use strict;
        use warnings;

        my $var     = 'abcd,ef,,ghijkl,mnop';
        my @matches = split /,/, $var;

        $_ && print "$_\n" for @matches;
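
One caveat with filtering on truthiness: a field consisting of the single character 0 is false in Perl and would be silently skipped. A minimal sketch of a variant that filters on length instead is shown here; the input string with a 0 field is a hypothetical example, not data from the thread.

```perl
use strict;
use warnings;

my $var = 'abcd,0,,ghijkl,mnop';   # hypothetical input containing a '0' field

# Keep every field that is not the empty string, including '0'.
my @fields = grep { length } split /,/, $var;

print "$_\n" for @fields;
```

This keeps the empty-field filtering of the split version above while treating '0' as ordinary data.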

        ;-)

        Athanasius <°(((>< contra mundum    Iustus alius egestas vitae, eros Piratica,

Re: Regex to capture every readable character except COMMA
by kcott (Archbishop) on Jan 09, 2014 at 07:15 UTC

    G'day nehavin,

    Your first problem is your character class: those extra characters (e.g. parentheses) aren't doing what it looks like you're expecting them to do. See "perlretut: Using character classes" for details.

    Beyond this, you've overcomplicated what's needed. You can use the regex as the condition and $var =~ /,/ will be true if $var contains a comma. Using !~, instead of =~, negates the result. Perhaps a review of "perlretut - Perl regular expressions tutorial" would prove useful.

    You might also consider using transliteration (either y/// or tr/// — they're synonymous) which can provide a speed benefit if that's important. See "perlop: Quote-Like Operators" for details.

    Here are some examples:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my @vars = ('abc,d', 'abcd', 'a,b,c,d');

    for my $var (@vars) {
        print "var = '$var'";
        print 'm/,/: ', $var =~ /,/ ? 'not matches' : 'matches';
        print 'y/,/,/: ', $var =~ y/,/,/ ? 'not matches' : 'matches';
        print '! m/,/: ', $var !~ /,/ ? 'matches' : 'not matches';
        print '! y/,/,/: ', $var !~ y/,/,/ ? 'matches' : 'not matches';
    }

    Output:

    var = 'abc,d'
    m/,/: not matches
    y/,/,/: not matches
    ! m/,/: not matches
    ! y/,/,/: not matches
    var = 'abcd'
    m/,/: matches
    y/,/,/: matches
    ! m/,/: matches
    ! y/,/,/: matches
    var = 'a,b,c,d'
    m/,/: not matches
    y/,/,/: not matches
    ! m/,/: not matches
    ! y/,/,/: not matches

    -- Ken

Re: Regex to capture every readable character except COMMA
by Anonymous Monk on Jan 09, 2014 at 14:10 UTC

    XY problem check:

    Are you sure you don't want to just use Text::CSV and not have to worry about commas in the first place?

Re: Regex to capture every readable character except COMMA
by Digioso (Sexton) on Jan 09, 2014 at 17:02 UTC
    Basically you can make it a two-step method as well which might not be the most elegant/efficient solution but at least it's working and it's damn simple.

    #!/usr/bin/perl -w
    use warnings;
    use strict;

    my $var = "abc,d";
    if($var =~ /,/) {
        print "not matches \n";
    }
    elsif($var =~ /(.*)/) {
        print "matches \n Captured: $1\n";
    }

    $var = "abcd";
    if($var =~ /,/) {
        print "not matches \n";
    }
    elsif($var =~ /(.*)/) {
        print "matches \n Captured: $1\n";
    }

    exit 0;

    Output:

    not matches
    matches
     Captured: abcd

      if($var =~ /,/) {
          print "not matches \n";
      }
      elsif($var =~ /(.*)/) {
          print "matches \n Captured: $1\n";
      }

      The intent here seems to be to capture the entire string if it contains no comma. But $var already holds the entire string! Why 'capture' it again?

      Furthermore, a confounding subtlety lurks at the heart of the seemingly simple, innocent /(.*)/ regex: the . (dot) metacharacter matches everything except a newline, so it will only capture up to the first newline, if present. You have just built a bug into your code. (This subtlety is the rationale for the PBP, i.e. Perl Best Practices, injunction to always use the /s "dot matches all" modifier with every regex.)
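
The newline behaviour described above can be seen in a minimal sketch like this one. The multi-line input string and the variable names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

my $var = "first line\nsecond line";   # hypothetical multi-line input

my ($without_s) = $var =~ /(.*)/;    # . stops at the first newline
my ($with_s)    = $var =~ /(.*)/s;   # /s lets . match the newline too

print "without /s: '$without_s'\n";
print "with /s:    '$with_s'\n";
```

Without /s the capture holds only the first line; with /s it holds the whole string.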

Re: Regex to capture every readable character except COMMA
by soonix (Chancellor) on Jan 09, 2014 at 08:08 UTC
    OK, this may not be relevant if you want/need extended functionality like abbreviations (\d \s \w), capture groups etc, but an "old school" regex like
    ^[^,]+$
    has the additional benefit that you can try it out in your editor (vi, Notepad++, etc.)