QuasarD has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have some issues with a regular expression:

use strict;
use warnings;

my $string1 = '"18/02/2018"';
print $string1 . "\n" if ($string1 =~ m/^"{1}[^"{1}]/);

my $string2 = '"28/02/2018"';
print $string2 . "\n" if ($string2 =~ m/^"{1}[^"{1}]/);

exit;

The scalar $string1 doesn't match... but why?

Re: Regexp issue
by Eily (Monsignor) on Feb 21, 2018 at 11:07 UTC

    If a symbol doesn't have a special meaning (i.e., it is not a meta-character), including it in the regex without a quantifier means it must be present exactly once. {1} requires the preceding token exactly once, so adding it after " doesn't actually change anything.

    Inside [], most of the special meanings are thrown away, and each character is included separately as a possible alternative. So ["{1}] means ", or {, or 1, or }. A leading ^ inside [] negates the class, so [^"{1}] matches one character that is not 1 (nor ", {, or }). In your input string, the character after " is 1, which is not allowed, so the match fails.
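
    As a quick check of the character-class behaviour described above, here is a minimal sketch that tests single characters against [^"{1}]. The helper name allowed_after_quote is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if a single
# character is accepted by the negated class [^"{1}], 0 otherwise.
sub allowed_after_quote {
    my ($char) = @_;
    return $char =~ /\A[^"{1}]\z/ ? 1 : 0;
}

# '1', '{', '}' and '"' are all listed in the class, so the negation rejects them.
for my $char ('1', '2', '{', '}', '"') {
    printf "%s => %s\n", $char, allowed_after_quote($char) ? 'allowed' : 'rejected';
}
```

    Only '2' comes out as allowed here, which is why the date starting with 2 matched and the one starting with 1 did not.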

Re: Regexp issue
by hippo (Archbishop) on Feb 21, 2018 at 14:47 UTC

    Eily is right: you should not have the braced term within the character class.

    m/^"{1}[^"{1}]/
    #   ^^^   ^^^-This is wrong
    #    |
    #    This is unnecessary

    If you are testing the same thing (here, that a string matches a certain regex) more than once, don't write the code out more than once: (a) you might mistype one of the copies, and (b) any change then has to be made in every copy. This principle is known as DRY (Don't Repeat Yourself).

    Taking both of these into account, and assuming you just want to match a leading double-quote followed by something other than a double-quote, here's an altered version of your code showing both date strings matching:

    use strict;
    use warnings;
    use feature 'say';

    my @strings = ('"18/02/2018"', '"28/02/2018"');
    for (@strings) {
        say if /^"[^"]/;
    }

    You can of course expand on this by putting other values into @strings and seeing whether they match. If it gets any fancier, try turning it into a test with Test::More instead.

      assuming you just want to match a leading double-quote followed by something other than a double-quote

      I forgot to mention that I have to parse CSV rows, so I have to match a leading double-quote followed by EXACTLY something other than a double-quote, because a third double-quote will be escaped by the previous one:

      "18/02/2018" <- in this case the record is 18/02/2018

      ""18/02/2018" <- in this case the record is invalid

      """18/02/2018" <- in this case the record is "18/02/2018

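      One way to express the three cases above as a regex, as a rough sketch only: it assumes the whole string is a single quoted field with no separators, and the sub name unquote_field is invented for the example. It returns the unescaped record, or undef for an invalid one.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a string holding exactly one quoted CSV field
# and returns the record with "" unescaped to ", or undef if it is invalid.
sub unquote_field {
    my ($raw) = @_;
    # Opening quote, then any run of non-quotes or doubled quotes, then the closing quote.
    return undef unless $raw =~ /\A"((?:[^"]|"")*)"\z/;
    (my $record = $1) =~ s/""/"/g;    # collapse each escaped "" into a single "
    return $record;
}

for my $raw ('"18/02/2018"', '""18/02/2018"', '"""18/02/2018"') {
    my $record = unquote_field($raw);
    print "$raw => ", (defined $record ? $record : 'invalid'), "\n";
}
```
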
        Previous advice from others about using a module notwithstanding, this additional requirement puts it well into "If it gets any fancier" territory. Here, therefore, is the test:

        use strict;
        use warnings;
        use Test::More;

        my @good = (
            '"18/02/2018"',
            '"""18/02/2018"'
        );
        my @bad = (
            '""18/02/2018"'
        );
        my $re = qr/^"("")?[^"]/;

        plan tests => @good + @bad;

        for my $str (@good) {
            like ($str, $re, "$str matched");
        }
        for my $str (@bad) {
            unlike ($str, $re, "$str not matched");
        }

Re: Regexp issue
by Anonymous Monk on Feb 21, 2018 at 12:04 UTC

    Why do you have the date in both ' and "" marks?

      Because this issue occurs while parsing a CSV file, so I have to replicate the " delimiter in this example.

        ... while parsing a CSV file ...

        Can you not use, say, Text::CSV or Text::CSV_XS, which deal very well with all manner of delimiters, double-quotes included?
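
        For illustration, a minimal sketch of what that could look like with Text::CSV, fed the three sample rows from earlier in the thread. It assumes the module is installed from CPAN; the helper name parse_line is made up for the example, and the parser is used with its default (strict) quoting settings.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 is the setting the module's documentation generally recommends.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical helper: parse one CSV line and return an array ref of its
# fields, or undef if the parser rejects the line.
sub parse_line {
    my ($line) = @_;
    return $csv->parse($line) ? [ $csv->fields ] : undef;
}

for my $line ('"18/02/2018"', '""18/02/2018"', '"""18/02/2018"') {
    my $fields = parse_line($line);
    print "$line => ", (defined $fields ? join('|', @$fields) : 'rejected'), "\n";
}
```

        The module handles the quote-escaping rules itself, so no hand-written regex is needed for the delimiters.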


        Give a man a fish:  <%-{-{-{-<