jkeenan1 has asked for the wisdom of the Perl Monks concerning the following question:

I have a situation where I am unexpectedly getting the Unterminated \g... pattern in regex error message.

In one of my CPAN modules, I have source code in a constructor which tests for the existence of certain directories.

    my @missing_dirs = ();
    for my $dir ( qw| gitdir workdir outputdir | ) {
        push @missing_dirs, $data{$dir}
            unless (-d $data{$dir});
    }
    if (@missing_dirs) {
        croak "Cannot find directory(ies): @missing_dirs";
    }
In the test suite, I have code which composes a path to a non-existent directory and then demonstrates that I get the expected error message when I try to use that path as an argument to the constructor. I use File::Spec->catdir() to compose the path string in an operating system-agnostic way.
    local $@;
    $bad_gitdir = File::Spec->catdir('', qw| home jkeenan gitwork mist-compare |);
    $args{gitdir} = $bad_gitdir;
    $params = process_options(%args);
    eval { $self = Devel::Git::MultiBisect::AllCommits->new($params); };
    like($@, qr/Cannot find directory\(ies\): $bad_gitdir/,
        "Got expected error: missing directory $bad_gitdir"
    );
In all CPANtesters reports to date from Unix-like platforms, this code has worked. There's only one CPANtester submitting reports from Windows (in this case, Strawberry Perl running Perl 5.22.1) and that reporter is showing this test failure:
Unterminated \g... pattern in regex; marked by <-- HERE in m/Cannot find directory\(ies\): \home\jkeenan\g <-- HERE itwork\mist-compare/ at t/002-new.t line 38.
I'm very puzzled by this (and I don't have a Windows box on which to debug this). The test code is clearly using m// as regex pattern delimiters, so I don't understand why, in this environment, \g is somehow being treated as if it were intended to be the end of the pattern.

Thoughts? Work-arounds?

Thank you very much.

Jim Keenan

Replies are listed 'Best First'.
Re: Unterminated \g... pattern in regex behaving badly on Windows
by Eily (Monsignor) on Nov 16, 2016 at 13:26 UTC

    The rule with regexes is that non-alphanumeric characters (\W) lose their special meaning when escaped with a \ (\* is a literal '*', \( is a literal parenthesis, etc.), while alphanumeric characters gain a special meaning when escaped. So \1, \2, etc. are references to captured groups, and \w means alphanumeric characters. This means that \x, where x is a letter, should always be considered to have a special meaning, even if this is not the case for the current version of perl, because such a meaning may be added in future versions. And yes, even interpolated variables are read using the regex syntax, and not taken literally.

    \g is the start of a regex special meaning, referenced in perlreref:

    \g1 or \g{1}, \g2 ... Matches the text from the Nth group
    Since you don't have a number after your \g, this is, indeed, an incomplete pattern.
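For anyone without a Windows box, the failure can probably be reproduced on any platform. The following is a minimal sketch, assuming the core module File::Spec::Win32 is loaded explicitly to stand in for what File::Spec->catdir() does on Windows; the variable names `$compiled` and `$error` are only illustrative.

```perl
use strict;
use warnings;
use File::Spec::Win32;    # core module; assumed here as a stand-in for File::Spec on Windows

# Compose the path with backslash separators, as catdir() would on Windows.
my $bad_gitdir = File::Spec::Win32->catdir('', qw| home jkeenan gitwork mist-compare |);

# Interpolating the raw path lets the regex compiler read \h, \j, \g ... as escapes.
# The pattern is compiled at run time, so eval can trap the fatal error.
my $compiled = eval {
    no warnings 'regexp';    # silence "Unrecognized escape" warnings for \j and friends
    qr/Cannot find directory\(ies\): $bad_gitdir/;
};
my $error = $@;

print "path:  $bad_gitdir\n";
print "error: $error";
```

If this behaves as expected, `$compiled` stays undefined and `$error` carries the same "Unterminated \g... pattern" message that the Windows tester reported.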

    The solution is to strip the special meaning from \ (the escape character) by putting another \ in front of it. The easiest way to do that is to use quotemeta, which leaves all alphanumeric characters untouched and escapes all others (those that may have a special meaning):
        my $esc_bad_gitdir = quotemeta $bad_gitdir;
    Or using the \Q shortcut: qr/Cannot find directory\(ies\): \Q$bad_gitdir/

    Edit: there were typos everywhere...
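A minimal sketch of both forms of the fix, assuming a hard-coded Windows-style path and a made-up error string (`$message`) in place of the real `$@` from the constructor:

```perl
use strict;
use warnings;

# Windows-style path, hard-coded for illustration (single quotes keep the backslashes).
my $bad_gitdir = '\home\jkeenan\gitwork\mist-compare';

# Hypothetical stand-in for the croak() message the test inspects.
my $message = "Cannot find directory(ies): $bad_gitdir at t/002-new.t line 38.\n";

# Form 1: escape the path up front with quotemeta, then interpolate the escaped copy.
my $esc_bad_gitdir = quotemeta $bad_gitdir;
my $re_quotemeta   = qr/Cannot find directory\(ies\): $esc_bad_gitdir/;

# Form 2: let \Q escape the variable at interpolation time.
my $re_q = qr/Cannot find directory\(ies\): \Q$bad_gitdir/;

my $ok_quotemeta = $message =~ $re_quotemeta ? 1 : 0;
my $ok_q         = $message =~ $re_q         ? 1 : 0;

print "escaped path:    $esc_bad_gitdir\n";
print "quotemeta match: $ok_quotemeta\n";
print "\\Q match:        $ok_q\n";
```

Both patterns should compile and match, because every backslash in the path now reaches the regex engine as `\\`, a literal backslash.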

      You could also quotemeta the whole string. Then there's no need to manually escape the parenthesis:

      qr/\QCannot find directory(ies): $bad_gitdir\E/

      or maybe even

      qr/ \QCannot find directory(ies): $bad_gitdir\E /xms

      (The \E terminates the escaping - in case you want to add something else later on.)
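A short sketch of the whole-string variants, again assuming a hard-coded path and a hypothetical `$message`. It also illustrates why the /x form still works: \Q escapes the spaces inside the quoted span, so /x does not discard them.

```perl
use strict;
use warnings;

# Hard-coded Windows-style path and a hypothetical error string, for illustration only.
my $bad_gitdir = '\home\jkeenan\gitwork\mist-compare';
my $message    = "Cannot find directory(ies): $bad_gitdir";

# Everything between \Q and \E is escaped, so the parentheses need no manual backslashes.
my $re_plain = qr/\QCannot find directory(ies): $bad_gitdir\E/;

# Under /x the unescaped padding spaces are ignored; the spaces inside \Q...\E
# arrive already escaped and therefore still match literally.
my $re_extended = qr/ \QCannot find directory(ies): $bad_gitdir\E /xms;

my $ok_plain    = $message =~ $re_plain    ? 1 : 0;
my $ok_extended = $message =~ $re_extended ? 1 : 0;

print "plain match:    $ok_plain\n";
print "extended match: $ok_extended\n";
```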

        I went with that recommendation:
            - like($@, qr/Cannot find directory\(ies\): $bad_gitdir/,
            + like($@, qr/\QCannot find directory(ies): $bad_gitdir\E/,
        Although in a couple of places I had to use the \Q ... \E syntax twice within one pattern:
            - like($@, qr/Cannot find file\(s\) to be tested:.*$bad_target_args->[1]/,
            + like($@, qr/\QCannot find file(s) to be tested:\E.*\Q$bad_target_args->[1]\E/,
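The reason for two \Q ... \E spans is that the `.*` in the middle has to stay a live regex wildcard, while the text on either side must be literal. A minimal sketch, assuming made-up file names in `$bad_target_args` and a hypothetical `$message`:

```perl
use strict;
use warnings;

# Hypothetical Windows-style file names; the second one contains the troublesome \g.
my $bad_target_args = [ 'lib\good.pm', 'lib\gone.pm' ];

# Hypothetical stand-in for the error message under test.
my $message = "Cannot find file(s) to be tested: @{$bad_target_args}";

# Literal prefix, then an unescaped .* wildcard, then the literal file name.
my $re_split = qr/\QCannot find file(s) to be tested:\E.*\Q$bad_target_args->[1]\E/;

# For contrast: one \Q...\E around everything turns .* into a literal ".*".
my $re_single = qr/\QCannot find file(s) to be tested:.*$bad_target_args->[1]\E/;

my $ok_split  = $message =~ $re_split  ? 1 : 0;
my $ok_single = $message =~ $re_single ? 1 : 0;

print "split \\Q...\\E match:  $ok_split\n";
print "single \\Q...\\E match: $ok_single\n";
```

The split form should match, while the single-span form should not, since the message contains no literal ".*".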
        Thanks to all who responded. The revised code is available at Devel-Git-MultiBisect-0.07.tar.gz and should be available on CPAN within the hour. If anyone on Windows could give that a smoke test, that would be great.

        Thank you very much.

        Jim Keenan
      Eily wrote:
      \g is the start of a regex special meaning, referenced in perlreref:
      Well, I now realize that a big part of my diagnostic problem was that I failed to recognize \g as referring to a capture group. This syntax has been in Perl for some time now but I never had occasion to use it and hence was blissfully unaware of it. I was also confusing it with the /g qualifier. Thank you very much.
      Jim Keenan
Re: Unterminated \g... pattern in regex behaving badly on Windows
by Corion (Patriarch) on Nov 16, 2016 at 14:23 UTC

    Most likely, you want to just use quotemeta, because your path separators in the form of backslashes are somewhere interpolated into a regular expression.

    I don't see the code where you interpolate stuff into a regex, but if you look at where you do that, you should find the cause.

Re: Unterminated \g... pattern in regex behaving badly on Windows
by kennethk (Abbot) on Nov 16, 2016 at 16:17 UTC
    Please read Markup in the Monastery, in particular note that you should use <code> tags, not <pre> tags, as the former allows for easy and reliable code download and the latter can mess up page formatting.
