hiyall has asked for the wisdom of the Perl Monks concerning the following question:

capture groups are not working -- why ??

#!/usr/bin/perl -w
use strict;

my $line1 = "Current time => 20130530.000101 interval => 600 count => 144\n";
my $line2 = "(60 rows affected)\n";
chomp($line1);
chomp($line2);

my ($YY,$MM,$DD,$HH,$MI,$SS) ;

$_ = $line1;
/^Current time =>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx ;
$YY = $1; $MM = $2; $DD = $3; $HH = $4; $MI = $5; $SS = $6;

$_ = $line2;
/^\((\d+).row/sx;
my $numUsers = $1;

print $YY,$MM,$DD,$HH,$MI,$SS,$numUsers,"\n";
print "Date: " . $MM . "\/" . $DD . "\/" . $YY . " Time: " . $HH . ":" . $MI . ":" . $SS . " Number of Users: " . $numUsers . "\n";

Replies are listed 'Best First'.
Re: capture groups are not working -- why??
by AnomalousMonk (Archbishop) on Jun 05, 2013 at 18:40 UTC
    /^Current time =>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx ;

    The capture groups "do not work" because your regex does not match. Your regex does not match because the whitespace in the regex is ignored due to the //x regex modifier. If you wish to use the //x modifier (a good idea in general, IMHO), use [ ] or \s to represent a space or whitespace.

    >perl -wMstrict -le
    "my $s = 'Current time => 123';
     print 'match 1' if $s =~ /Current time => 123/x;
     ;;
     print 'match 2' if $s =~ /Current [ ] time \s => \s+ 123/x;
    "
    match 2
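Applied to the line from the original question, the same idea might look like the sketch below. It assumes the input line is exactly as posted and only rewrites the literal spaces as [ ]; the variable names $broken and @parts are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Current time => 20130530.000101 interval => 600 count => 144";

# Under /x the literal spaces in the pattern are ignored, so this pattern
# effectively reads "Currenttime=>\s..." and cannot match the line.
my $broken = $line =~ /^Current time =>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/x;

# Same pattern with every literal space written as [ ], which /x keeps.
my @parts = $line =~ /^Current [ ] time [ ] => \s (\d\d\d\d)(\d\d)(\d\d) \S (\d\d)(\d\d)(\d\d)/x;

print "broken pattern matched: ", ($broken ? "yes" : "no"), "\n";
print "fixed pattern captured: @parts\n";
```

Dropping the /x modifier altogether would also make the original pattern match, since the spaces would then be literal again.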

      The capture groups "do not work" because your regex does not match. Your regex does not match because the whitespace in the regex is ignored due to the //x regex modifier. If you wish to use the //x modifier (a good idea in general, IMHO), use [ ] or \s to represent a space or whitespace.

      Well, is the /x modifier a good idea in general? Yes and no, in my humble opinion. Certainly not when it breaks a regex that would otherwise work. And the OP's regex would have worked without this modifier.

      I definitely agree that it is very good to spread a very complicated regex over several lines, with comments etc., using the /x modifier.

      But for a relatively simple regex, it sometimes defeats its own purpose and actually makes things more complicated than they should be. Even though Perl more or less set the syntax for modern regexes, let us not forget that many traditional tools also use regexes, including grep, awk, vi, sed, etc. For them, a space usually stands for a space. Changing this entails some risk, and I am not sure you want to do it systematically, because it tends to make Perl less legible to the large part of the Unix community that has been using regexes for several decades. I would say: do it when you need it, don't do it when you don't.

      I happily use the /x modifier when I am writing a complicated, multiline, commented regex, and I love this possibility, but I am not convinced that it is useful for a simple one-line regex. The case discussed in this thread shows exactly how it can be counter-productive.

      Just to check how blasphemous my post may be, I grabbed my copy of Damian Conway's Perl Best Practices, and, yes, he says to always use the /x modifier. I am probably a heretic. Well, I have a lot of admiration and respect for Damian, and I certainly don't want to challenge his authority, but this is one of the few cases (perhaps a dozen or two) where I have to disagree with him. In most cases where I am using simple regexes, I don't want to make them more complex than they should be by adding the /x modifier. Of course I fully agree when the regex becomes hairy.

      I should add that, in my experience of programming in Perl, I use very simple regexes very often (for example, to quickly discard useless lines in a file), and more complicated regexes much less often.
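To make the trade-off concrete, here is a rough sketch of the two styles side by side, using the line from the original question. The variable names and the choice to capture the interval field are assumptions for illustration only, not something anyone in the thread proposed.

```perl
use strict;
use warnings;

my $line = "Current time => 20130530.000101 interval => 600 count => 144";

# Simple filter: no /x, so a space in the pattern is just a space.
my $is_time_line = $line =~ /^Current time =>/ ? 1 : 0;

# Longer pattern: /x allows layout and comments, but every literal
# space must then be spelled out (here as [ ]).
my ($date, $time, $interval) = $line =~ m{
    ^Current [ ] time [ ] => \s+
    (\d{8}) [.] (\d{6})          # date and time stamp, YYYYMMDD.HHMISS
    \s+ interval [ ] => \s+
    (\d+)                        # interval value
}x;

print "time line: $is_time_line\n";
print "date=$date time=$time interval=$interval\n";
```

The one-liner reads the way it would in grep or sed, while the /x version spends extra characters on [ ] in exchange for room to annotate each piece.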

Re: capture groups are not working -- why??
by frozenwithjoy (Priest) on Jun 05, 2013 at 19:57 UTC

    AnomalousMonk pointed out your problem, but I wanted to add a couple suggested changes to the regex for readability:

    m|^Current time => (\d{4})(\d{2})(\d{2})\.(\d{2})(\d{2})(\d{2})|s

    I think this helps the reader easily see the pattern of #s being matched (4-2-2-2-2-2). Also, I changed the \S to \. under the assumption that it will always be a period rather than any non-white space character.
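A small sketch of what the switch from \S to \. changes in practice. The second sample line is invented: it has an "x" where the period belongs, which \S accepts and \. rejects. The names $loose and $exact are placeholders.

```perl
use strict;
use warnings;

my $loose = qr/^Current time => (\d{4})(\d{2})(\d{2})\S(\d{2})(\d{2})(\d{2})/;
my $exact = qr/^Current time => (\d{4})(\d{2})(\d{2})\.(\d{2})(\d{2})(\d{2})/;

my @results;
# The second line is a made-up malformed sample, not real data.
for my $line ("Current time => 20130530.000101", "Current time => 20130530x000101") {
    my $l = $line =~ $loose ? "match" : "no match";
    my $e = $line =~ $exact ? "match" : "no match";
    push @results, "$l/$e";
    print "$line\n  \\S: $l\n  \\.: $e\n";
}
```

Whether the stricter form is preferable depends on whether the separator is really guaranteed to be a period in the data.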

      Thank you all. I had looked at this and attempted to debug with RegexBuddy - and never saw the effect of the /x. I learned a valuable bit of knowledge today.

Re: capture groups are not working -- why??
by hdb (Monsignor) on Jun 05, 2013 at 19:05 UTC

    Now that your code works after following AnomalousMonk's advice, may I suggest a slightly shorter way of achieving the same thing:

    my ($YY,$MM,$DD,$HH,$MI,$SS) = $line1 =~ /^Current\stime\s=>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx ;
    my ($numUsers) = $line2 =~ /^\((\d+).row/sx;

    Clearly, this is personal preference, so please ignore if you find it too condensed.
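For completeness, a self-contained sketch of the list-context form. A match in list context returns the captured groups, or an empty list when it fails, so the assignment can be followed by "or warn ...". The non-matching sample string and the name $missing are made up for illustration.

```perl
use strict;
use warnings;

my $line1 = "Current time => 20130530.000101 interval => 600 count => 144";
my $line2 = "(60 rows affected)";

# The list assignment yields the number of captures returned, so a failed
# match (empty list, count 0) triggers the warning.
my ($YY, $MM, $DD, $HH, $MI, $SS) =
    $line1 =~ /^Current\stime\s=>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx
    or warn "No match: $line1\n";

my ($numUsers) = $line2 =~ /^\((\d+).row/sx
    or warn "No match: $line2\n";

# A string that does not match leaves the variable undefined.
my ($missing) = "no digits here" =~ /(\d+)/;

print "Date: $MM/$DD/$YY Time: $HH:$MI:$SS Number of Users: $numUsers\n";
print "missing is ", (defined $missing ? "defined" : "undefined"), "\n";
```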

Re: capture groups are not working -- why??
by a (Friar) on Jun 05, 2013 at 19:42 UTC
    Just a good idea:
    my ($YY,$MM,$DD,$HH,$MI,$SS) ;
    $_ = $line1;
    /^Current time =>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx ;
    Test your matches (esp. if you're unsure about the data) so you can respond if it fails
    if ($line1 =~ /^Current time =>\s(\d\d\d\d)(\d\d)(\d\d)\S(\d\d)(\d\d)(\d\d)/sx ) {
        # assign your captures ...
    } else {
        # warn about failing to match
        warn("No Match: $line1\n");
    }
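One reason the check matters: a failed match does not reset $1, so an unchecked capture can silently carry a value over from an earlier successful match. The sketch below illustrates this with an invented non-matching line; the variable names are placeholders.

```perl
use strict;
use warnings;

my $good = "(60 rows affected)";
my $bad  = "no row count on this line";   # made-up non-matching line

$good =~ /^\((\d+).row/;
my $first = $1;                 # 60, from the successful match

$bad =~ /^\((\d+).row/;         # fails, and $1 is left untouched
my $unchecked = $1;             # still the value from the previous match

my $checked;
if ($bad =~ /^\((\d+).row/) {
    $checked = $1;
} else {
    warn "No Match: $bad\n";
}

print "unchecked: $unchecked\n";
print "checked: ", (defined $checked ? $checked : "undef"), "\n";
```

With the test in place, the stale value never reaches $checked, and the warning points at the offending line.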

    a