princepawn has asked for the wisdom of the Perl Monks concerning the following question:

The result of calling a particular shell command on my local system is the following:
azxinetgw.EXECFILE_DIR.azxinetgw='/usr/local/instinet/RTS/etc/local/azxfiles'
I wrote the following Perl code to get the value part of the string (the part after the equal sign) using regexes even though split is easier so that I could practice my regexes. Here is my code, yet it does not work (in other words, printing $1 yields an empty string). So it is matching, but $1 is not bound. My regex is saying:
  1. match everything that is not an equal
  2. then match an equal
  3. then match a quote
  4. then match everything that is not a quote and save in $1
  5. then match another quote
But it doesn't seem to work:
sub get_assigned_dir {
    my $gpa_key = shift;
    my $gpa_ret = `$gpa $gpa_key`;
    warn $gpa_ret;
    if ( $gpa_ret =~ /[^=]+[=][']([^']+)[']/ ) {
        my $assigned_dir = $1
    } else {
        die "couldn't determine assigned directory for $gpa_key"
    }
    log_msg "assigned dir for $gpa_key == $assigned_dir";
}

Also, would someone mind telling me why $1 simply has one 'a' in it in the second case?
$string = 'aaaaa';
print $1,$/ if $string =~ /(a)+/;
print $1,$/ if $string =~ /(a+)/;

Replies are listed 'Best First'.
(Ovid) RE: Regular expression to match an A=B type string needs help in storing matched parts of string
by Ovid (Cardinal) on Sep 21, 2000 at 19:25 UTC
    Your regex works fine, but you have $assigned_dir declared with my inside of a block, so you have a scoping issue. I like the fact that you're using negated character classes instead of a .* construct; it's much more efficient. However, the single quotes can be left bare, rather than in a character class:
    /[^=]+[=]'([^']+)'/
    As for your other question, you have the regexes reversed: it only prints one 'a' in the first regex. This is because your capturing parens are only capturing one 'a' and the multiplier is outside of the one character that it's capturing. In your second regex, the parens are capturing everything because the multiplier is inside.
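A minimal sketch of the difference, using the same 'aaaaa' string from the question. The variable names `$outside` and `$inside` are only illustrative; a match in list context returns the contents of the capture group.

```perl
use strict;
use warnings;

my $string = 'aaaaa';

# Quantifier outside the group: the group is re-captured on every
# repetition, so it ends up holding a single 'a'.
my ($outside) = $string =~ /(a)+/;

# Quantifier inside the group: the group spans the whole run of 'a's.
my ($inside) = $string =~ /(a+)/;

print "outside: $outside\n";
print "inside:  $inside\n";
```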

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just go to the link and check out our stats.

Re: Regular expression to match an A=B type string needs help in storing matched parts of string
by Fastolfe (Vicar) on Sep 21, 2000 at 19:17 UTC
    It's not that $1 isn't being set. You're declaring $assigned_dir with my inside your if block. This means when you exit your block, $assigned_dir falls out of scope and is destroyed. Your log_msg function sees an undefined value.

    Consider 'use strict'. It would have caught this.
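One way the scoping fix could look, as a sketch rather than a drop-in replacement. The helper name `extract_assigned_dir` is hypothetical, and it takes the command output as a plain string so the example runs without the original `$gpa` command or `log_msg` function.

```perl
use strict;
use warnings;

# Hypothetical helper: the command output is passed in as a string.
sub extract_assigned_dir {
    my ($gpa_ret) = @_;

    # Declared before the if, so it is still in scope after the block.
    my $assigned_dir;
    if ( $gpa_ret =~ /[^=]+='([^']+)'/ ) {
        $assigned_dir = $1;
    }
    else {
        die "couldn't determine assigned directory from: $gpa_ret";
    }
    return $assigned_dir;
}

my $output = "azxinetgw.EXECFILE_DIR.azxinetgw='/usr/local/instinet/RTS/etc/local/azxfiles'";
my $dir = extract_assigned_dir($output);
print "assigned dir == $dir\n";
```

With `use strict` in effect, the original layout (a `my` inside the if block, then a use after it) fails at compile time, which is how strict would have flagged the problem.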

Re: Regular expression to match an A=B type string needs help in storing matched parts of string
by japhy (Canon) on Sep 21, 2000 at 21:47 UTC
    The first question has been answered fine. I'd like to give a brief insight on the second question.

    In the regex "abc" =~ /(\w)+/, one would expect $1 would be 'a', since that is the first \w matched. This is not the case -- $1 is 'c'. This is because the regex engine does this:
    1: PAREN1
    2: MATCH \w OR GOTO 5
    3: CLOSE1
    4: GOTO 1
    5: DONE
    That's a pretty dumbed-down explanation, but as you can see, $1 gets defined and redefined over and over again.

    For more information, read perldebug.
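A short sketch of that overwrite behaviour on the "abc" example, plus one common way to get every character instead of just the last. The names `$last` and `@chars` are only illustrative.

```perl
use strict;
use warnings;

my $word = 'abc';

# Each repetition of the quantified group overwrites the capture,
# so only the last character matched is left in the group.
my ($last) = $word =~ /(\w)+/;

# With /g in list context, the match returns the capture from
# every successive match instead.
my @chars = $word =~ /(\w)/g;

print "last:  $last\n";
print "chars: @chars\n";
```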

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
(tye)Re: Regular expression to match an A=B type string needs help in storing matched parts of string
by tye (Sage) on Sep 21, 2000 at 19:48 UTC

    Note that your regex will match a bit faster if you change "+" to "+?":

    if ( $gpa_ret =~ /[^=]+?[=][']([^']+)[']/ ) {

    The speed-up can be significant if you have really long strings that match.

    For your second question I'd like to note that I often find myself wanting @1 so that I could get the list of matches that a grouping matched!

            - tye (but my friends call me "Tye")
      Why is that, tye? There's no reason to make the + non-greedy. There's no way for [^=] to match an =. Here's a benchmark:
      #!/usr/bin/perl
      use Benchmark 'timethese';
      $short = "abcdefg";
      $long = $short x 100;
      timethese(-5, {
        japhyS => q{ "$short=123" =~ /[^=]+=/ },
        tyeS => q{ "$short=123" =~ /[^=]+?=/ },
        japhyL => q{ "$long=123" =~ /[^=]+=/ },
        tyeL => q{ "$long=123" =~ /[^=]+?=/ },
      });
      __END__
      Benchmark: running japhyL, japhyS, tyeL, tyeS, each for at least 5 CPU seconds...
      japhyL: 5803.60/s (n=29018)
      tyeL: 1785.83/s (n=8947)
      japhyS: 30179.50/s (n=157537)
      tyeS: 26449.44/s (n=141240)
      It gets worse for longer strings.

      $_="goto+F.print+chop;\n=yhpaj";F1:eval

        Well, I was thinking of the string after the "=" being long, but the greedy version still wins. Perhaps the regex works from the back even when non-greedy?

        My (incorrect, apparently) thinking was that /[^=]+=/ will have to check the whole string to make sure there isn't a second "=" later on while /[^=]+?=/ could just stop at the first "=" (provided the rest of the regex matched). I'd be interested in any insights on this. [ Didn't I make this exact same mental error before... I'll have to go check and then double check the read-only tab on my brain ]
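A small sketch that may help with the mental model. Since [^=] can never consume an "=", the greedy form has to stop at the first "=" it meets and never looks for a later one. The input string here is a made-up example with a second "=", and the sketch only shows that both forms match the same text; it says nothing about speed.

```perl
use strict;
use warnings;

# Hypothetical input with a second "=" later in the string.
my $input = 'key=value=more';

# Greedy: [^=]+ runs until it hits the first "=" and cannot pass it.
my ($greedy) = $input =~ /([^=]+)=/;

# Non-greedy: grows one character at a time until "=" can match.
my ($lazy) = $input =~ /([^=]+?)=/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```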

                - tye (but my friends call me "Tye")
Re: Regular expression to match an A=B type string needs help in storing matched parts of string
by japhy (Canon) on Sep 21, 2000 at 22:11 UTC
    Oh, and I'll say this again. If you have ANY control over your input string, then you can match EXPLICITLY what you want. If you know the string is going to be in the form before-the-equals-sign='in-the-quotes', then you don't NEED to do
    $string =~ /[^=]+='([^']+)'/;
    If you are confident, just doing
    $string =~ /='([^']+)'/;
    is enough. It may not be 100 times faster, but it's less noise to look at.
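As a quick check, here is a sketch applying both patterns to a string in that form. The sample string and the names `$long` and `$short` are only illustrative.

```perl
use strict;
use warnings;

my $string = "before-the-equals-sign='in-the-quotes'";

# Full pattern: also requires at least one non-"=" character first.
my ($long) = $string =~ /[^=]+='([^']+)'/;

# Shorter pattern: anchors only on the =' that precedes the value.
my ($short) = $string =~ /='([^']+)'/;

print "long:  $long\n";
print "short: $short\n";
```

The shorter pattern would also accept a string with nothing before the "=", so it is only equivalent when the input format is trusted, as the reply says.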

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
Re: Regular expression to match an A=B type string needs help in storing matched parts of string
by mrmick (Curate) on Sep 21, 2000 at 19:34 UTC
    I tried to test this and it seems as though your regular expression is working fine. I had to make some small changes to test it and removed the single quotes on the second line ('$gpa $gpa_key') and replaced it with ($gpa_ret = "gpa: $gpa_key"). Otherwise, the value would not interpolate for the regex.
    my $gpa_key = "azxinetgw.EXECFILE_DIR.azxinetgw='/usr/local/instinet/RTS/etc/local/azxfiles'";
    my $gpa_ret = "gpa: $gpa_key";
    warn $gpa_ret;
    print "gpakey: $gpa_key\n";
    #print "gparet: $gpa_ret\n";
    if ( $gpa_ret =~ /[^=]+[=][']([^']+)[']/ ) {
        my $assigned_dir = $1 ;
        print "$assigned_dir\n";
    } else {
        die "couldn't determine assigned directory for $gpa_key"
    }
    By making this change, we are now passing the string instead of the literal name of the variable. You were actually testing the regex conditions on `$gpa $gpa_key` instead of the string value. The output from my modified version is:
    gpa: azxinetgw.EXECFILE_DIR.azxinetgw='/usr/local/instinet/RTS/etc/local/azxfiles' at D:\DATA_TO_MIGRATE\dbexamples\ex\gpa.pl line 7.
    gpakey: azxinetgw.EXECFILE_DIR.azxinetgw='/usr/local/instinet/RTS/etc/local/azxfiles'
    /usr/local/instinet/RTS/etc/local/azxfiles

    Mick