DarknessX has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I appeal to your wisdom, that by it I may grow. What I'm trying to do is take a line of text that looks a bit like this: 0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz. All that's important to me is the "BlahBlahBlah - Blah Blah (1:23)" part. Here comes my simple question: since the first - occurs before the values that I want, and because the first ) occurs afterwards, I've been doing it like this:
    my $line = "0x160001a \"fubar - BlahBlahBlah - Blah Blah (1:23)\": (\"Foo\" \"Bar\") Baz";
    if ($line=/\)/) { $line = "$`)"; }
    if ($line=/-\s/) { $line = $'; }
Can I use $` and $' outside of the if? How? Is this an idiotic question? I await your (hopefully) helpful replies.

Re: Simple re question
by FamousLongAgo (Friar) on Jan 27, 2003 at 04:17 UTC
    I don't think your match is doing what you think. You've written in your if statement:
    $line=/-\s/;
    which is equivalent to
    $line = ($_ =~ /-\s/);
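To make the difference concrete, here is a minimal sketch. The sample strings and the variable names `$assigned` and `$bound` are invented for illustration. It suggests that a plain `=` runs the match against `$_` and stores the result, while `=~` binds the match to `$line` and leaves it untouched.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only for this illustration.
$_ = 'no hyphen here';
my $line = 'fubar - BlahBlahBlah';

# Plain assignment: the match runs against $_, and the variable receives
# the result of that match (a false value here, since $_ has no "- ").
my $assigned = $line;
$assigned = /-\s/;

# Binding operator: the match runs against $line, which is not modified.
my $bound = ($line =~ /-\s/) ? 'matched' : 'no match';

print "assigned: '$assigned'\n";
print "bound:    $bound\n";
print "line:     $line\n";
```

With the plain assignment, the original text is lost as soon as the statement runs, which is why the later `$'` lookup in the question cannot work as intended.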
    Here's an easier way to get what you want, assuming that you want to match from the first hyphen to the first closing parenthesis after that hyphen:
    my ($excerpt) = $line =~ /-([^)]+\))/;

    The variable $excerpt will now hold the value you want, without having to monkey around with the magic $` and $' variables (which impose a performance penalty on your entire program). You also go down from two matches to one.
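As a rough sketch of the single-match approach, the snippet below runs a capture in list context on the sample line from the question. One assumption is labeled in the code: `\s*` is added after the hyphen so the captured text does not start with a space. The pattern above would keep that leading space.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = '0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz';

# In list context a match returns its captures, or an empty list on failure,
# so $excerpt stays undef when nothing matches.
# Assumption: \s* after the hyphen is added here to drop the leading space.
my ($excerpt) = $line =~ /-\s*([^)]+\))/;

if (defined $excerpt) {
    print "$excerpt\n";
} else {
    print "no match\n";
}
```

Checking `defined $excerpt` takes the place of the `if` around the match, so the success test and the extraction happen in one statement.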

Re: Simple re question
by Enlil (Parson) on Jan 27, 2003 at 04:31 UTC
    I don't know for sure where you are going with this, but the code as you have it won't work, as you are missing the tilde after the $line= in both your conditionals. Anyhow, I think this is what you meant:
    my $line = '0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz';
    if ($line =~ /\)/) { $line = "$`)"; }
    if ($line =~ /-\s/) { $line = $'; }
    As with anything else, you can always copy the values of $' and $` into variables and use those variables later on:
    use strict;
    use warnings;

    my ($prematch, $postmatch);
    my $line = '0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz';
    if ($line =~ /\)/)  { $line = "$`)"; $prematch  = $`; }
    if ($line =~ /-\s/) { $line = $';    $postmatch = $'; }
    print $prematch, $/;
    print $postmatch, $/;
    print $line, $/;
    You can use both $' and $` after a successful match for as long as you are still in the current dynamic scope. That said, I would probably solve the problem differently, following the same reasoning you gave above:

    Here comes my simple question: since the first - occurs before the values that I want, and because the first ) occurs afterwards,

    use strict;
    use warnings;

    my $line = '0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz';
    if ($line =~ /- ([^\)]+\))/) {
        print $1;
        $line = $1;
    }
    which captures in $1 what you described, and doesn't suffer from the performance penalty of using $' or $`.

    -enlil

Re: Simple re question
by John M. Dlugosz (Monsignor) on Jan 27, 2003 at 07:07 UTC
    Regardless of the details of your code, which others have already critiqued, let me address your actual question:

    The variables $' and $` are not scoped to the if containing the regex as its condition. So, the values remain intact and unchanged after you drop out of the block controlled by the if.

    However, if the match failed and the block controlled by the if was not run, the code following the if still runs, and what is the value of $' then? It is stale: a failed match does not reset it, so it holds whatever the last successful match left behind, or nothing at all. So it makes no sense to refer to it outside of an if statement that tests for a successful match: if the match succeeded, the branch is taken and you might as well use the value inside the braces. If you use it after the if block, you don't know whether it is meaningful or not. That is the point of the if statement: to test whether the match succeeded.
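The following sketch illustrates the hazard with two invented inputs, one that matches and one that does not. The variable names are hypothetical. It shows `$'` read after a failed match still reporting text from the earlier, unrelated string.

```perl
use strict;
use warnings;

my $first  = 'alpha - beta';
my $second = 'no separator here';   # hypothetical input that will not match

my $after_first  = '';
my $after_second = '';

if ($first =~ /-\s/) {
    $after_first = $';     # safe: this match is known to have succeeded
}

if ($second =~ /-\s/) {
    $after_second = $';    # never reached, because the match fails
}

# The failed match did not reset $', so it still describes $first.
my $stale = $';

print "after first match: $after_first\n";
print "stale value:       $stale\n";
```

Reading `$'` only inside the block guarded by the match avoids picking up this leftover value.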

    —John

Re: Simple re question
by DarknessX (Scribe) on Jan 27, 2003 at 20:55 UTC
    I've rewritten the regex with your help, thank you very much. As for the missing tilde, that was just tired idiocy. My next question again deals with the special variables in general.

    Why is there such a performance penalty for using them? Is it appreciable at all in a script which is only 5 or so lines? (If you didn't figure it out, the above was just to get the song name of whatever's playing in XMMS, using xwininfo -root -all | grep XMMS_Window.) Without doing it stupidly (like mine above) or with ubercomplex regex (semi- is fine, I'm still learning) is there a way to get "from the beginning until char"? Muchas gracias.

      It's not such a big deal. You're forcing the regex engine to make an extra copy of your data which in this case is a few measly bytes. It can make a difference for other scripts but you shouldn't even worry about it here. You'd need an advanced degree in Regex-ology to know exactly why though. I'd have to refer back to my copy of Mastering Regular Expressions to get the exact semantics.

      The general idea though is that those special variables must still be valid even if you alter or mangle the original string. The only way that works is if perl makes a new copy just for them. So that's the overhead. In this case it just doesn't matter.
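For scripts where the copy does matter, one commonly suggested alternative is to rebuild the same text from the match offsets that Perl keeps in the `@-` and `@+` arrays. This is a minimal sketch using the sample line from the question. The names `$before` and `$after` are made up for illustration.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = '0x160001a "fubar - BlahBlahBlah - Blah Blah (1:23)": ("Foo" "Bar") Baz';

my ($before, $after);
if ($line =~ /-\s/) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    $before = substr($line, 0, $-[0]);   # same text as $`
    $after  = substr($line, $+[0]);      # same text as $'
}

print "before: $before\n";
print "after:  $after\n";
```

The offsets are plain numbers, so the original string must be left unmodified between the match and the `substr` calls.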

      If your data is regular enough, then you can use something like $title =~ m{(?<=- )([^"]+)}; on it. The return value (true/false) indicates whether it matched or not, and the title is stored in $1.


      Seeking Green geeks in Minnesota

      Without doing it stupidly (like mine above) or with ubercomplex regex (semi- is fine, I'm still learning) is there a way to get "from the beginning until char"?

      If you want to capture up to a certain character, you can just capture it with parens and [^-]. The ^ right after the [ negates the character class, so it matches any character except those listed inside the []. There is no need for zero-width lookahead or lookbehind assertions. That is:

      /([^-]+)/
      will capture as many characters other than "-" as it can and store them in $1. If you want it from the beginning, you can add a ^ to the start of the regex (note the different meaning of ^ outside the character class square brackets), like so:
      /^([^-]+)/
      This will anchor your match to start from the beginning. So any line starting with "-" will fail.
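To see the anchored pattern on a few inputs, here is a small sketch. The helper name `up_to_hyphen` and the sample strings are hypothetical, chosen only to cover a line with a hyphen, a line without one, and a line that starts with one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text before the first "-",
# or undef when the line starts with "-" or is empty.
sub up_to_hyphen {
    my ($text) = @_;
    my ($head) = $text =~ /^([^-]+)/;
    return $head;
}

for my $sample ('fubar - BlahBlahBlah', 'no hyphen at all', '- leading hyphen') {
    my $head = up_to_hyphen($sample);
    print defined $head ? "'$head'\n" : "no match\n";
}
```

Note that a line with no hyphen at all still matches in full, and that the captured text keeps any space that precedes the hyphen.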

      -enlil