in reply to What happens with empty $1 in regular expressions? (was: Regular Expression Question)

What I did was a bit different. If you used the other examples and a later match failed, $1 would still hold the value from the earlier successful match. If your goal is for $1 to be 'undef', or to have an 'undef' value to play with, you may want to try this:
    use strict;

    my $i = "mmmm9";
    my $a = match_rtn( $i );
    print $a, "\n";

    $i = "mmm";
    $a = match_rtn( $i );
    print $a, "\n";

    sub match_rtn {
        my $str = shift;
        $str =~ m/(\d+)/;
        return $1;
    }
So there is now a subroutine doing the checking and returning the value of $1. What creates the 'undef' value, though, is the use of 'strict'. From what I understand, using strict causes variables to be isolated to the block of code they are declared in, and when that block is finished the variable is destroyed. So $1 would be destroyed at the end of the routine after the value is returned. It works, though: this way you definitely have an 'undef' value to play with, if that is what you are after.

I looked up exactly what 'strict' is supposed to do, and the Camel book says it's supposed to disallow "unsafe" code. My question to anyone else is: what is considered "unsafe"?

Amel - f.k.a. - kel

Re: Re: Regular Expression Question
by danger (Priest) on Feb 28, 2001 at 22:25 UTC

    Your method of enclosing the match operation only *appears* to work (in terms of leaving $1 unmodified after the sub call, and returning undef on failure), but not at all for the reasons you provide. The use of 'strict' has nothing to do with producing the undef value, and 'strict' has nothing to do with how variables are scoped. Had any successful match been applied in the outer scope prior to your sub calls, then $1 (which is global) would have been set there and its value would be the return value on any of your sub calls that failed to match. Check this minor variation on your example:

        use strict;
        "blah" =~ /(a)/;   # now we've set $1 at the global scope

        my $i = "mmmm9";
        my $a = match_rtn( $i );
        print $a, "\n";

        $i = "mmm";
        $a = match_rtn( $i );
        print $a, "\n";   # ook! this isn't undefined!

        sub match_rtn {
            my $str = shift;
            $str =~ m/(\d+)/;
            return $1;
        }

    Match variables ($1, $2, etc.) are global variables. When match variables are set (due to a successful match operation), they are always localized to the enclosing block. So, they retain their value until another successful pattern match, or the end of the current block. Witness:

        {
            $_ = 'blah';
            /(a)/;
            print "$1\n";   # prints: a
        }
        print "$1\n";   # uninitialized warning

    Now try this longer example and you'll see that the match variables are implicitly localized (ie, in the sense of local()):

        $_ = 'blah';
        /(\w)/; print "$1\n";   # prints: b
        /(\d)/; print "$1\n";   # still prints: b
        {
            /(a)/; print "$1\n";   # prints: a
        }
        print "$1\n";   # prints: b

    The proper way to protect yourself from using unintended old values in $1 and friends is to program defensively and check if a pattern match succeeded before trying to use captured subexpressions (as previous messages in this thread have shown).
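As a minimal sketch of that defensive style (the helper name `first_digits` is hypothetical, not something from this thread), the sub below only returns $1 when the match itself reports success, so a stale value set in an outer scope never leaks out:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first run of digits, or undef if none.
sub first_digits {
    my ($str) = @_;
    # Only trust $1 when the match operator itself reports success.
    return $str =~ /(\d+)/ ? $1 : undef;
}

"blah" =~ /(a)/;    # a successful match in the outer scope sets $1 to 'a'

for my $input ("mmmm9", "mmm") {
    my $digits = first_digits($input);
    print defined $digits ? "$input: $digits\n" : "$input: no match\n";
}
```

Because the return value depends on the result of the match rather than on whatever $1 happens to hold, the second call reports "no match" even though $1 is still 'a' at that point.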

    I looked up exactly what 'strict' is supposed to do, and the Camel book says it's supposed to disallow "unsafe" code. My question to anyone else is: what is considered "unsafe"?

    Please see the documentation for strict and tye's review of strict.pm for starters.

      The proper way to protect yourself from using unintended old values in $1 and friends is to program defensively and check if a pattern match succeeded before trying to use captured subexpressions (as previous messages in this thread have shown).
      Well, that's a way. Another way is to stylistically outlaw all uses of $1 et seq, except in the right side of a substitution. Any other "capturing" should be done as list-context assignment:
      my ($first, $second) = $source =~ /blah(this)blah(that)blah/;
      Then it's very clear what the scope and origination of $first and $second are.

      -- Randal L. Schwartz, Perl hacker

        Indeed, that's true enough -- I was trying to make a more general statement but failed to sufficiently generalize away from the specific problem of reusing old $1 etc. The more general point is to check the success of an operation before doing things that depend on the operation having succeeded. Thus, even when doing a list assignment of subexpressions from a match, one will generally want to test that the match succeeded prior to using the results:

        if(my ($first, $second) = $source =~ /blah(this)blah(that)blah/){ print "$first $second\n"; }
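The test above works because a failed match returns an empty list, and a list assignment evaluated in boolean context yields the number of elements on its right-hand side. A small sketch of both outcomes, using a hypothetical `parse_pair` helper and made-up input strings:

```perl
use strict;
use warnings;

# Hypothetical helper: splits "key=value" into its two parts.
sub parse_pair {
    my ($source) = @_;
    # A failed match returns an empty list, so the list assignment
    # counts 0 elements and the block is skipped.
    if (my ($key, $value) = $source =~ /^(\w+)=(\w+)$/) {
        return "$key -> $value";
    }
    return "no match";
}

print parse_pair("colour=red"), "\n";   # colour -> red
print parse_pair("nonsense"), "\n";     # no match
```

The captured values live in lexicals that exist only inside the `if`, so there is no way to pick up leftovers from an earlier match.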
Re: Re: Regular Expression Question
by chipmunk (Parson) on Mar 01, 2001 at 00:40 UTC
    I looked up exactly what 'strict' is supposed to do, and the Camel book says it's supposed to disallow "unsafe" code. My question to anyone else is: what is considered "unsafe"?

    I don't know whether you're looking at Camel II or III, but they both provide full documentation on strict. The Second edition documents the strict pragma beginning on page 500, and the Third edition beginning on page 858.

    Of course, the Camel book is not the only source of documentation. Each core module and pragma also comes with built-in documentation. You can read the standard documentation for strict with perldoc strict (or the equivalent on Windows [the HTML-ized docs] or Mac [Shuck]).

    The docs from 5.005_02 are even available on this site, including the docs for strict.

    To answer your question, three things are considered 'unsafe'. Each one is controlled by a separate part of strict. strict 'refs' prevents the use of symbolic references. strict 'vars' prevents the use of variables which are not pre-declared or fully qualified. strict 'subs' prevents the use of barewords (bare identifiers that are not subroutine names) as strings.
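Since each category is a separate switch, one can be turned off for a single block while the others stay on. A rough sketch, assuming a package variable `$greeting` invented for this illustration:

```perl
use strict;
use warnings;

our $greeting = "hello";   # package variable, so it can be found by name

my $name = "greeting";
my $value;
{
    # Symbolic references are allowed only inside this block.
    no strict 'refs';
    $value = ${$name};     # looks up $main::greeting through the string
}
print "$value\n";

# Outside the block strict 'refs' is back in force; the same lookup dies.
my $ok = eval { my $again = ${$name}; 1 };
my $error = $@;
print "strict refs error: $error" unless $ok;
```

The `no strict 'refs'` is lexically scoped, so it ends with the enclosing block, and the `eval` shows the same expression being rejected again afterwards.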

      That much I got (didn't mean to sound like a d**khead). What I don't get is why that is unsafe. Is it because a bareword could be confused with a built-in function or something like that? Why are undeclared variables unsafe? Same kind of thing?

      Excuse my ignorance. I am starting to really get interested in the theories of programming and things like that and I would really like to get a handle on this stuff. I taught myself Perl about a year ago and I've had no one to ask these questions to. They are all sort of flooding out now.

      Thanks for your help.

      Amel - f.k.a. - kel

        I see, I misunderstood your question. My apologies.

        One of the main reasons these practices are unsafe is that they often occur by accident.

        Here are some quick examples of mistakes that will cause errors with strict:

            my $variable = 7;
            print $varaible;        # typo
            # error from strict 'vars'

            my $hash = { name => 'value' };
            my $key  = 'name';
            print $key->{'name'};   # dereferenced wrong variable
            # error from strict 'refs'

            sub subroutine { }
            my $result = subrotine; # typo
            # error from strict 'subs'
        Using strict also encourages good coding practices such as controlling the scope of your variables (strict 'vars') and using hard references instead of soft references (strict 'refs').
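To see the strict 'vars' case without aborting a script, the typo can be compiled inside a string eval. This is only a sketch: the snippet text and the variable names `$snippet`, `$lax` and friends are made up for the illustration.

```perl
use strict;
use warnings;

# String eval compiles the snippet at run time, so the strict 'vars'
# compile-time error lands in $@ instead of aborting this script.
my $snippet = q{
    use strict;
    my $variable = 7;
    my $copy = $varaible;   # typo, never declared
    1;
};

my $compiled = eval $snippet;
my $error    = $@;
print $compiled ? "compiled fine\n" : "strict caught it: $error";

# The same snippet with strict 'vars' switched off compiles, and
# $copy just ends up undefined.
(my $lax = $snippet) =~ s/use strict;/no strict 'vars';/;
my $lax_ok = eval $lax;
print $lax_ok ? "without strict 'vars': no error\n" : "error: $@";
```

With strict 'vars' off, the misspelled name is quietly treated as a fresh package variable, which is exactly the kind of accident the pragma is there to report.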

        In strict.pm I go over why barewords can be a problem. I don't agree with the term "unsafe" here. strict.pm helps Perl find simple mistakes for you.

        Undeclared variables are only a problem in that they prevent Perl from (reliably) telling you when you put a typo in a variable name.

                - tye (but my friends call me "Tye")