jockel has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I have a few regexps that check input from the user to rule out any malicious code.

Here's an example:
$adress1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
$adress1 = $1;
$adress2 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
$adress2 = $1;
But if $adress2 is empty, or contains anything that makes the regexp fail, $adress2 eq $adress1 after these lines.
Why doesn't $1 reset when the regexp doesn't match anything?
Any ideas?

Best regards
/Jocke

Replies are listed 'Best First'.
Re: $1 don't reset (non loop)
by snowcrash (Friar) on Feb 17, 2004 at 11:20 UTC
    from the perlre manpage: The numbered variables ($1, $2, $3, etc.) and the related punctuation set ("$+", "$&", "$`", and "$'") are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first.

    So $1 is only set if the match succeeds, that's why it's a good idea to write
    if ($adress1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/) {
        $adress1 = $1;
    }
    if ($adress2 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/) {
        $adress2 = $1;
    }

    snowcrash
      If I see it right, you didn't get it quite correct. $addressX should become undef if no match. So this should work:
      if ($adress1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/) {
          $adress1 = $1;
      }
      else {
          $adress1 = undef;
      }
      # and so on...
      I first thought this should be reducible to the following, which, as you'll see below, it is not:
      {
          $adress1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
          $adress1 = $1;
      }
      {
          $adress2 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
          $adress2 = $1;
      }
      Update: Thanks to Abigail-II who found out my mistake.
        No, not really. That only works if $1 happens to be undefined. Which you can't count on. Watch:
        #!/usr/bin/perl

        use strict;
        use warnings;

        "meep" =~ /(.*)/;             # Set $1.

        my $address1 = "Tsjakka";     # Will match.
        my $address2 = "Tsjakka!";    # Will not match.

        {
            $address1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
            $address1 = $1;
        }
        {
            $address2 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
            $address2 = $1;
        }

        print $address1, "\n";
        print $address2, "\n";

        __END__
        Tsjakka
        meep
        $address2 doesn't become undef. The solution is much simpler. The regex will either match the entire string, or there's no match. So, we could just do:
        /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/ or $_ = undef for $address1, $address2;
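        A minimal sketch of that one-liner in a complete script, for anyone who wants to try it. The sample values are the ones from the script above; the character class is cut down to ASCII here purely as a simplifying assumption, so that the file's encoding doesn't matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

"meep" =~ /(.*)/;             # leave a stale value in $1 on purpose

my $address1 = "Tsjakka";     # will match
my $address2 = "Tsjakka!";    # will not match

# The pattern is anchored at both ends, so a successful match means the
# whole string is acceptable and the variable can stay as it is. Only a
# failed match needs handling, and $1 is never read at all.
/^([A-Za-z0-9][A-Za-z0-9\s\-.,]*)$/ or $_ = undef for $address1, $address2;

print defined($_) ? $_ : "(undef)", "\n" for $address1, $address2;
```

        This should print Tsjakka followed by (undef), since the stale "meep" in $1 never gets a chance to leak into $address2.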

        Abigail

Re: $1 don't reset (non loop)
by Abigail-II (Bishop) on Feb 17, 2004 at 11:42 UTC
    Why doesn't $1 reset when the regexp doesn't match anything?
    Just because it was designed that way. $1 will only be set on a successful match, and hence remains what it was after a failed match.
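    A small sketch to see this for yourself. The variable names and sample strings are made up for illustration and are not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values, chosen only for illustration.
my $first  = "abc123";
my $second = "!!!";

$first =~ /^([a-z0-9]+)$/;      # succeeds, sets $1 to "abc123"
my $after_first = $1;

$second =~ /^([a-z0-9]+)$/;     # fails, so $1 is left untouched
my $after_second = $1;

print "after successful match: $after_first\n";
print "after failed match:     $after_second\n";
```

    Both lines show abc123: the failed match did not clear $1.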

    Abigail

Re: $1 don't reset (non loop)
by Skeeve (Parson) on Feb 17, 2004 at 11:31 UTC
    How about:
    ($adress1) = $adress1 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
    ($adress2) = $adress2 =~ /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/;
      Hi all

      This would be the cleanest looking solution..
      but will it work..
      I'll try this..
      Thanks everyone!

      /jocke
        Another "clean looking solution"!?
        ($_) = /^([A-Za-z0-9öäåÖÄÅ][A-Za-zöäåÖÄÅ0-9\s\-\.\,]*)$/ for ($address1, $address2);
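        Works because for aliases $_ to each variable in turn, so the assignment changes the variable itself. A failed match returns an empty list, which leaves the variable undef.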
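        A rough sketch of the aliasing at work. The @inputs array and its values are hypothetical, and the character class is trimmed to ASCII as a simplifying assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values: the first is acceptable, the second is not.
my @inputs = ("Storgatan 12", "Storgatan 12!");

# for aliases $_ to each element, so assigning to $_ changes @inputs itself.
# In list context a successful match returns the captured text, while a
# failed match returns the empty list, which leaves $_ undefined.
($_) = /^([A-Za-z0-9][A-Za-z0-9\s\-.,]*)$/ for @inputs;

print defined($_) ? "ok: $_\n" : "rejected\n" for @inputs;
```

        Because the result comes from the match's return value and not from $1, an earlier successful match cannot leak into a later variable.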
Re: $1 don't reset (non loop)
by MCS (Monk) on Feb 17, 2004 at 13:23 UTC

    Why doesn't $1 reset when the regexp doesn't match anything?

    It's a feature, not a bug :-)

    Actually there are times when it comes in handy. When I'm doing stuff where I need it to be redefined, I usually use an if {} elsif {} else {} chain. That way the code only goes into a {} block if that block's if (or elsif) matched. If you don't want to use an elsif (because you are checking a few independent things), just use multiple if statements.
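    Something along these lines, as a sketch only. classify() and its two patterns are hypothetical, made up to show the shape of the chain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# classify() is a hypothetical helper, not something from the thread.
sub classify {
    my ($input) = @_;
    if ($input =~ /^(\d+)$/) {
        return "number: $1";    # $1 is fresh here: this match just succeeded
    }
    elsif ($input =~ /^([A-Za-z]+)$/) {
        return "word: $1";      # likewise set by the elsif's own match
    }
    else {
        return "rejected";      # nothing matched, so $1 is never read
    }
}

print classify($_), "\n" for "42", "Tsjakka", "Tsjakka!";
```

    $1 is only ever read inside a block guarded by the match that set it, so a leftover value from an earlier match can't sneak in.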

      Well... this time I think it's a bug =)..
      It would've been great if you could control this
      so-called "feature" with a special $ variable.

      Anyway.. now that I know about this "feature" I won't make the mistake again =)
      /jocke