Re: Regular Expression Question

Re: Re: Regular Expression Question by danger (Priest) on Feb 28, 2001 at 22:25 UTC
Your method of enclosing the match operation only appears to work (in terms of leaving $1 unmodified after the sub call, and returning undef on failure), but not at all for the reasons you provide. The use of 'strict' has nothing to do with producing the undef value, and 'strict' has nothing to do with how variables are scoped. Had any successful match been applied in the outer scope prior to your sub calls, then $1 (which is global) would have been set there, and its value would be the return value of any of your sub calls that failed to match. Check this minor variation on your example:

`use strict;
"blah" =~ /(a)/;    # now we've set $1 at the global scope
my $i = "mmmm9";
my $a = match_rtn( $i );
print $a, "\n";
$i = "mmm";
$a = match_rtn( $i );
print $a, "\n";     # ook! this isn't undefined!
sub match_rtn {
    my $str = shift;
    $str =~ m/(\d+)/;
    return $1;
}`

Match variables ($1, $2, etc.) are global variables. When match variables are set (due to a successful match operation), they are always localized to the enclosing block. So, they retain their value until another successful pattern match, or the end of the current block. Witness:

`{
    $_ = 'blah';
    /(a)/;
    print "$1\n";   # prints: a
}
print "$1\n";       # uninitialized warning (under -w)`

Now try this longer example and you'll see that the match variables are implicitly localized (i.e., in the sense of local()):

`$_ = 'blah';
/(\w)/;
print "$1\n";       # prints: b
/(\d)/;
print "$1\n";       # still prints: b
{
    /(a)/;
    print "$1\n";   # prints: a
}
print "$1\n";       # prints: b`

The proper way to protect yourself from using unintended old values in $1 and friends is to program defensively and check if a pattern match succeeded before trying to use captured subexpressions (as previous messages in this thread have shown).

> I looked up exactly what 'strict' is supposed to do, and the Camel book says it's supposed to disallow "unsafe" code. My question to anyone else is what is considered "unsafe"?

Please see the documentation for strict and tye's review of strict.pm for starters.
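A minimal sketch of that defensive check, reusing the `match_rtn` name from the example above. The rewritten body is an assumption about how one might apply the advice, not code from the original poster. It hands back `$1` only when this particular match succeeded:

```perl
use strict;
use warnings;

# Hypothetical rewrite of match_rtn: return the capture only when this
# match succeeded, otherwise return undef explicitly.
sub match_rtn {
    my ($str) = @_;
    return $str =~ /(\d+)/ ? $1 : undef;
}

"blah" =~ /(a)/;    # sets $1 in the outer scope, as in the example above

my $hit  = match_rtn("mmmm9");
my $miss = match_rtn("mmm");

print defined $hit  ? "hit: $hit\n"   : "hit: undef\n";    # prints: hit: 9
print defined $miss ? "miss: $miss\n" : "miss: undef\n";   # prints: miss: undef
```

Because the stale outer `$1` is never read on the failure path, the second call no longer leaks the earlier "a".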
Re: Re: Re: Regular Expression Question by merlyn (Sage) on Feb 28, 2001 at 22:30 UTC
> The proper way to protect yourself from using unintended old values in $1 and friends is to program defensively and check if a pattern match succeeded before trying to use captured subexpressions (as previous messages in this thread have shown).

Well, that's a way. Another way is to stylistically outlaw all uses of `$1` et seq, except in the right side of a substitution. Any other "capturing" should be done as list-context assignment:

`my ($first, $second) = $source =~ /blah(this)blah(that)blah/;`

Then it's very clear what the scope and origin of `$first` and `$second` are.

-- Randal L. Schwartz, Perl hacker
Re: Re: Re: Re: Regular Expression Question by danger (Priest) on Feb 28, 2001 at 23:29 UTC
Indeed, that's true enough -- I was trying to make a more general statement but failed to sufficiently generalize away from the specific problem of reusing old $1 etc. The more general point is to check the success of an operation before doing things that depend on the operation having succeeded. Thus, even when doing a list assignment of subexpressions from a match, one will generally want to test that the match succeeded prior to using the results:

`if (my ($first, $second) = $source =~ /blah(this)blah(that)blah/) {
    print "$first $second\n";
}`
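Why the `if` test above works: a list assignment evaluated in scalar context yields the number of elements produced by its right-hand side, and a failed match in list context produces an empty list. A small sketch, where `capture_count` is a hypothetical helper name introduced only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): perform the same list assignment
# and report the value it yields in scalar context.
sub capture_count {
    my ($source) = @_;
    my $count = ( my ($first, $second) = $source =~ /blah(this)blah(that)blah/ );
    return $count;
}

print capture_count("blahthisblahthatblah"), "\n";   # prints: 2
print capture_count("no match here"), "\n";          # prints: 0
```

Note that the count reflects how many values the match returned, not how many of them are defined, so an optional group that did not participate still counts.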
Re: Re: Regular Expression Question by chipmunk (Parson) on Mar 01, 2001 at 00:40 UTC
> I looked up exactly what 'strict' is supposed to do, and the Camel book says it's supposed to disallow "unsafe" code. My question to anyone else is what is considered "unsafe"?

I don't know whether you're looking at Camel II or III, but they both provide full documentation on strict. The Second edition documents the strict pragma beginning on page 500, and the Third edition beginning on page 858.

Of course, the Camel book is not the only source of documentation. Each core module and pragma also comes with built-in documentation. You can read the standard documentation for strict with `perldoc strict` (or the equivalent on Windows [the HTML-ized docs] or Mac [Shuck]). The docs from 5.005_02 are even available on this site, including the docs for strict.

To answer your question, three things are considered 'unsafe'. Each one is controlled by a separate part of strict. strict 'refs' prevents the use of symbolic references. strict 'vars' prevents the use of variables which are not pre-declared or fully qualified. strict 'subs' prevents the use of barewords that are not subroutine names (simple identifiers inside curly braces or on the left side of => are still allowed).
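To see the three categories in action, here is a rough sketch that compiles one offending snippet per category inside a string eval and prints the resulting error. The helper `strict_error` and the snippet contents are assumptions made up for this illustration. The exact wording of the messages can vary between Perl versions:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of strict.pm): compile and run a snippet under
# "use strict" inside a string eval, returning the error text ('' if none).
sub strict_error {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? '' : $@;
}

# One deliberately offending snippet per strict category.
my %snippets = (
    vars => '$undeclared = 1',                      # variable never declared
    refs => 'my $name = "x"; my $value = $$name',   # string used as a reference
    subs => 'my $value = some_bareword',            # bareword that is not a sub
);

for my $category (sort keys %snippets) {
    print "strict '$category': ", strict_error($snippets{$category});
}
```

The 'vars' and 'subs' violations are caught when the snippet is compiled, while the 'refs' violation is only raised when the dereference actually runs.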
Re: Re: Re: Regular Expression Question by dsb (Chaplain) on Mar 01, 2001 at 00:47 UTC
That much I got (didn't mean to sound like a dkhead). What I don't get is why that is unsafe. Is it because a bareword could be confused with a built-in function or something like that? Why are undeclared variables unsafe? Same kind of thing?

Excuse my ignorance. I am starting to really get interested in the theories of programming and things like that, and I would really like to get a handle on this stuff. I taught myself Perl about a year ago and I've had no one to ask these questions. They are all sort of flooding out now. Thanks for your help.

Amel - f.k.a.** - kel
strict; why these practices are unsafe by chipmunk (Parson) on Mar 01, 2001 at 01:05 UTC
I see, I misunderstood your question. My apologies.

One of the main reasons these practices are unsafe is that they often occur by accident. Here are some quick examples of mistakes that will cause errors with strict:

`my $variable = 7;
print $varaible;            # typo
# error from strict 'vars'

my $hash = { name => 'value' };
my $key = 'name';
print $key->{'name'};       # dereferenced wrong variable
# error from strict 'refs'

sub subroutine { }
my $result = subrotine;     # typo
# error from strict 'subs'`

Using strict also encourages good coding practices such as controlling the scope of your variables (strict 'vars') and using hard references instead of soft references (strict 'refs').
(tye)Re: Regular Expression Question by tye (Sage) on Mar 01, 2001 at 00:57 UTC
In strict.pm I go over why barewords can be a problem.

I don't agree with the term "unsafe" here. strict.pm helps Perl find simple mistakes for you. Undeclared variables are only a problem in that they prevent Perl from (reliably) telling you when you put a typo in a variable name.

- tye (but my friends call me "Tye")
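A small sketch of that last point, with `run_snippet` as a hypothetical helper invented for this illustration. The same misspelled variable is evaluated once with strict 'vars' turned off and once with it on. Without strict, the typo quietly refers to a different, empty package variable. With strict, Perl refuses to compile it:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): evaluate a snippet under the
# given pragma line and return its value, or the compile error if it fails.
sub run_snippet {
    my ($pragma, $snippet) = @_;
    my $result = eval "$pragma; $snippet";
    return defined $result ? $result : "error: $@";
}

# The misspelled $varaible is intentional.
my $snippet = 'my $variable = 7; my $copy = $varaible; defined $copy ? "defined" : "undef"';

print run_snippet(q{no strict 'vars'},  $snippet), "\n";   # the typo yields an undef value
print run_snippet(q{use strict 'vars'}, $snippet);         # the typo is reported as an error
```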