perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:
I have a quick question whose answer seems obvious, but I just want to check, and maybe hope I'm wrong.
#!/usr/bin/perl
use strict; use warnings;
use P;
my $intxt;
$intxt = << 'TXT' ;
<package type="rpm">
<name>7kaa-music</name>
<url>http://7kfans.com/</url>
TXT
;
sub REindex($$;$) { #like 'index', except substr is RE
  my ($str,$ss)=(shift, shift);
  my $p = @_ ? shift:0;
  $str =~ m{^.{$p,$p}($ss+)} ? length $1 : -1;
}
my @lines=split "\n", $intxt;
my $ln;
my $lineno=0;
sub getln() {
  return $lineno<@lines ? $lines[$lineno++] : undef;
}
my $ttag;
sub getnxt_tagln(); local * getnxt_tagln;
*getnxt_tagln = sub () {
  do {
    $_=getln();
    defined $_ or return undef;
  } until m{^\s*<(/?\w+)};
  $ttag=$1;
};
my $tag;
NXTPKG:
while (getnxt_tagln()) { # why '$1' null?
  $ln = $_; $tag = $1;
  Pe "_=%s, ttag=%s, tag=%s", $_, $ttag, $tag;
}
# vim: ts=2 sw=2 ai number
My question concerns the comment after the NXTPKG line: why '$1' null (∄)?
When I run this:
_=<package type="rpm">, ttag=package, tag=∄;
_= <name>7kaa-music</name>, ttag=name, tag=∄;
_= <url>http://7kfans.com/</url>, ttag=url, tag=∄;
tag is null/undef when I get out of my inline sub. I can get around it by assigning $1 to $ttag, but I don't have any other regexes that should be clearing '$1'. Seems a bit weird to have the end of a local sub clear '$1', yet that seems to be what is happening. Why? What was the logic of forcing/doing that?
tnx!
Re: why is $1 cleared at end of an inline sub?
by haukex (Archbishop) on Sep 16, 2021 at 12:38 UTC
|
sub do_something_else {
my $foo = "quzbaz";
$foo =~ /([aeiou]+)/
and print ">$1\n"; # prints ">u"
}
my $bar = "foobar";
if ( $bar =~ /([aeiou]+)/ ) {
do_something_else();
print "$1\n"; # still prints "oo", not "u"
}
As a general rule, regular expression variables that you want to keep for later use should be copied into other variables ASAP, and only if the match was successful.
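A minimal sketch of that rule, assuming a hypothetical helper named first_vowels (it is not part of the original code): the capture is copied into a lexical in the same expression that tests whether the match succeeded, so a stale $1 from an earlier match can never be picked up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first run of vowels, or undef if there is none.
sub first_vowels {
    my ($str) = @_;
    # A list assignment in boolean context is true only when the match
    # succeeded, so $vowels receives the capture exactly when there is one.
    if ( my ($vowels) = $str =~ /([aeiou]+)/ ) {
        return $vowels;
    }
    return undef;
}

print first_vowels("foobar"), "\n";                                   # oo
print defined( first_vowels("xyz") ) ? "match" : "no match", "\n";    # no match
```

Because the result lives in a lexical, later matches in this sub or in any sub it calls cannot affect it.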
Update: Better link instead of local
|
Both of these answers ignored that the sub was an 'anonymous'/'inline' sub that would have access to surrounding local variables in the same scope, including regex vars.
I find it surprising that '$1' is affected in this way by an anonymous sub.
I wouldn't find it surprising that a normal sub would auto-save context to not
completely disrupt callers (even though '$_' needs to be explicitly saved with local).
To re-ask: why is an inline sub, which I thought was designed to have access
to local vars (in the same context), restoring '$1'? If it was accessing or changing
'$_1', it would access the copy of the sub it was in. I had supposed that the
'$1' would stay constant until another regex and that an inline/anon sub wouldn't
treat '$1' differently from '$_1'.
I was really more wondering what the rationale might be for treating them differently
in an anon/inline sub.
In the same way, I find it strange that '$1' and '$2' are cleared coming out of a 'do' block
-- I would have thought only another regex would change them.
my $s="abcdefg";
$_=$s;
my @res=do { m{abc(de)(fg)}; };
P "nres=%s 1=%s, 2=%s", 0+@res, $1, $2;
'
nres=2 1=∄, 2=∄
|
> Both of these answers ignored that the sub was an 'anonymous'/'inline' sub
I "ignored" it because it makes no difference, a sub is a sub no matter whether it has an entry in the symbol table or not. (Update: There are small differences, e.g. how a sub call is parsed depending on when the compiler sees the definition, but that's not relevant to this thread.)
> ... that would have access to surrounding local variables in the same scope, including regex vars.
Sorry, but that's not how dynamic scoping works. It might help to forget about lexical scope entirely for a moment, and to think of it in terms of the call stack: it's like local stores the current value of the variable onto a stack, and exiting the currently executing scope (sub, do, etc.) restores the saved value. This happens during runtime, hence the "dynamic". Also note that dynamic scoping only works for package variables, not lexicals (my).
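A small sketch of the save-and-restore behaviour described above, using an ordinary package variable instead of $1. The names $level, show and inner are made up for illustration.

```perl
use strict;
use warnings;

our $level = "outer";          # package variable, so local() can be applied to it

sub show { return $level }     # reads whatever value is current at call time

sub inner {
    local $level = "inner";    # old value is saved; it is restored when inner() exits
    return show();             # show() sees "inner" because it is called from here
}

my @seen = ( show(), inner(), show() );
print "@seen\n";               # outer inner outer
```

The third call sees "outer" again because leaving inner() restored the saved value. Per the explanation above, the regex variables get this treatment implicitly, without an explicit local.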
> I had supposed that the '$1' would stay constant until another regex... I would have thought only another regex would change them.
Yes, the implicit dynamic scoping can be a little surprising like that, but once you get the hang of dynamic scoping, it should make sense. I showed with my example above why it makes sense to do it that way for regex variables.
|
"Normal subs" are just named "anonymous subs"; there is not much more of a difference.
Consider:
DB<7> *beyonce = sub { print "say my name, say my name" }
DB<8> beyonce()
say my name, say my name
DB<9>
This also works the other way round: you can take a sub-ref to a named sub and then delete the name from the package's stash:
DB<21> sub kelly { print "say my name, say my name" }
DB<22> $anosub = \&kelly
DB<23> delete $main::{kelly}
DB<24> $anosub->()
say my name, say my name
DB<25> kelly()
Undefined subroutine &main::kelly called at (eval 34)[c:/Strawberry/perl/lib/perl5db.pl:738] line 2.
So where do you want to draw the line???
Side note:
There are, though, block constructs in Perl which can be confused with anonymous subs.
Maybe that's your misunderstanding, if you talk about "inlined subs" °?
For instance, map blocks are not anonymous subs as far as return is concerned: a return inside the block exits the enclosing sub.
DB<19> sub tst { map { return $_ } 42..1e6 ; return "never executed" }
DB<20> p tst()
42
DB<21>
But the map-like constructs in List::Util are implemented with anonymous subs and won't allow returning from the outer sub!
°) what does that even mean?
|
The linked documentation explains it. Dynamic scope propagates inside blocks, but not outside.
my $s = 'abcdefg';
$s =~ m{abc(de)(fg)};
my $output = do { sprintf "1=%s, 2=%s", $1, $2 };
print $output; # 1=de, 2=fg
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
|
maybe haukex understood your problem and you need to get acquainted with dynamic-scoping ... which is the only possible way to have limited control over global variables.
NB: our package-vars and special vars like $1 are global. They are accessible everywhere at run-time and prone to "sabotage".
Static aka lexical scoping is a totally different beast for my vars at compile-time.
Try to debug a global variable which suddenly changes after you called a sub from a foreign module you just upgraded.
And special vars are not protected by namespaces, they are all in main:: !
That's why the regex variables like $1 are automatically localized in subs.
Dynamic scoping was already a given in Perl4, which had no such thing as my or lexical scoping.
Re: why is $1 cleared at end of an inline sub?
by LanX (Saint) on Sep 16, 2021 at 12:42 UTC
|
TL;DR all
next time please condense it to the relevant part!
> Seems a bit weird to have the end of a local sub clear '$1', yet that seems to be what is happening
yes, easily shown in an SSCCE
DB<3> sub bla { "XXX"=~/(X*)/; print "inside $1" }
DB<4> bla; print "outside $1"
inside XXXoutside
DB<5>
> What was the logic of forcing/doing that?
I'd say it's about localizing the inner sub to protect all caller levels from effects at a distance, consider
DB<5> "YYY"=~/(Y*)/; bla; print "old $1"
inside XXXold YYY
DB<6>
otherwise nobody could rely on $1 etc anymore after calling a random sub.
Using a dedicated closure var holding the copied content of $1 is the way to go in your use case.
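A sketch of that approach, loosely adapted from the code in the question. The sample @lines (including the extra 'plain text' line) and the simplified loop are assumptions made for illustration, not the original program.

```perl
use strict;
use warnings;

# Sample input, assumed for this sketch.
my @lines = (
    '<package type="rpm">',
    '  <name>7kaa-music</name>',
    'plain text',
    '  <url>http://7kfans.com/</url>',
);
my $lineno = 0;
my $ttag;    # closure variable shared by the sub and its caller

my $getnxt_tagln = sub {
    while ( $lineno < @lines ) {
        my $line = $lines[ $lineno++ ];
        if ( $line =~ m{^\s*<(/?\w+)} ) {
            $ttag = $1;    # copy $1 right away, while it is still in scope
            return $line;
        }
    }
    return;    # no more tag lines
};

my @tags;
while ( defined( my $line = $getnxt_tagln->() ) ) {
    push @tags, $ttag;    # use the copy, not $1
}
print "@tags\n";          # package name url
```

The lexical $ttag survives the return from the sub, whereas $1 reverts to whatever the caller had before the call.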