The other day I had the following problem: scan a bunch of IDL
files that look, simplified, like __DATA__ below, identify each
interface or dispinterface declaration, and see whether it
has the attribute 'hidden' present, absent or commented out.
I ended up with the solution shown below, after much
head scratching and searching through the Perl docs and books.
In my early attempts I did not have the two inner blocks;
all matches were done directly in the for loop block.
Naively I expected the $1 variable to be reinitialized
on each attempted match. Not so. When the first match
succeeded (found 'interface Isome') and the second failed
(did not find 'hidden'), $1 still contained the 'interface'
string from the first match, while I wanted it to be undef.
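
The surprise can be reproduced in a few lines. This is only a minimal sketch: the helper name stale_capture and the sample strings are made up for illustration and are not part of the script below.

```perl
use strict;
use warnings;

# Hypothetical helper: run two matches back to back in the same block
# and report what $1 holds afterwards.
sub stale_capture {
    my ($text) = @_;
    $text =~ /((?:disp)?interface)\s+\w+/;   # first match: may succeed
    $text =~ /(hidden)/;                     # second match: may fail
    my $seen = $1;                           # copy $1 before leaving the sub
    return $seen;
}

print stale_capture("interface ISome"), "\n";          # second match fails
print stale_capture("hidden interface ISome"), "\n";   # both matches succeed
```

The first call prints 'interface' even though the 'hidden' match failed; the second prints 'hidden'.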
Eventually I found these two perls of wisdom that saved my day:

perlre:
The numbered variables ($1, $2, $3, etc.) and the related
punctuation set ($+, $&, $`, and $') are all dynamically
scoped until the end of the enclosing block or until the
next successful match, whichever comes first.
(See Compound Statements in the perlsyn manpage.)

Effective Perl Programming, item 16:
Memory variables are automatically localized by each new scope.
In a unique twist, the localized variables receive copies
of the values from the outer scope - this is in contrast
to the usual reinitializing of a localized variable.

In addition, I realized that a 'next' without a label only
takes me out of the immediately enclosing bare block (Perl treats
a bare block as a loop that runs once); it needs a label, OUTER:,
to take me to the next iteration of the for loop.
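
The difference between the two forms of 'next' can be sketched as follows. The helper trace_next and its flag are hypothetical, made up only to show both behaviours side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: walk 1..3 and skip the "inner" work for 2,
# either with a bare next or with next OUTER.
sub trace_next {
    my ($use_label) = @_;
    my @log;
    OUTER:
    for my $i (1 .. 3) {
        {   # a bare block behaves like a loop that runs once
            if ($i == 2) {
                next OUTER if $use_label;   # go on to the next $i
                next;                       # only leave this bare block
            }
            push @log, "inner $i";
        }
        push @log, "after $i";
    }
    return join ', ', @log;
}

print trace_next(0), "\n";   # inner 1, after 1, after 2, inner 3, after 3
print trace_next(1), "\n";   # inner 1, after 1, inner 3, after 3
```

With the bare 'next', the code after the inner block still runs for 2; with 'next OUTER' the whole iteration is skipped.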
Moral for a regex user: if you have several successive
match operations that use the numbered variables,
the safe course is to isolate the matches in separate
blocks on the same level. Isolating a match in an
inner block from a match in an outer block would
not work because of that 'unique twist'.
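
To make the moral concrete, here is a small sketch that compares sibling blocks with nested blocks. Both helper names and the sample strings are hypothetical, and the sketch assumes that no successful match is still in effect in the calling scope.

```perl
use strict;
use warnings;

# Hypothetical helpers; each returns what $1 holds after the 'hidden' match.
sub hidden_in_sibling_block {
    my ($decl, $attrs) = @_;
    my $seen;
    {   # block 1: captures made here end with the block
        $decl =~ /((?:disp)?interface\s+\w+)/;
    }
    {   # block 2: starts from the state that was in effect before block 1
        $attrs =~ /(hidden)/;
        $seen = $1;
    }
    return $seen;
}

sub hidden_in_nested_block {
    my ($decl, $attrs) = @_;
    my $seen;
    {   # outer block
        $decl =~ /((?:disp)?interface\s+\w+)/;
        {   # inner block: inherits $1 from the outer block
            $attrs =~ /(hidden)/;
            $seen = $1;
        }
    }
    return $seen;
}

my $sib  = hidden_in_sibling_block("interface IMgr", "object,");
my $nest = hidden_in_nested_block("interface IMgr", "object,");
print defined $sib  ? $sib  : 'undef', "\n";   # undef
print defined $nest ? $nest : 'undef', "\n";   # interface IMgr
```

In the sibling version the failed 'hidden' match sees no leftover capture; in the nested version it still sees the declaration captured by the outer block.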
Questions to the wise:
Why do the localized variables receive copies of
the values from the outer scope? Why are the variables
not simply undef'd, as in a regular local operation?
Since these variables are read-only, there is no way to
undef them and thus erase the memory of a previous match.
Are there other tricks or techniques to get around this problem?
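
One possible answer to the last question, offered as a sketch rather than a definitive fix: a match in list context returns its captures, and returns an empty list on failure, so assigning the result to a lexical sidesteps the stale $1 altogether. Testing the return value of the match before touching $1 is another common idiom. The helper find_hidden and its simplified pattern are hypothetical and not taken from the script below.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text, or undef if nothing matched.
sub find_hidden {
    my ($attrs) = @_;
    # In list context a failed match yields an empty list, so $hit stays undef.
    my ($hit) = $attrs =~ m{((?://\s*not\s+)?hidden)};
    return $hit;
}

for my $attrs ("object,\n// not hidden\n", "object,\nhidden\n", "object,\n") {
    my $hit = find_hidden($attrs);
    print defined $hit ? "found '$hit'\n" : "HIDDEN UNDEF\n";
}
```

Because the result lives in a fresh lexical on every call, no extra blocks are needed to forget an earlier match.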
Rudif
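#! perl -w
use strict;

# Slurp the whole DATA section into one string.
my $text;
{
    local $/ = undef;    # localized, so the change ends with this block
    $text = <DATA>;
}

# Split on lines that hold only '[' or ']', keeping those lines as fields.
my @f = split /(?<=\n)([ \t]*[\[\]][ \t]*\n)/s, $text;

OUTER:
for my $i (0..$#f-2) {
    { # inner block 1: '[', ']' and the declaration that follows them
        next OUTER unless $f[$i]   =~ /^([ \t]*[\[][ \t]*)\n$/ &&
                          $f[$i+2] =~ /^([ \t]*[\]][ \t]*)\n$/ &&
                          $f[$i+3] =~ /((?:disp)?interface\s+\w+)/;
        print STDERR "==$i== $1\n";
    }
    { # inner block 2: the $1 set in block 1 is not visible here
        $f[$i+1] =~ /(?<=\n)((\s*[\/]*\s*not)?\s+hidden)/;
        my $h = defined $1 ? $1 : ' HIDDEN UNDEF';
        print STDERR "==$i== $h\n";
    }
}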
__DATA__
[
    uuid(078F04FD-B23E-11D3-80C3-00A024D42DAF),
    // not hidden
]
dispinterface _IMgrEvents
{
};
[
    object,
    uuid(078F04EB-B23E-11D3-80C3-00A024D42DAF),
    hidden
]
interface IMgr : IDispatch
{
    [propget, id(201), HRESULT DebugInfo([out, retval] BSTR *pVal);
};
[
    uuid(078F04FE-B23E-11D3-80C3-00A024D42DAF),
]
dispinterface _IMgrEvents
{
};
[
    object,
    uuid(078F04EC-B23E-11D3-80C3-00A024D42DAF),
]
interface IMgr : IDispatch
{
    [propget, id(201), HRESULT DebugInfo([out, retval] BSTR *pVal);
};