
The other day I had the following problem: scan a bunch of IDL files that look, in simplified form, like the __DATA__ section below, identify each interface or dispinterface declaration, and see whether its attribute 'hidden' is present, absent or commented out.

I ended up with the solution shown below, after much head scratching and searching through the Perl documentation and books. My early attempts did not have the two inner blocks; all matches were done directly in the for loop block.

Naively, I expected the $1 variable to be reinitialized by each attempted match. Not so. When the first match succeeded (found 'interface Isome') and the second failed (did not find 'hidden'), $1 still contained the 'interface ...' string captured by the first match, while I wanted it to be undef.
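To make the surprise concrete, here is a minimal sketch of a successful match followed by a failed one. The sub name capture_after_failed_match and the sample strings are made up for this illustration and are not part of the script below.

```perl
use strict;
use warnings;

# Hypothetical helper: report what $1 holds after a successful match
# that is followed by a second match attempt.
sub capture_after_failed_match {
    my ($first, $second) = @_;
    $first =~ /((?:disp)?interface\s+\w+)/ or return;   # succeeds, sets $1
    $second =~ /(hidden)/;      # if this fails, $1 is left untouched
    my $seen = $1;              # copy before the sub's scope ends
    return $seen;
}

# The second string has no 'hidden', so $1 still comes from the first match.
print capture_after_failed_match("interface IMgr : IDispatch", "object,"), "\n";
# prints "interface IMgr"
```

Only a successful match resets the numbered variables; a failed one leaves them exactly as they were.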

Eventually I found these two perls of wisdom that saved my day:

perlre
The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, and $') are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See Compound Statements in the perlsyn manpage.)

Effective Perl Programming, item 16
Memory variables are automatically localized by each new scope. In a unique twist, the localized variables receive copies of the values from the outer scope - this is in contrast to the usual reinitializing of a localized variable
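The two quotes can be checked with a small experiment. This is only a sketch; the sub name trace_scoping and the test strings are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: record what $1 holds at three points around an inner block.
sub trace_scoping {
    my @seen;
    "outer value" =~ /(\w+)/;        # $1 is now 'outer'
    {
        push @seen, $1;              # the inner block starts with a copy: 'outer'
        "inner value" =~ /(\w+)/;    # successful match inside the block
        push @seen, $1;              # 'inner'
    }
    push @seen, $1;                  # block ended, the outer value is back: 'outer'
    return @seen;
}

print join(", ", trace_scoping()), "\n";
# prints "outer, inner, outer"
```

The first entry is the 'unique twist': the inner block does not start with undef. The last entry shows the dynamic scoping: the inner match is forgotten when the block ends.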

In addition, I realized that a 'next' without a label only takes me out of the immediately enclosing block (a bare block counts as a loop that executes once); it needs a label, OUTER:, to skip to the next iteration of the for loop.
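The difference between the two forms of 'next' can be seen in this sketch. The sub name survivors, the label ITEM and the odd/even test are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the items that reach the end of the loop body.
sub survivors {
    my ($use_label, @items) = @_;
    my @reached;
    ITEM:
    for my $item (@items) {
        {
            # A bare block is a loop that runs once, so a plain 'next'
            # only leaves this block; 'next ITEM' skips to the next item.
            if ($item % 2) {
                if ($use_label) { next ITEM } else { next }
            }
        }
        push @reached, $item;
    }
    return @reached;
}

print "plain next:   @{[ survivors(0, 1 .. 4) ]}\n";   # prints "plain next:   1 2 3 4"
print "labeled next: @{[ survivors(1, 1 .. 4) ]}\n";   # prints "labeled next: 2 4"
```

With the plain 'next' the odd items are not skipped at all, because control simply resumes after the bare block.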

Moral for a regex user:
If you have several successive match operations that use the numbered variables, the safe course is to isolate each match in its own block, with the blocks on the same level. Isolating a match in an inner block from a match in an outer block would not work because of that 'unique twist': the inner block starts with a copy of the outer values.
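Reduced to its essentials, the sibling-block pattern might look like the sketch below. The sub name classify and the simplified 'hidden' regex are assumptions for the illustration, and the sketch relies on no successful match having happened in an enclosing scope (otherwise block 2 would inherit that outer $1).

```perl
use strict;
use warnings;

# Hypothetical helper: return the declaration found and the state of 'hidden'.
sub classify {
    my ($attributes, $declaration) = @_;
    my ($name, $hidden);
    {   # block 1: whatever it captures is forgotten when the block ends
        $declaration =~ /((?:disp)?interface\s+\w+)/;
        $name = $1;
    }
    {   # block 2: a sibling of block 1, so it does not see block 1's $1
        $attributes =~ /((\/\/\s*not\s+)?hidden)/;
        $hidden = defined $1 ? $1 : 'absent';
    }
    return ($name, $hidden);
}

print join(' | ', classify("object,\n", "dispinterface _IMgrEvents")), "\n";
# prints "dispinterface _IMgrEvents | absent"
```

Without the two blocks, the failed second match would leave the declaration name in $1 and 'hidden' would be reported as found.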

Questions to the wise:
Why do the localized variables receive copies of the values from the outer scope? Why are the variables not simply undef'd, like in the regular local operation? Since these variables are read-only, there is no way to undef them and thus erase the memory of a previous match.
Are there other tricks or techniques to get around this problem?

Rudif


#! perl -w

use strict;

my $text;
{
    local $/;    # slurp mode; the original value is restored when the block ends
    $text = <DATA>;
}

my @f = split /(?<=\n)([ \t]*[\[\]][ \t]*\n)/s, $text;

OUTER:
for my $i (0..$#f-2) {
    {    # inner block 1
        next OUTER unless $f[$i] =~ /^([ \t]*[\[][ \t]*)\n$/ && 
                          $f[$i+2] =~ /^([ \t]*[\]][ \t]*)\n$/ &&
                          $f[$i+3] =~ /((?:disp)?interface\s+\w+)/;
        print STDERR "==$i==  $1\n";
    }
    {    # inner block 2
        $f[$i+1] =~ /(?<=\n)((\s*[\/]*\s*not)?\s+hidden)/;
        my $h = defined $1 ? $1 : '    HIDDEN UNDEF';
        print STDERR "==$i==  $h\n";
    }
}


__DATA__


    [
        uuid(078F04FD-B23E-11D3-80C3-00A024D42DAF),
        // not hidden
    ]
    dispinterface _IMgrEvents
    {
    };

    [
        object,
        uuid(078F04EB-B23E-11D3-80C3-00A024D42DAF),
        hidden
    ]
    interface IMgr : IDispatch
    {
        [propget, id(201), HRESULT DebugInfo([out, retval] BSTR *pVal);
    };

    [
        uuid(078F04FE-B23E-11D3-80C3-00A024D42DAF),
    ]
    dispinterface _IMgrEvents
    {
    };

    [
        object,
        uuid(078F04EC-B23E-11D3-80C3-00A024D42DAF),
    ]
    interface IMgr : IDispatch
    {
        [propget, id(201), HRESULT DebugInfo([out, retval] BSTR *pVal);
    };

In reply to Scoping the regex memory variables and where do I go next by Rudif
