PerlMonks  

unexpected behaviour of text::balanced

by Anonymous Monk
on Nov 24, 2012 at 13:23 UTC ( [id://1005372]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The following prints

    et:a)(b
    eb:(a)(b)
    eteb1:a)(b
    eteb2:

I expected it to print

    et:a)(b
    eb:(a)(b)
    eteb1:a)(b
    eteb2:(a)(b)

Were my expectations unexpected?

    use strict;
    use warnings;
    use File::Slurp;
    use Text::Balanced qw/extract_multiple extract_bracketed extract_tagged/;

    my $data = '(a)(b)';
    et($data);
    eb($data);
    eteb($data);

    sub et {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
            undef,
            1
        );
        display('et', @array)
    }

    sub eb {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );
        display('eb', @array)
    }

    sub eteb {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
            undef,
            1
        );
        display('eteb1', @array);
        @array = extract_multiple(
            $data,
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );
        display('eteb2', @array);
    }

    sub display {
        my $sub = shift;
        print "$sub:";
        print $_ for @_;
        print "\n";
    }

Replies are listed 'Best First'.
Re: unexpected behaviour of text::balanced
by roboticus (Chancellor) on Nov 24, 2012 at 13:41 UTC

    That was surprising. It appears that Text::Balanced alters some of the magic innards of the variable. I changed your code a bit:

        sub eteb {
            my $data = shift;
            my $orig = $data;
            my @array = extract_multiple(
                $data,
                [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
                undef,
                1
            );
            print "data='$data'\n";
            display('eteb1', @array);
            @array = extract_multiple(
                $orig,
                [ sub{extract_bracketed($_[0], '()')}, ],
                undef,
                1
            );
            display('eteb2', @array);
        }

    And get the desired results. The funny thing is, I was expecting that $data would be empty after the call or something, but was surprised to see that the value looked unchanged. I then changed the second extract_multiple to:

        @array = extract_multiple(
            $data."",
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );

    and it worked as you expect. I haven't read the Text::Balanced docs to see if it's expected behaviour or not. But if it isn't, you may want to file a bug report on it.

    Update: I remember a module (Devel::Peek) that lets you look at the magic goo inside of variables, so I changed your program to look at the $data variable before and after the call:

        # Dump() comes from Devel::Peek; add "use Devel::Peek;" at the top of the script
        sub eteb {
            my $data = shift;
            my $orig = $data;
            Dump($data);
            my @array = extract_multiple(
                $data,
                [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
                undef,
                1
            );
            Dump($data);
            $data = $data."";
            Dump($data);
            print "data='$data'\n";
            display('eteb1', @array);
            @array = extract_multiple(
                $data,
                [ sub{extract_bracketed($_[0], '()')}, ],
                undef,
                1
            );
            display('eteb2', @array);
        }

    And sure enough, some stuff inside changed:

        $ perl 1005372.pl
        et:a)(b
        eb:(a)(b)
        SV = PV(0x8458478) at 0x84fe160
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK)
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
        SV = PVMG(0x83ef3a8) at 0x84fe160
          REFCNT = 7
          FLAGS = (PADMY,SMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
          MAGIC = 0x83c2ec0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_LEN = 5
        SV = PVMG(0x83ef3a8) at 0x84fe160
          REFCNT = 7
          FLAGS = (PADMY,SMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
          MAGIC = 0x83c2ec0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_LEN = -1
        data='(a)(b)'
        eteb1:a)(b
        eteb2:(a)(b)
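The "regex_global" magic in the dumps appears to be the same per-variable match position that pos() reports, so it can probably be inspected without Devel::Peek. The following is only a sketch using a plain /g regex match rather than Text::Balanced; the variable names ($after_match, $copy_pos, $after_assign) are made up for illustration.

```perl
use strict;
use warnings;

my $data = '(a)(b)';

# A successful scalar-context /g match records a match position on $data.
$data =~ /\(a\)/g;
my $after_match = pos($data);     # offset just past "(a)"

# Concatenating with "" builds a new value; it carries no match position.
my $copy     = $data . "";
my $copy_pos = pos($copy);

# Assigning to $data (even the same text) clears its match position.
$data = $data . "";
my $after_assign = pos($data);

for ( [ 'after match', $after_match ],
      [ 'copy',        $copy_pos ],
      [ 'after assign', $after_assign ] ) {
    my ( $label, $pos ) = @$_;
    printf "%-12s pos = %s\n", $label, defined $pos ? $pos : 'undef';
}
```

The string content never changes in any of these steps; only the hidden position does, which matches the observation that $data "looked unchanged".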

    After seeing this, I reviewed the docs for Text::Balanced, and noticed this:

    Note that in a list context, the contents of the original input text (the first argument) are not modified in any way. However, if the input text was passed in a variable, that variable's pos value is updated to point at the first character after the extracted text. That means that in a list context the various subroutines can be used much like regular expressions. For example:
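The documentation's own example is not reproduced in the post. As a rough illustration of the pattern the quoted passage describes (this is not the documentation's code, and @found is a name chosen here), repeated list-context calls on the same variable can walk through the string, because each call resumes at the pos left by the previous one:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '(a)(b)';
my @found;

# The list slice (...)[0] forces list context, so $text itself is left
# untouched and only pos($text) moves forward after each extraction.
# When nothing more can be extracted, the first element is undef and
# the loop ends.
while ( defined( my $next = ( extract_bracketed( $text, '()' ) )[0] ) ) {
    push @found, $next;
}

print "found: @found\n";
print "text is still: $text\n";
```

This is the same mechanism that causes the surprise in the question: the second extract_multiple call inherits the position left behind by the first.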

    In short, it's supposed to do that. That way it's ready to pull out the *next* bits of balanced text for you. Appending an empty string ($data."") yields a fresh value with no pos set, so the next extraction starts again from the beginning of the string.
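If the goal is to run a second, independent extraction over the same variable, an alternative to the $data."" trick is to rewind the position explicitly. This is a minimal sketch built from the code in the question; setting pos($data) = 0 is a suggestion made here, not something the Text::Balanced documentation quoted above prescribes.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed extract_tagged);

my $data = '(a)(b)';

# First pass: same call as in the question; it leaves pos($data) advanced.
my @tagged = extract_multiple(
    $data,
    [ sub { extract_tagged( $_[0], 'a', 'b', undef ) } ],
    undef, 1,
);

# Rewind so that the next call scans from the start of the string again.
pos($data) = 0;

# Second pass: now sees the whole string rather than only its tail.
my @bracketed = extract_multiple(
    $data,
    [ sub { extract_bracketed( $_[0], '()' ) } ],
    undef, 1,
);

print 'tagged:    ', join( '', @tagged ),    "\n";
print 'bracketed: ', join( '', @bracketed ), "\n";
```

Copying the input into a second variable before the first call, as in the $orig version earlier in this reply, achieves the same thing; the explicit pos assignment just makes the intent visible at the point where it matters.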

    Sigh! Had I read the docs before playing with the code, I'd've saved myself a little time. Ah, well...

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thanks for the explanation.

Node Type: perlquestion [id://1005372]
Approved by McDarren