jesuashok has asked for the wisdom of the Perl Monks concerning the following question:

Respected monks,

    #!/usr/bin/perl
    #use strict;
    my $line;
    while ( <DATA> ) {
        chomp;
        $line = $_;
        $line =~ /\s+<\w+\/?>(.*)<\/\w+>/;
        $line = $1;
        print "$line\n";
    }
    __DATA__
      <Table>First_Table</Table>
      <Table/>

Output :-

    First_Table
    First_Table

In the above code, the second line should not leave any value in $1, but $1 still holds the value captured from the previous line. How can I make $1 reset on each pass through the loop?


__Signature__
if [jesuashok]; --$exp; __END__

Re: how to empty the built in variable
by GrandFather (Saint) on Feb 22, 2007 at 04:22 UTC

    Change the regex line to:

    next unless $line =~ /\s+<\w+\/?>(.*)<\/\w+>/;

    If the regex fails then $1 is not altered (as you noticed).

    Why did you comment out use strict; btw? Your code is fine with strictures enabled.


    DWIM is Perl's answer to Gödel
Re: how to empty the built in variable
by bobf (Monsignor) on Feb 22, 2007 at 04:23 UTC

    You are not checking if the regex matched before you use the value of $1.

    Changing the guts of your code to the following may give you the desired result.

        if( $line =~ /\s+<\w+\/?>(.*)<\/\w+>/ ) {
            $line = $1;
            print "$line\n";
        }
        else {
            print "no match\n";
        }

    Output:

        First_Table
        no match

    Finally, why did you comment out use strict, and if you're parsing what appears to be XML or HTML, why not use a parser?

Re: how to empty the built_in variable
by blazar (Canon) on Feb 22, 2007 at 09:24 UTC
        my $line;
        while ( <DATA> ) {
            chomp;
            $line = $_;
            $line =~ /\s+<\w+\/?>(.*)<\/\w+>/;
            $line = $1;
            print "$line\n";
        }

    In addition to the other comments you got, you should, as usual, declare your lexical variables in the innermost scope possible; in this case:

        while ( <DATA> ) {
            chomp;
            my $line = $_;
            # ...

    But... all in all it's strange that you use the implicit $_ only to assign it to $line. You either want

        while ( <DATA> ) {
            chomp;
            /\s+<\w+\/?>(.*)<\/\w+>/ or next;
            print $1, "\n";
        }

    or

        while ( my $line=<DATA> ) {
            chomp $line;
            $line =~ /\s+<\w+\/?>(.*)<\/\w+>/ or next;
            print $1, "\n";
        }
Re: how to empty the built_in variable
by davorg (Chancellor) on Feb 22, 2007 at 09:37 UTC
    In the above code the second line should not have any value in $1

    Actually, that's not true. The behaviour that you are seeing is documented in perlre.

    NOTE: failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
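
    Given that behaviour, one way to avoid ever seeing a stale $1 is to take the capture from the match in list context, where a failed match yields an empty list. This is only a minimal sketch: `table_name` is a hypothetical helper name, not something from the original code, and the regex is simply the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text, or undef when the
# line does not match. It never reads $1, so a value left over from
# an earlier successful match cannot leak into the result.
sub table_name {
    my ($line) = @_;

    # In list context a successful match returns the captures;
    # a failed match returns an empty list, leaving $name undefined.
    my ($name) = $line =~ /\s+<\w+\/?>(.*)<\/\w+>/;
    return $name;
}

for my $line ( "  <Table>First_Table</Table>", "  <Table/>" ) {
    my $name = table_name($line);
    print defined $name ? "$name\n" : "no match\n";
}
```

    The caller then tests for definedness instead of relying on the match variables.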
Re: how to empty the built_in variable
by holli (Abbot) on Feb 22, 2007 at 17:06 UTC
    Like Moron said, parsing XML with regexes is a PITA and error-prone. The code below uses XML::XPath to do the same job yours does. Besides being safer, it's also more readable (and more perlish :-).

    Note: I added a root element to the data so it becomes valid XML.

        use strict;
        use warnings;
        use XML::XPath;

        my $xp = XML::XPath->new(ioref => *DATA);
        #for parsing files:
        #my $xp = XML::XPath->new(filename => 'test.xml');

        print
            map  { $_->string_value, "\n" }
            grep { $_->string_value }
            $xp->find('/Root/Table')->get_nodelist;

        __DATA__
        <Root>
          <Table>First_Table</Table>
          <Table/>
          <Table>Second_Table</Table>
        </Root>

    Outputs:

        First_Table
        Second_Table


    holli, /regexed monk/
Re: how to empty the built_in variable
by Moron (Curate) on Feb 22, 2007 at 13:23 UTC
    I suspect you commented out the use strict because you were getting undefined data errors -- in your code $1 is undefined whenever the regexp fails to match.

    Functionally it looks like you want two loops rather than one. The outer loop should poll the lines of input and the inner loop should exhaustively parse the line. In addition it is normal and advisable to use one regexp per lexical element in a parser, which changes everything of course, so you might as well use something like XML::Twig instead :)
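
    A rough sketch of that two-loop shape, assuming a tiny hand-rolled tokenizer with one regex per lexical element. The name `tokens_in_line` and the token labels are hypothetical, and this is nowhere near a real XML parser (no attributes, comments, or nesting checks), which is rather the point of reaching for a module instead.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: one regex per lexical element, each tried in
# turn at the current match position (\G). The /c flag keeps pos() in
# place when an alternative fails, so the next one starts at the same spot.
sub tokens_in_line {
    my ($line) = @_;
    my @tokens;
    for ($line) {
        while (1) {
            if    (/\G\s+/gc)       { next }                            # skip whitespace
            elsif (/\G<(\w+)\/>/gc) { push @tokens, [ 'empty', $1 ] }   # <Table/>
            elsif (/\G<(\w+)>/gc)   { push @tokens, [ 'open',  $1 ] }   # <Table>
            elsif (/\G<\/(\w+)>/gc) { push @tokens, [ 'close', $1 ] }   # </Table>
            elsif (/\G([^<]+)/gc)   { push @tokens, [ 'text',  $1 ] }   # character data
            else                    { last }                            # end of line or unknown input
        }
    }
    return @tokens;
}

# Outer loop: one pass per input line; the inner loop is in tokens_in_line.
for my $line ( "  <Table>First_Table</Table>", "  <Table/>" ) {
    print join( ' ', map { $_->[0] . '(' . $_->[1] . ')' } tokens_in_line($line) ), "\n";
}
```

    Each branch reads $1 only immediately after its own successful match, so the stale-capture problem from the original question cannot arise here.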

    -M

    Free your mind