bwgoudey has asked for the wisdom of the Perl Monks concerning the following question:

I'm using parsers which I derived from HTML::Parser. They work fine if I only need one. But if I use two or more at once, as in the code below, the information from the first parser becomes identical to that of the second:
my $old_text = new MUCParser;
$old_text->parse("old_file.html");
print "before\n";
print $old_text->num_of_tags()."\n";

my $new_text = new MUCParser;
$new_text->parse("new_file.html");
print "after\n";
print $old_text->num_of_tags()."\n";
print $new_text->num_of_tags()."\n";
Output
before
602
after
397
397
It's like they are writing to the same space in memory, but I am not sure why this would happen. Any ideas?

Replies are listed 'Best First'.
Re: Parsers overwriting each other
by Corion (Patriarch) on Jul 21, 2007 at 08:28 UTC

    I guess the error comes from your class MUCParser, possibly in the subroutine num_of_tags, because separate HTML::Parser instances work independently for me. HTML::Parser itself does not store any state, especially not anything like a tag count, so the interesting part is what your num_of_tags subroutine does and how/where it stores the tag count. If, for example, your MUCParser class looks like the following:

    package MUCParser;
    use strict;
    use base 'HTML::Parser';
    use vars '$tag_count';

    sub new {
        my ($class,@args) = @_;
        ...
        my $self = $class->SUPER::new(@args);
    };

    sub num_of_tags {
        return $tag_count;
    };

    then $tag_count is a global variable and all your MUCParser instances will share that variable. Some more information, like the (stripped down, relevant) source code of MUCParser, is needed.
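To illustrate the sharing described above, here is a minimal sketch that uses only core Perl. The class SharedCounter and its methods bump and count are hypothetical names invented for this illustration; they are not part of MUCParser or HTML::Parser.

```perl
use strict;
use warnings;

package SharedCounter;    # hypothetical class, for illustration only
use vars '$tag_count';    # one package variable for the whole class

sub new   { my ($class) = @_; return bless {}, $class }
sub bump  { $tag_count++ }          # every object increments the same variable
sub count { return $tag_count }

package main;

my $first  = SharedCounter->new;
my $second = SharedCounter->new;

$first->bump for 1 .. 3;    # three increments through the first object
$second->bump;              # one increment through the second object

print $first->count,  "\n";    # prints 4
print $second->count, "\n";    # prints 4
```

Both objects report the same total because the counter belongs to the package, not to either object.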

      Perhaps declaring my counter as a global is my problem. My code is quite similar to the example you gave, but mine is perhaps even more simple/naive. Here is my code
      use strict;
      use warnings;
      package MUCParser;
      use base "HTML::Parser";

      my $tag_count;

      sub start {
          $tag_count+=1;
      }

      sub num_of_tags {
          return $tag_count;
      };
      I had assumed that declaring my tag counter like that still meant that each instance had its own copy. A quick google (now that I realise my problem) shows that the proper way is to store my variables in the $self hash reference. Sound like a better way of doing things? Time to test and see. Thanks heaps.
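A minimal sketch of the per-instance approach, assuming HTML::Parser is installed and is constructed without arguments, so that it calls the subclass's start method for each start tag. The hash key tag_count is an arbitrary name chosen for this sketch; HTML::Parser documents that its objects are plain hashes and that it reserves only keys beginning with "_hparser".

```perl
use strict;
use warnings;

package MUCParser;
use base 'HTML::Parser';

# Called once per start tag; the counter lives in the object itself.
sub start {
    my ($self) = @_;
    $self->{tag_count}++;    # 'tag_count' is an assumed key name
}

sub num_of_tags {
    my ($self) = @_;
    return $self->{tag_count} || 0;
}

package main;

my $old_text = MUCParser->new;
$old_text->parse('<p><b>old</b></p>');    # two start tags
$old_text->eof;

my $new_text = MUCParser->new;
$new_text->parse('<p>new</p>');           # one start tag
$new_text->eof;

print $old_text->num_of_tags, "\n";    # prints 2
print $new_text->num_of_tags, "\n";    # prints 1
```

The sketch feeds literal HTML strings to parse(); in HTML::Parser, parse_file() is the method that takes a file name.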