Vonunov has asked for the wisdom of the Perl Monks concerning the following question:

v@vonunov ~/perl$ perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for amd64-freebsd-thread-multi

Hi all,

I don't really "know" Perl. I just cobble things together by reading examples and documentation, then promptly forget how and why I did everything unless I heavily commented it, so it could be that I'm only missing something very basic.

The error in the title (Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element") appears only after making a couple of small changes to a previously working script. First, here is the working version, with a usage demo:

#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like
# http://everything2.com/user/ameriwire/writeups
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0];  # Give URL list file in first arg
my $outfile = $ARGV[1]; # Give output file in second arg
my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {
    chomp ($line);
    my $class = "type"; # .type
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my @elem = $html->getElementsByClassName($class); #Grab each instance of .type into array
    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";
        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";
    }
}
[v@vonunov ~/perl]$ cat infile.txt
http://everything2.com/user/ameriwire/writeups
[v@vonunov ~/perl]$ ./get-wus.pl infile.txt outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/moral+law
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/altruism
Wrote to outfile.txt
[etc.]
[v@vonunov ~/perl]$ head outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
http://everything2.com/user/ameriwire/writeups/moral+law
http://everything2.com/user/ameriwire/writeups/altruism

Now here's the script that fails. Notice the area around "my $html"; now we're getting #mainbody and *then* finding .type within it. This is to avoid taking in unwanted content that's kept in a sidebar area on the target site.

#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like
# http://everything2.com/user/ameriwire/writeups
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0];  # Give URL list file in first arg
my $outfile = $ARGV[1]; # Give output file in second arg
my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {
    chomp ($line);
    my $class = "type";   # .type
    my $id = "mainbody";  # #mainbody
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my $body = $html->getElementById($id);
    #^If we don't do this we get sidebar WUs too
    my @elem = $body->getElementsByClassName($class); #Grab each instance of .type into array
    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";
        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";
    }
}
[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 31, <$infh.

(It isn't printing the ">" after infh?)

After re-re-reading the HTML::TagParser documentation, I'm still not sure why this happens. But see how it says: via package "HTML::TagParser::Element"? If you look at the documentation, getElementsByClassName is not listed under "HTML::TagParser::Element SUBCLASS". I don't see anything before my use of getElementsByClassName that would have put us in the context of that subclass, so to speak, if that is a thing that happens.
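Perl looks a method up in the package the object is blessed into (the name ref() reports), not in the package of the object it was obtained from. A minimal core-Perl sketch of that, using two hypothetical stand-in classes: Doc plays the role of HTML::TagParser and Doc::Element plays HTML::TagParser::Element. Neither class nor their method names come from the real module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::TagParser: the document class owns the finders.
package Doc;

sub new {
    my ($class, @elems) = @_;
    return bless { elems => [@elems] }, $class;
}

# Returns ONE element object (like getElementById), not a Doc.
sub get_by_id {
    my ($self, $id) = @_;
    my ($hit) = grep { $_->{id} eq $id } @{ $self->{elems} };
    return $hit;
}

# Defined only in Doc (like getElementsByClassName).
sub get_by_class {
    my ($self, $class) = @_;
    return grep { $_->{class} eq $class } @{ $self->{elems} };
}

# Hypothetical stand-in for HTML::TagParser::Element: no finders, no base class.
package Doc::Element;

sub new {
    my ($class, %attr) = @_;
    return bless { %attr }, $class;
}

package main;

my $doc = Doc->new(
    Doc::Element->new(id => 'mainbody', class => 'body'),
    Doc::Element->new(id => 'w1',       class => 'type'),
);

my $body = $doc->get_by_id('mainbody');
print ref($body), "\n";                                    # Doc::Element
print $body->can('get_by_class') ? "can\n" : "cannot\n";   # cannot

# Calling the method anyway fails the same way as in the question.
my $ok = eval { $body->get_by_class('type'); 1 };
print "error: $@" unless $ok;
```

Printing ref() and can() on the object right before the failing call is a quick way to see which class you are actually holding.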

My use doesn't line up exactly with the documentation, I think. They do "$html = HTML::TagParser->new( $file );" then "@elem = $html->getElementsByClassName( $class );" -- which works for me too -- but is the problem that I'm trying to also put it through another of these functions in turn, or is it something else?

This script is only for generating a file to feed into another script, so I don't need to get it perfect, only working, but if you have any advice at all, it's welcome.

Thanks.

Edit:

With use diagnostics;

[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32, <$infh> line 1 (#1)
    (F) You called a method correctly, and it correctly indicated a package
    functioning as a class, but that package doesn't define that particular
    method, nor does any of its base classes. See perlobj.

Uncaught exception from user code:
        Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32.

Still lost.
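The phrase "nor does any of its base classes" in that diagnostic refers to the @ISA search Perl performs before giving up. A small core-Perl sketch with hypothetical classes (Finder, Inherits, Standalone are illustrative names only) shows that a method is found only when the class itself, or something in its inheritance chain, defines it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core module; mro::get_linear_isa reports the lookup order

package Finder;                           # hypothetical base class
sub find_all { return 'found via ' . ref(shift) }

package Inherits;                         # hypothetical class with a base class
use parent -norequire, 'Finder';
sub new { return bless {}, shift }

package Standalone;                       # hypothetical class with no base class
sub new { return bless {}, shift }

package main;

for my $class ('Inherits', 'Standalone') {
    my $obj  = $class->new;
    # The packages Perl searches for a method, in order.
    my $path = join ' -> ', @{ mro::get_linear_isa($class) };
    if ($obj->can('find_all')) {
        print "$class searches $path: ", $obj->find_all, "\n";
    }
    else {
        print "$class searches $path: no find_all, so the call would die\n";
    }
}
```

This should print "Inherits searches Inherits -> Finder: found via Inherits" and "Standalone searches Standalone: no find_all, so the call would die". Since HTML::TagParser defines getElementsByClassName and the error still occurs, the error itself shows that HTML::TagParser::Element is in the Standalone situation: it does not inherit from HTML::TagParser.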


Re: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"
by tangent (Parson) on Nov 27, 2014 at 03:43 UTC
    I don't see anything before my use of getElementsByClassName that would have put us in the context of that subclass
    I do ;-) In your first example you have:
    my $html = HTML::TagParser->new($line);
    my @elem = $html->getElementsByClassName($class);
    @elem now contains a list of HTML::TagParser::Element objects
    In your second example you have:
    my $html = HTML::TagParser->new($line);
    my $body = $html->getElementById($id);
    my @elem = $body->getElementsByClassName($class);
    Here $body is a single HTML::TagParser::Element object, and you can't call getElementsByClassName on that object, because that method belongs to HTML::TagParser, not to HTML::TagParser::Element. Try this (untested):
    my $html = HTML::TagParser->new($line);
    my $body = $html->getElementById($id);
    $body = $body->subTree(); # $body is now a new HTML::TagParser object
    my @elem = $body->getElementsByClassName($class);
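Putting the subTree() suggestion into the extraction loop, a sketch might look like the following. It assumes HTML::TagParser is installed and parses a small hypothetical inline page instead of fetching a URL, so it can run offline; it has not been tested against the live site. It also skips the page when there is no #mainbody, since in that case getElementById gives back nothing and the next method call would die.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TagParser;

# Hypothetical sample page: one writeup in a sidebar, two inside #mainbody.
my $sample = <<'HTML';
<html><body>
<div id="sidebar"><span class="type">(<a href="/user/someone/writeups/sidebar+item">thing</a>)</span></div>
<div id="mainbody">
<span class="type">(<a href="/user/ameriwire/writeups/diverticulosis">thing</a>)</span>
<span class="type">(<a href="/user/ameriwire/writeups/altruism">idea</a>)</span>
</div>
</body></html>
HTML

my @urls;
my $html = HTML::TagParser->new($sample);    # in the real script: new($line)
my $body = $html->getElementById('mainbody');
if ($body) {
    # subTree() turns the Element into a new HTML::TagParser object,
    # which is the class that defines getElementsByClassName.
    my $main = $body->subTree();
    for my $span ($main->getElementsByClassName('type')) {
        my $link = $span->firstChild() or next;      # the <a> under <span>
        my $href = $link->getAttribute('href') or next;
        push @urls, "http://everything2.com$href";
    }
}
print "$_\n" for @urls;
```

In the real script you could also open the output file once, before the while loop, instead of reopening it for every writeup.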

      Nice job zentara, here is a monkey patch

      BEGIN {
          package HTML::TagParser::Element;

          # Without this, Perl would also route object destruction through AUTOLOAD.
          sub DESTROY { }

          sub AUTOLOAD {
              my ($name) = our $AUTOLOAD =~ /::(\w+)$/;
              my $method = sub {
                  my $self = shift;
                  return $self->subTree->$name( @_ );
              };
              no strict 'refs';
              *{ $AUTOLOAD } = $method;
              goto &$method;
          }
      }

      Now calling getElementsByClassName on a HTML::TagParser::Element works :)
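The monkey patch works by catching any undefined method call on the element class in AUTOLOAD, installing a closure under that name that forwards the call to the subTree() object, and jumping into it with goto so the original arguments are reused. The same delegation pattern in core Perl, with hypothetical Doc and Doc::Element classes (and a hypothetical sub_tree method standing in for subTree), plus an empty DESTROY so that object destruction is not sent through AUTOLOAD:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Doc;                                  # hypothetical document class
sub new { my ($class, @kids) = @_; return bless { kids => [@kids] }, $class }
sub count_kids { return scalar @{ $_[0]{kids} } }

package Doc::Element;                         # hypothetical element class
sub new { my ($class, @kids) = @_; return bless { kids => [@kids] }, $class }
sub sub_tree { return Doc->new( @{ $_[0]{kids} } ) }   # plays the role of subTree()

# Without this, Perl would send DESTROY through AUTOLOAD as well.
sub DESTROY { }

sub AUTOLOAD {
    my ($name) = our $AUTOLOAD =~ /::(\w+)$/;
    my $method = sub {
        my $self = shift;
        return $self->sub_tree->$name(@_);    # forward to the Doc object
    };
    no strict 'refs';
    *{$AUTOLOAD} = $method;                   # install it so AUTOLOAD runs once per name
    goto &$method;                            # re-dispatch with the original @_
}

package main;

my $elem = Doc::Element->new(qw(a b c));
print $elem->count_kids, "\n";                # 3, via AUTOLOAD -> sub_tree
print defined &Doc::Element::count_kids ? "installed\n" : "missing\n";   # installed
```

Because the forwarding sub is installed into the symbol table on first use, later calls with the same name bypass AUTOLOAD entirely.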

        Sorry about that tangent .. when I get distracted tangent associates to zentara and bam ... kinda miss the guy :)

      Cool, I see how that works now. Your suggestion does the trick. I think I almost had something working with HTML::TreeBuilder, but as this script is auxiliary and it's getting me what I need, I think I'll leave well enough alone.
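For comparison, since HTML::TreeBuilder came up: in the node class it builds (HTML::Element), look_down is a method of every node, so the "find #mainbody, then search inside it" chaining works without a conversion step. A sketch, assuming HTML::TreeBuilder is installed and again using a hypothetical inline page rather than the live site:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page: one writeup in a sidebar, two inside #mainbody.
my $sample = <<'HTML';
<html><body>
<div id="sidebar"><span class="type">(<a href="/user/someone/writeups/sidebar+item">thing</a>)</span></div>
<div id="mainbody">
<span class="type">(<a href="/user/ameriwire/writeups/diverticulosis">thing</a>)</span>
<span class="type">(<a href="/user/ameriwire/writeups/altruism">idea</a>)</span>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($sample);
my @urls;
if (my $body = $tree->look_down(id => 'mainbody')) {
    # look_down on an element searches only that element and its descendants.
    for my $span ($body->look_down(_tag => 'span', class => 'type')) {
        my $link = $span->look_down(_tag => 'a') or next;
        push @urls, 'http://everything2.com' . $link->attr('href');
    }
}
$tree->delete;    # release the tree's memory when finished
print "$_\n" for @urls;
```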

Re: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"
by Anonymous Monk on Nov 26, 2014 at 19:39 UTC

      I worried that if I tried to trim anything I would cut out the part that magically makes my problem happen even though it looks like it has nothing to do with it. :P

      Thanks, I'll try that module then.

        Making the effort to trim your code to a more reasonable length, while it still exhibits the problem, might actually help you figure out by yourself where the error is. And if you do not solve the problem yourself in the course of that process, it will make it easier for us to help you.

        I worried that if I tried to trim anything I would cut out the part that magically makes my problem happen even though it looks like it has nothing to do with it. :P

        That's why you make a copy of the file each time you trim; that way, if the problem disappears, you can go back a file ;)