Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"

Vonunov has asked for the wisdom of the Perl Monks concerning the following question:

v@vonunov ~/perl$ perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for amd64-freebsd-thread-multi

Hi all,

I don't really "know" perl, I just cobble things together by reading examples and documentation, promptly forgetting how and why I did everything unless I heavily commented it, so it could be that I'm only missing something very basic.

The error in the title, 'Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"', appears only after making a couple of small changes to a previously working script. First, the working version with usage demo:

#!/usr/bin/perl -w
use strict; 
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like 
# http://everything2.com/user/ameriwire/writeups 
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0]; 
# Give URL list file in first arg

my $outfile = $ARGV[1];
# Give output file in second arg

my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {

    chomp ($line);

    my $class = "type"; # .type
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my @elem = $html->getElementsByClassName($class); #Grab each instance of .type into array

    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>

    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";

        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";

    }
}

[v@vonunov ~/perl]$ cat infile.txt 
http://everything2.com/user/ameriwire/writeups

[v@vonunov ~/perl]$ ./get-wus.pl infile.txt outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/moral+law
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/altruism
Wrote to outfile.txt

[etc.]

[v@vonunov ~/perl]$ head outfile.txt 
http://everything2.com/user/ameriwire/writeups/diverticulosis
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
http://everything2.com/user/ameriwire/writeups/moral+law
http://everything2.com/user/ameriwire/writeups/altruism

Now here's the script that fails. Notice the area around "my $html"; now we're getting #mainbody and *then* finding .type within it. This is to avoid taking in unwanted content that's kept in a sidebar area on the target site.

#!/usr/bin/perl -w
use strict; 
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like 
# http://everything2.com/user/ameriwire/writeups 
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0]; 
# Give URL list file in first arg

my $outfile = $ARGV[1];
# Give output file in second arg

my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {

    chomp ($line);

    my $class = "type"; # .type
    my $id = "mainbody"; # #mainbody
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my $body = $html->getElementById($id);
    #^If we don't do this we get sidebar WUs too
    my @elem = $body->getElementsByClassName($class); #Grab each instance of .type into array

    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>

    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";

        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";

    }
}

[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 31, <$infh.

(It isn't printing the ">" after infh?)

After re-re-reading the HTML::TagParser documentation I'm still not sure why it's doing this. But see how it says: via package "HTML::TagParser::Element"? If you look at the documentation, getElementsByClassName is not listed under "HTML::TagParser::Element SUBCLASS". I don't see anything before my use of getElementsByClassName that would have put us in the context of that subclass, so to speak, if that is a thing that happens.

My use doesn't line up exactly with the documentation, I think. They do "$html = HTML::TagParser->new( $file );" then "@elem = $html->getElementsByClassName( $class );" -- which works for me too -- but is the problem that I'm trying to also put it through another of these functions in turn, or is it something else?

This script is only for generating a file to feed into another script, so I don't need to get it perfect, only working, but if you have any advice at all, it's welcome.

Thanks.

Edit:

With use diagnostics;

[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package
        "HTML::TagParser::Element" at ./get-wus-bad.pl line 32, <$infh> line 1 (#1)
    (F) You called a method correctly, and it correctly indicated a package
    functioning as a class, but that package doesn't define that particular
    method, nor does any of its base classes.  See perlobj.

Uncaught exception from user code:
        Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32.

Still lost.

Replies are listed 'Best First'.
Re: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by tangent (Parson) on Nov 27, 2014 at 03:43 UTC
"I don't see anything before my use of getElementsByClassName that would have put us in the context of that subclass"

I do ;-) In your first example you have:

my $html = HTML::TagParser->new($line);
my @elem = $html->getElementsByClassName($class);

@elem now contains a list of HTML::TagParser::Element objects.

In your second example you have:

my $html = HTML::TagParser->new($line);
my $body = $html->getElementById($id);
my @elem = $body->getElementsByClassName($class);

Here $body is a single HTML::TagParser::Element object, and you can't call getElementsByClassName on that object. Try this (untested):

my $html = HTML::TagParser->new($line);
my $body = $html->getElementById($id);
$body = $body->subTree(); # $body is now a new HTML::TagParser object
my @elem = $body->getElementsByClassName($class);

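When it is unclear which class an object belongs to, ref() and can() will tell you before the method call blows up. Here is a minimal core-Perl sketch of that check. The Toy::Doc and Toy::Doc::Element packages are made-up stand-ins that only mimic the shape of the problem; they are not the real HTML::TagParser classes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for a parser class and its element class.
# They only mimic the shape of the problem, not HTML::TagParser itself.
{
    package Toy::Doc;
    sub new                    { my ($class) = @_; return bless {}, $class }
    sub getElementById         { return bless {}, 'Toy::Doc::Element' }
    sub getElementsByClassName { return () }
}

{
    package Toy::Doc::Element;
    sub firstChild { return }
}

my $doc  = Toy::Doc->new;
my $body = $doc->getElementById('mainbody');

# ref() reports the class an object was blessed into.
print 'doc  is a ', ref($doc),  "\n";
print 'body is a ', ref($body), "\n";

# can() returns a code reference if the method exists, false otherwise.
for my $obj ($doc, $body) {
    my $verdict = $obj->can('getElementsByClassName') ? 'yes' : 'no';
    print ref($obj), ' can getElementsByClassName: ', $verdict, "\n";
}
```

Dropping the same two checks next to a failing call in a real script shows whether the variable holds the parser object or one of its elements.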
Re^2: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" ( getElementsByClassName and others for HTML::TagParser::Element ) by Anonymous Monk on Nov 27, 2014 at 08:42 UTC
Nice job zentara, here is a monkey patch:

BEGIN {
    package HTML::TagParser::Element;
    sub AUTOLOAD {
        my ($name) = our $AUTOLOAD =~ /::(\w+)$/;
        return if $name eq 'DESTROY'; # don't forward object destruction
        # Forward any unknown method to the element's subtree
        my $method = sub {
            my $self = shift;
            return $self->subTree->$name( @_ );
        };
        no strict 'refs';
        *{ $AUTOLOAD } = $method; # install it so AUTOLOAD runs only once per name
        goto &$method;
    }
}

Now calling getElementsByClassName on a HTML::TagParser::Element works :)

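For anyone unsure what that AUTOLOAD trick does, here is the same delegation pattern in isolation, as a core-Perl sketch. Toy::Tree and Toy::Node are hypothetical classes invented for the illustration and are not part of HTML::TagParser. Note the DESTROY guard: without it, AUTOLOAD would also try to forward object destruction.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classes for illustration only; not the HTML::TagParser API.
{
    package Toy::Tree;
    sub new   { my ($class, @items) = @_; return bless { items => [@items] }, $class }
    sub count { my ($self) = @_; return scalar @{ $self->{items} } }
}

{
    package Toy::Node;
    our $AUTOLOAD;

    sub new     { my ($class, @items) = @_; return bless { items => [@items] }, $class }
    sub subTree { my ($self) = @_; return Toy::Tree->new(@{ $self->{items} }) }

    # Called for any method Toy::Node does not define itself.
    sub AUTOLOAD {
        my ($name) = $AUTOLOAD =~ /::(\w+)$/;
        return if $name eq 'DESTROY';    # never delegate destruction

        # Build a forwarding method and install it under the missing name,
        # so later calls skip AUTOLOAD entirely.
        my $method = sub {
            my $self = shift;
            return $self->subTree->$name(@_);
        };
        no strict 'refs';
        *{$AUTOLOAD} = $method;
        goto &$method;
    }
}

my $node = Toy::Node->new(qw(a b c));
print $node->count, "\n";    # forwarded to Toy::Tree::count, prints 3
```

The trade-off with this pattern is that a typo in a method name is also forwarded, so the error then surfaces from the delegate class rather than from the object you called.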
Re^3: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Anonymous Monk on Nov 27, 2014 at 08:45 UTC
Sorry about that tangent .. when I get distracted tangent associates to zentara and bam ... kinda miss the guy :)
Re^2: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Vonunov (Novice) on Nov 27, 2014 at 05:04 UTC
Cool, I see how that works now. Your suggestion does the trick. I think I almost had something working with HTML::TreeBuilder, but as this script is auxiliary and it's getting me what I need, I think I'll leave well enough alone.
Re: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Anonymous Monk on Nov 26, 2014 at 19:39 UTC
"... or is it something else?"

It's a bug/limitation/oversight of HTML::TagParser ... it's why HTML::TreeBuilder is more popular. Sure, it doesn't have the DOMish names, but it's got HTML::TreeBuilder::XPath and htmltreexpather.pl :)

HTML::TagParser is depended on by 7 distributions
HTML::TreeBuilder is depended on by 180 distributions

Also, you posted too much stuff, narrow it down (aka clean your room):

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TagParser;
my $html = HTML::TagParser->new(q{
<div id="bug">
<span class="type"> yo </span>
<span class="type"> ho </span>
<span class="type"> ho </span>
</div>
});
my $bug = $html->getElementById("bug");
my @sss = $bug->getElementsByClassName("type");
__END__
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at - line 13.

Re^2: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Vonunov (Novice) on Nov 26, 2014 at 19:49 UTC
I worried that if I tried to trim anything I would cut out the part that magically makes my problem happen even though it looks like it has nothing to do with it. :P Thanks, I'll try that module then.
Re^3: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Laurent_R (Canon) on Nov 26, 2014 at 20:04 UTC
Making the effort to trim your code to a more reasonable length while still exhibiting the problem might actually help you figure out for yourself where the error is. And if you have not solved the problem yourself in the course of that process, it will make it easier for us to help you.
Re^3: Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Anonymous Monk on Nov 27, 2014 at 00:15 UTC
"I worried that if I tried to trim anything I would cut out the part that magically makes my problem happen even though it looks like it has nothing to do with it. :P"

That's why you make a copy of the file each time you trim. That way, if the problem disappears, you can go back a file ;)