v@vonunov ~/perl$ perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for amd64-freebsd-thread-multi

Hi all,

I don't really "know" Perl. I cobble things together by reading examples and documentation, and I promptly forget how and why I did everything unless I commented it heavily, so I may just be missing something very basic.

The error in the title, 'Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"', appeared only after I made a couple of small changes to a previously working script. First, the working version with a usage demo:

#!/usr/bin/perl -w
use strict; 
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like 
# http://everything2.com/user/ameriwire/writeups 
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0]; 
# Give URL list file in first arg

my $outfile = $ARGV[1];
# Give output file in second arg

my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {

chomp ($line);

my $class = "type"; # .type
my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
my @elem = $html->getElementsByClassName($class); #Grab each instance of .type into array

# <span class="type">(<a
# href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>

foreach (@elem) { # iterate through array
my $child = $_->firstChild(); # = <a> under <span>
my $ahref = $child->getAttribute("href"); # return value of attrib href
my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
print "http://everything2.com" . $ahref . "\n";

open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
print $outfh $wup; # Text to file
close $outfh;
print "Wrote to " . $outfile . "\n";

}
}

[v@vonunov ~/perl]$ cat infile.txt 
http://everything2.com/user/ameriwire/writeups

[v@vonunov ~/perl]$ ./get-wus.pl infile.txt outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/moral+law
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/altruism
Wrote to outfile.txt

[etc.]

[v@vonunov ~/perl]$ head outfile.txt 
http://everything2.com/user/ameriwire/writeups/diverticulosis
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
http://everything2.com/user/ameriwire/writeups/moral+law
http://everything2.com/user/ameriwire/writeups/altruism

Now here's the script that fails. Notice the area around "my $html"; now we're getting #mainbody and *then* finding .type within it. This is to avoid taking in unwanted content that's kept in a sidebar area on the target site.

#!/usr/bin/perl -w
use strict; 
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like 
# http://everything2.com/user/ameriwire/writeups 
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0]; 
# Give URL list file in first arg

my $outfile = $ARGV[1];
# Give output file in second arg

my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {

chomp ($line);

my $class = "type"; # .type
my $id = "mainbody"; # #mainbody
my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
my $body = $html->getElementById($id); 
#^If we don't do this we get sidebar WUs too
my @elem = $body->getElementsByClassName($class); #Grab each instance of .type into array

# <span class="type">(<a
# href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>

foreach (@elem) { # iterate through array
my $child = $_->firstChild(); # = <a> under <span>
my $ahref = $child->getAttribute("href"); # return value of attrib href
my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
print "http://everything2.com" . $ahref . "\n";

open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
print $outfh $wup; # Text to file
close $outfh;
print "Wrote to " . $outfile . "\n";

}
}

[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 31, <$infh.

(It isn't printing the ">" after infh?)

After re-re-reading the HTML::TagParser documentation I'm still not sure why it's doing this. But see how it says: via package "HTML::TagParser::Element"? If you look at the documentation, getElementsByClassName is not listed under "HTML::TagParser::Element SUBCLASS". I don't see anything before my call to getElementsByClassName that would have put us in the context of that subclass, so to speak, if that is a thing that happens.

My use doesn't line up exactly with the documentation, I think. They do "$html = HTML::TagParser->new( $file );" then "@elem = $html->getElementsByClassName( $class );" -- which works for me too -- but is the problem that I'm trying to also put it through another of these functions in turn, or is it something else?
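
For anyone trying to reason about the same message, here is a minimal sketch of what "via package" seems to be describing: a method can hand back an object that belongs to a different class than the one it was called on, and that class may not have the method you call next. It uses only core Perl. The packages Demo::Doc and Demo::Element are hypothetical stand-ins made up for illustration, not HTML::TagParser's real internals, and the assumption that getElementById behaves this way comes only from the error text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for a parser class and its element class.
# These are NOT HTML::TagParser's real internals, only an illustration.
package Demo::Doc;
sub new { my ($class) = @_; return bless {}, $class; }
# Returns an object of a *different* class (assumed behaviour).
sub getElementById         { return bless {}, 'Demo::Element'; }
sub getElementsByClassName { return (); }

package Demo::Element;
# The element class has its own, smaller set of methods.
sub firstChild { return; }

package main;

my $doc  = Demo::Doc->new;
my $body = $doc->getElementById('mainbody');

# ref() reports the package an object is blessed into.
my $doc_class  = ref $doc;
my $body_class = ref $body;

# can() checks for a method without dying when it is missing.
my $doc_can  = $doc->can('getElementsByClassName')  ? 1 : 0;
my $body_can = $body->can('getElementsByClassName') ? 1 : 0;

print "$doc_class can getElementsByClassName: $doc_can\n";
print "$body_class can getElementsByClassName: $body_can\n";

# Calling the missing method reproduces the same kind of error.
my $ok = eval { $body->getElementsByClassName('type'); 1 };
print "Error: $@" unless $ok;
```

In the failing script, the same two checks (ref($body) and $body->can('getElementsByClassName')) placed right after the getElementById call should show which class $body actually is and whether it has the method.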

This script is only for generating a file to feed into another script, so I don't need to get it perfect, only working, but if you have any advice at all, it's welcome.

Thanks.

Edit:

With use diagnostics;

[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package
        "HTML::TagParser::Element" at ./get-wus-bad.pl line 32, <$infh> line 1 (#1)
    (F) You called a method correctly, and it correctly indicated a package
    functioning as a class, but that package doesn't define that particular
    method, nor does any of its base classes.  See perlobj.

Uncaught exception from user code:
        Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32.

Still lost.
