Vonunov has asked for the wisdom of the Perl Monks concerning the following question:
v@vonunov ~/perl$ perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for amd64-freebsd-thread-multi
Hi all,
I don't really "know" perl, I just cobble things together by reading examples and documentation, promptly forgetting how and why I did everything unless I heavily commented it, so it could be that I'm only missing something very basic.
The error in the title, 'Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element"', appeared only after I made a couple of small changes to a previously working script. First, the working version with a usage demo:
    #!/usr/bin/perl -w
    use strict;
    use HTML::TagParser;
    use URI::Fetch;

    # Take list of URLs like
    # http://everything2.com/user/ameriwire/writeups
    # and extract specific writeup URLs: "(thing)"
    # (Have to manually add multiple pages of WUs)

    my $infile = $ARGV[0]; # Give URL list file in first arg
    my $outfile = $ARGV[1]; # Give output file in second arg
    my $outfh;

    open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

    while (my $line = <$infh>) {
        chomp ($line);
        my $class = "type"; # .type
        my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
        my @elem = $html->getElementsByClassName($class); #Grab each instance of .type into array
        # <span class="type">(<a
        # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
        foreach (@elem) { # iterate through array
            my $child = $_->firstChild(); # = <a> under <span>
            my $ahref = $child->getAttribute("href"); # return value of attrib href
            my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
            print "http://everything2.com" . $ahref . "\n";
            open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
            print $outfh $wup; # Text to file
            close $outfh;
            print "Wrote to " . $outfile . "\n";
        }
    }
    [v@vonunov ~/perl]$ cat infile.txt
    http://everything2.com/user/ameriwire/writeups
    [v@vonunov ~/perl]$ ./get-wus.pl infile.txt outfile.txt
    http://everything2.com/user/ameriwire/writeups/diverticulosis
    Wrote to outfile.txt
    http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
    Wrote to outfile.txt
    http://everything2.com/user/ameriwire/writeups/moral+law
    Wrote to outfile.txt
    http://everything2.com/user/ameriwire/writeups/altruism
    Wrote to outfile.txt
    [etc.]
    [v@vonunov ~/perl]$ head outfile.txt
    http://everything2.com/user/ameriwire/writeups/diverticulosis
    http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
    http://everything2.com/user/ameriwire/writeups/moral+law
    http://everything2.com/user/ameriwire/writeups/altruism
Now here's the script that fails. Notice the area around "my $html"; now we're getting #mainbody and *then* finding .type within it. This is to avoid taking in unwanted content that's kept in a sidebar area on the target site.
    #!/usr/bin/perl -w
    use strict;
    use HTML::TagParser;
    use URI::Fetch;

    # Take list of URLs like
    # http://everything2.com/user/ameriwire/writeups
    # and extract specific writeup URLs: "(thing)"
    # (Have to manually add multiple pages of WUs)

    my $infile = $ARGV[0]; # Give URL list file in first arg
    my $outfile = $ARGV[1]; # Give output file in second arg
    my $outfh;

    open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

    while (my $line = <$infh>) {
        chomp ($line);
        my $class = "type"; # .type
        my $id = "mainbody"; # #mainbody
        my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
        my $body = $html->getElementById($id);
        #^If we don't do this we get sidebar WUs too
        my @elem = $body->getElementsByClassName($class); #Grab each instance of .type into array
        # <span class="type">(<a
        # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
        foreach (@elem) { # iterate through array
            my $child = $_->firstChild(); # = <a> under <span>
            my $ahref = $child->getAttribute("href"); # return value of attrib href
            my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
            print "http://everything2.com" . $ahref . "\n";
            open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
            print $outfh $wup; # Text to file
            close $outfh;
            print "Wrote to " . $outfile . "\n";
        }
    }
    [v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
    Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 31, <$infh.
(It isn't printing the ">" after infh?)
After re-re-reading the HTML::TagParser documentation I'm still not sure why it's doing this. But see how it says: via package "HTML::TagParser::Element"? If you look at the documentation, getElementsByClassName is not listed under "HTML::TagParser::Element SUBCLASS". I don't see anything before my use of getElementsByClassName that would have put us in the context of that subclass, so to speak, if that is a thing that happens.
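For anyone else puzzling over the "via package" wording: the message names the package that the object on the left of the arrow is blessed into. The sketch below is only an illustration using core Perl; the packages My::Doc and My::Doc::Element and their methods (by_id, by_class, first_child) are hypothetical stand-ins, not HTML::TagParser's real code. It shows how a method on a document object can return an object of a different class, and how ref() and can() reveal what that object is and which methods it supports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical document class: knows how to search by id and by class.
package My::Doc;
sub new      { my ($class) = @_; return bless {}, $class }
sub by_id    { return My::Doc::Element->new }   # hands back an element, not a document
sub by_class { return () }

# Hypothetical element class: has its own, smaller set of methods.
package My::Doc::Element;
sub new         { my ($class) = @_; return bless {}, $class }
sub first_child { return }

package main;

my $doc  = My::Doc->new;
my $elem = $doc->by_id;

# ref() shows which package each object is blessed into.
print "doc is a ",  ref($doc),  "\n";
print "elem is a ", ref($elem), "\n";

# can() reports whether a method is reachable from that package.
print "doc can by_class:  ", ($doc->can('by_class')  ? "yes" : "no"), "\n";
print "elem can by_class: ", ($elem->can('by_class') ? "yes" : "no"), "\n";

# Calling the missing method anyway dies; eval traps the message.
my $ok    = eval { $elem->by_class; 1 };
my $error = $ok ? '' : $@;
print "error: $error";
```

Adding a temporary `print ref($body), "\n";` after the getElementById call in the failing script would be a comparable check there; the error text already suggests it would print HTML::TagParser::Element.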
My use doesn't line up exactly with the documentation, I think. They do "$html = HTML::TagParser->new( $file );" then "@elem = $html->getElementsByClassName( $class );" -- which works for me too -- but is the problem that I'm trying to also put it through another of these functions in turn, or is it something else?
This script is only for generating a file to feed into another script, so I don't need to get it perfect, only working, but if you have any advice at all, it's welcome.
Thanks.
Edit:
With use diagnostics;
    [v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
    Can't locate object method "getElementsByClassName" via package
        "HTML::TagParser::Element" at ./get-wus-bad.pl line 32, <$infh> line 1 (#1)
        (F) You called a method correctly, and it correctly indicated a package
        functioning as a class, but that package doesn't define that particular
        method, nor does any of its base classes. See perlobj.

    Uncaught exception from user code:
        Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32.
Still lost.