```
v@vonunov ~/perl$ perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for amd64-freebsd-thread-multi
```

Hi all,

I don't really "know" Perl; I just cobble things together by reading examples and documentation, and I promptly forget how and why I did everything unless I commented it heavily. So it could be that I'm just missing something very basic.

The error in the title, Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element", appears only after I made a couple of small changes to a previously working script. First, the working version, with a usage demo:

```perl
#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like
# http://everything2.com/user/ameriwire/writeups
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0];  # Give URL list file in first arg
my $outfile = $ARGV[1]; # Give output file in second arg
my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {
    chomp ($line);
    my $class = "type"; # .type
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my @elem = $html->getElementsByClassName($class); #Grab each instance of .type into array
    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";
        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";
    }
}
```
```
[v@vonunov ~/perl]$ cat infile.txt
http://everything2.com/user/ameriwire/writeups
[v@vonunov ~/perl]$ ./get-wus.pl infile.txt outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/moral+law
Wrote to outfile.txt
http://everything2.com/user/ameriwire/writeups/altruism
Wrote to outfile.txt
[etc.]
[v@vonunov ~/perl]$ head outfile.txt
http://everything2.com/user/ameriwire/writeups/diverticulosis
http://everything2.com/user/ameriwire/writeups/W.+Mark+Felt
http://everything2.com/user/ameriwire/writeups/moral+law
http://everything2.com/user/ameriwire/writeups/altruism
```

Now here's the script that fails. Notice the lines around "my $html": now we get #mainbody first and *then* look for .type within it. This is to avoid picking up unwanted content that's kept in a sidebar area on the target site.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
use URI::Fetch;

# Take list of URLs like
# http://everything2.com/user/ameriwire/writeups
# and extract specific writeup URLs: "(thing)"
# (Have to manually add multiple pages of WUs)

my $infile = $ARGV[0];  # Give URL list file in first arg
my $outfile = $ARGV[1]; # Give output file in second arg
my $outfh;

open (my $infh, '<', $infile) or die "Could not open file '$infile' $!";

while (my $line = <$infh>) {
    chomp ($line);
    my $class = "type";  # .type
    my $id = "mainbody"; # #mainbody
    my $html = HTML::TagParser->new($line); #Fetch+parse HTML file
    my $body = $html->getElementById($id);
    #^If we don't do this we get sidebar WUs too
    my @elem = $body->getElementsByClassName($class); #Grab each instance of .type into array
    # <span class="type">(<a
    # href="/user/ameriwire/writeups/W.+Mark+Felt">person</a>)</span>
    foreach (@elem) { # iterate through array
        my $child = $_->firstChild(); # = <a> under <span>
        my $ahref = $child->getAttribute("href"); # return value of attrib href
        my $wup = "http://everything2.com" . $ahref . "\n"; # "writeup"
        print "http://everything2.com" . $ahref . "\n";
        open ($outfh, '>>', $outfile) or die "Could not open file '$outfile' $!";
        print $outfh $wup; # Text to file
        close $outfh;
        print "Wrote to " . $outfile . "\n";
    }
}
```
```
[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 31, <$infh.
```

(It isn't printing the ">" after infh?)

After re-re-reading the HTML::TagParser documentation I'm still not sure why it's doing this. But see how the error says via package "HTML::TagParser::Element"? In the documentation, getElementsByClassName is not listed under "HTML::TagParser::Element SUBCLASS". I don't see anything before my call to getElementsByClassName that would have put us in the context of that subclass, so to speak, if that's even a thing that happens.
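For what it's worth, Perl decides which method to run from the package the object was blessed into, so the "via package" part of the error is the useful clue. The toy script below is plain core Perl; the Toy::Document and Toy::Element packages are hypothetical stand-ins, not HTML::TagParser's real internals. It mimics the situation: a document-level method returns an element object, and the element's package has no getElementsByClassName. The built-in `ref` and `can` can be used the same way to poke at objects in a real script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "document" class: it has the search methods.
package Toy::Document;
sub new { my ($class) = @_; return bless {}, $class }
sub getElementById { return Toy::Element->new }   # returns an *element* object
sub getElementsByClassName { return () }          # defined only on documents

# Hypothetical "element" class: it has its own, different methods.
package Toy::Element;
sub new { my ($class) = @_; return bless {}, $class }
sub firstChild { return undef }

package main;

my $doc  = Toy::Document->new;
my $body = $doc->getElementById('mainbody');

# ref() shows the package that method calls on $body are looked up in.
print ref($body), "\n";                                        # prints "Toy::Element"

# can() asks whether that package (or a base class) defines the method.
print $body->can('getElementsByClassName') ? "yes\n" : "no\n"; # prints "no"

# Calling it anyway dies with the same kind of error as in the post.
my $ok = eval { $body->getElementsByClassName('type'); 1 };
print "Error: $@" unless $ok;
```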

My usage doesn't line up exactly with the documentation, I think. The documentation does "$html = HTML::TagParser->new( $file );" and then "@elem = $html->getElementsByClassName( $class );", which works for me too. Is the problem that I'm passing the result through another of these methods in turn (getElementById first, then getElementsByClassName), or is it something else?
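One possible way to chain the two lookups, sketched under the assumption that your installed HTML::TagParser documents the HTML::TagParser::Element method `subTree()`, which returns a new parser object holding everything under that element (check the module's docs for your version). The idea is to turn #mainbody back into a parser object and call getElementsByClassName on that. The function name `writeup_hrefs` and the sample HTML are made up for illustration, and the HTML is parsed from a string with `parse()` instead of being fetched from a URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TagParser;

# Hypothetical helper: return the hrefs of <span class="type"> links,
# but only those inside the element with id="mainbody".
sub writeup_hrefs {
    my ($source) = @_;                     # HTML source as a string (assumption)
    my $html = HTML::TagParser->new();
    $html->parse($source);

    # getElementById() gives an HTML::TagParser::Element ...
    my $main = $html->getElementById('mainbody') or return ();
    # ... and subTree() (assumed available) gives a parser object again,
    # which is the class that has getElementsByClassName().
    my $sub = $main->subTree();

    my @hrefs;
    foreach my $span ($sub->getElementsByClassName('type')) {
        my $link = $span->firstChild() or next;   # the <a> under the <span>
        push @hrefs, $link->getAttribute('href');
    }
    return @hrefs;
}

# Made-up page: one writeup in #mainbody, one in a sidebar.
my $page = <<'HTML';
<html><body>
<div id="mainbody">
<span class="type">(<a href="/user/ameriwire/writeups/altruism">idea</a>)</span>
</div>
<div id="sidebar">
<span class="type">(<a href="/user/someone/writeups/sidebar+item">thing</a>)</span>
</div>
</body></html>
HTML

print "http://everything2.com$_\n" for writeup_hrefs($page);
```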

This script is only for generating a file to feed into another script, so I don't need to get it perfect, only working, but if you have any advice at all, it's welcome.
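One small, general tidy-up, as a sketch: the loop currently reopens and closes the output file for every writeup. Opening it once and writing each URL to the same handle does the same job. `write_writeups` and the sample hrefs are hypothetical; the demo writes to a temporary file via the core File::Temp module so it doesn't touch a real outfile.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $base = 'http://everything2.com';

# Hypothetical helper: append one full URL per href to $outfile,
# opening the file a single time instead of once per writeup.
sub write_writeups {
    my ($outfile, @hrefs) = @_;
    open(my $outfh, '>>', $outfile)
        or die "Could not open file '$outfile': $!";
    foreach my $href (@hrefs) {
        my $wup = $base . $href;
        print "$wup\n";              # echo to the terminal
        print {$outfh} "$wup\n";     # append to the output file
    }
    close($outfh) or die "Could not close file '$outfile': $!";
    print "Wrote ", scalar(@hrefs), " URLs to $outfile\n";
    return scalar @hrefs;
}

# Demo with a temporary file (removed at exit) and made-up hrefs.
my (undef, $outfile) = tempfile(UNLINK => 1);
my $count = write_writeups($outfile,
    '/user/ameriwire/writeups/altruism',
    '/user/ameriwire/writeups/moral+law');
```

In the real script, the `foreach (@elem)` loop would collect each `$ahref` into an array and call the helper once per input page.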

Thanks.

Edit:

With use diagnostics;

```
[v@vonunov ~/perl]$ ./get-wus-bad.pl infile.txt outfile.txt
Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32, <$infh> line 1 (#1)
    (F) You called a method correctly, and it correctly indicated a package
    functioning as a class, but that package doesn't define that particular
    method, nor does any of its base classes. See perlobj.

Uncaught exception from user code:
    Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" at ./get-wus-bad.pl line 32.
```

Still lost.


In reply to Can't locate object method "getElementsByClassName" via package "HTML::TagParser::Element" by Vonunov
