AI Cowboy has asked for the wisdom of the Perl Monks concerning the following question:
I'm having trouble with using Perl to parse an HTML file I have, where I'm trying to grab all <a> and <div> tags if the link or text content matches a certain format (I use a regex for this). However, WWW::Mechanize can only find links (<a> tags), not <div> tags, so that doesn't work. I've tried learning HTML::TreeBuilder but it seems that my brain doesn't understand the documentation very well for some reason.
I'm wondering if you chaps can either direct me to a better, cleaner Perl module that can extract all tags and let me analyze their attributes/text, or help me with my problem with HTML::TreeBuilder?
My problem is that with, for example, http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/Element.pm#find_by_tag_name, I have no idea what $h is or where it comes from. It seems to me that the documentation for TreeBuilder and Element uses variables without explicitly explaining what they are, and this hurts my brain. Some help would be wonderful, as I need to finish this project by the end of the week for my job, and I'm not sure what to do or why I'm not understanding this.
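For anyone landing here with the same confusion: in the HTML::Element documentation, $h is just a placeholder name for any HTML::Element object, and the tree returned by HTML::TreeBuilder is itself such an object (TreeBuilder is a subclass of Element). Below is a minimal sketch of how that fits together, not a definitive solution. The sample HTML, the $pattern regex, and the matching_elements helper are all made up for illustration; only new_from_content, find_by_tag_name, attr, as_text, tag, and delete are real HTML::Tree methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; in practice this would be the content of your HTML file.
my $html = <<'HTML';
<html><body>
<a href="http://example.com/report-2013.pdf">Annual report</a>
<a href="http://example.com/about">About</a>
<div>Order #12345</div>
<div>Nothing to see</div>
</body></html>
HTML

# Hypothetical pattern standing in for "a certain format".
my $pattern = qr/report-\d{4}|Order #\d+/;

# Hypothetical helper: returns one hashref per <a> or <div> whose href
# or text content matches the given regex.
sub matching_elements {
    my ($content, $re) = @_;

    # The tree root is an HTML::Element; this is the "$h" the docs talk about.
    my $tree = HTML::TreeBuilder->new_from_content($content);

    my @found;
    # In list context, find_by_tag_name returns every element at or under
    # the tree root that has one of the given tag names.
    for my $h ($tree->find_by_tag_name('a', 'div')) {
        my $href = $h->attr('href') // '';   # <div> elements have no href
        my $text = $h->as_text;              # text content, markup stripped
        next unless $href =~ $re or $text =~ $re;
        push @found, { tag => $h->tag, href => $href, text => $text };
    }

    $tree->delete;   # free the tree explicitly once we are done with it
    return @found;
}

for my $hit (matching_elements($html, $pattern)) {
    printf "%s: %s %s\n", $hit->{tag}, $hit->{text}, $hit->{href};
}
```

Each element in the loop is again an HTML::Element, so any method the documentation shows as $h->something can be called on it as well as on the tree root.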
Replies are listed 'Best First'.
Re: Perl HTML confusion...
by trippledubs (Deacon) on Sep 17, 2013 at 20:30 UTC
by AI Cowboy (Beadle) on Sep 17, 2013 at 22:29 UTC
Re: Perl HTML confusion...
by marinersk (Priest) on Sep 17, 2013 at 17:59 UTC
by AI Cowboy (Beadle) on Sep 17, 2013 at 18:06 UTC
by marinersk (Priest) on Sep 17, 2013 at 18:26 UTC
by Anonymous Monk on Sep 18, 2013 at 02:49 UTC
Re: Perl HTML confusion...
by Happy-the-monk (Canon) on Sep 17, 2013 at 18:05 UTC
by AI Cowboy (Beadle) on Sep 17, 2013 at 18:08 UTC
by marinersk (Priest) on Sep 17, 2013 at 18:34 UTC
by Happy-the-monk (Canon) on Sep 17, 2013 at 18:46 UTC
by marinersk (Priest) on Sep 17, 2013 at 18:51 UTC
Re: Perl HTML confusion...
by Anonymous Monk on Sep 18, 2013 at 03:11 UTC