HTML::Parser example wanted...

RatArsed has asked for the wisdom of the Perl Monks concerning the following question:

Re: HTML::Parser example wanted... by andreychek (Parson) on Jun 26, 2001 at 19:19 UTC
Actually, there are a bunch of examples that come with the HTML::Parser module, found in the "eg" directory. Taking the code from there, here is an example of how to extract all the text from an HTML document:

```perl
#!/usr/bin/perl -w

# Extract all plain text from an HTML file

use strict;
use HTML::Parser 3.00 ();

my %inside;

sub tag
{
    my($tag, $num) = @_;
    $inside{$tag} += $num;
    print " ";  # not for all tags
}

sub text
{
    return if $inside{script} || $inside{style};
    print $_[0];
}

HTML::Parser->new(api_version => 3,
                  handlers    => [start => [\&tag, "tagname, '+1'"],
                                  end   => [\&tag, "tagname, '-1'"],
                                  text  => [\&text, "dtext"],
                                 ],
                  marked_sections => 1,
    )->parse_file(shift) || die "Can't open file: $!\n";
```

That code is located in eg/htext. After taking a look, you can see that it is event driven. The HTML::Parser->new line has an option in it called "handlers", which tells HTML::Parser which function to call upon seeing a certain type of event. In this case, every start tag calls the function "tag" with the parameters "tagname", which is the actual tag name, and +1, which identifies it as a start tag. End tags call the same function with -1, so %inside tracks which elements are currently open, and "text" can skip anything inside script or style elements.

Personally, I have had more luck with HTML::TokeParser, but that isn't the case for everyone, I'm sure. I find that HTML::TokeParser is a bit more intuitive for this sort of job, but that is perhaps just the way I think... or maybe I just wasn't using HTML::Parser right ;-) In any case, good luck.

-Eric
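For comparison, here is a rough sketch of the same text-extraction job done with HTML::TokeParser, which lets you pull tokens in a loop instead of registering callbacks. It is not taken from the distribution's eg directory; the sample HTML string and the variable names are made up for illustration, and the extra link collection is only there to show how start-tag attributes are reached.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document, parsed from memory rather than a file.
my $html = <<'HTML';
<html><head><title>Demo</title>
<style>p { color: red }</style></head>
<body><p>Hello <a href="http://www.perlmonks.org/">monks</a>!</p>
<script>var x = 1;</script></body></html>
HTML

# A scalar reference is treated as the document text itself.
my $p = HTML::TokeParser->new(\$html) or die "Can't create parser: $!";

my %inside;       # counts currently open elements, as in eg/htext
my @links;        # href values of <a> tags
my $text = '';    # accumulated plain text

while (my $token = $p->get_token) {
    my $type = $token->[0];
    if ($type eq 'S') {                      # ["S", $tag, $attr, $attrseq, $text]
        my ($tag, $attr) = @{$token}[1, 2];
        $inside{$tag}++;
        push @links, $attr->{href} if $tag eq 'a' && defined $attr->{href};
    }
    elsif ($type eq 'E') {                   # ["E", $tag, $text]
        $inside{ $token->[1] }--;
    }
    elsif ($type eq 'T') {                   # ["T", $text, $is_data]
        next if $inside{script} || $inside{style};
        $text .= $token->[1];                # raw text, entities not decoded
    }
}

print "Links: @links\n";
print "Text: $text\n";
```

Unlike the "dtext" argument in the handler version, the text tokens here are not entity-decoded, so something like HTML::Entities::decode_entities would be needed if the document contains entities.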
Re: HTML::Parser example wanted... by LD2 (Curate) on Jun 26, 2001 at 19:17 UTC
Here are a few links:
- Using Super Search: Who has used HTML::Parser?
- Using HTML::Parser - a quick guide
- Parsing HTML with HTML::Parser

Remember - Google is your friend!
Re: HTML::Parser example wanted... by larsen (Parson) on Jun 26, 2001 at 20:03 UTC
I wrote some code that's around the Monastery:
- Did someone say robot?
- Don't panic! (Hitch hikers' facility)
- Statistician in my garbage... (but pay attention to merlyn's reply, because I wrote this program in an old-fashioned way)
Re: HTML::Parser example wanted... by princepawn (Parson) on Jun 26, 2001 at 21:15 UTC
I found HTML::TokeParser (part of the HTML::Parser distribution) to be easier to use, but in this case I used HTML::TreeBuilder. This example reads a 2x2 table.

```perl
#!/usr/local/bin/perl

use strict;
use Data::Dumper;
use HTML::TreeBuilder;

die "must input filename" unless @ARGV;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new;  # empty tree
    $tree->parse_file($file_name);
    print "Hey, here's a dump of the parse tree of $file_name:\n";
    # $tree->dump;  # a method we inherit from HTML::Element

    # The first five <table> elements found, in document order.
    my %table;
    (
        $table{root},
        $table{cond},
        $table{'cond-alternatives'},
        $table{action},
        $table{'action-entries'}
    ) = $tree->find_by_tag_name('table');

    # For each table, collect its <td> elements...
    my %td;
    $td{$_} = [ $table{$_}->find_by_tag_name('td') ] for keys %table;

    # ...and then the content of each cell.
    my %x;
    for my $field (keys %td) {
        push @{ $x{$field} }, $_->content_array_ref for @{ $td{$field} };
    }

    printf "cond-alt has %s", Dumper $x{'cond-alternatives'};

    # Now that we're done with it, we must destroy it.
    $tree = $tree->delete;
}
```
Re: HTML::Parser example wanted... by Beatnik (Parson) on Jun 26, 2001 at 21:34 UTC
If davorg permits me to quote Data Munging with Perl, chapter 9, page 165:

```perl
#!/usr/bin/perl -w
use strict;

use HTML::Parser;
use LWP::Simple;

sub start {
    my ($tag, $attr, $attrseq) = @_;

    print "Found $tag\n";
    foreach (@$attrseq) {
        print "  [$_ -> $attr->{$_}]\n";
    }
}

my $h = HTML::Parser->new(start_h => [\&start, 'tagname, attr, attrseq']);

my $page = get(shift);

$h->parse($page);
```

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: HTML::Parser example wanted... by Graham (Deacon) on Jun 26, 2001 at 19:13 UTC
Try the documentation, `perldoc HTML::Parser`, or the thread Who has used HTML::Parser?? for some samples of usage.
Re: Re: HTML::Parser example wanted... by RatArsed (Monk) on Jun 26, 2001 at 19:18 UTC
I'd be after an example because the documentation isn't really clear that it can do what I want... Unfortunately, the other node to which you refer in turn refers to version 2 of the parser (and I have 3, which, I believe, works differently) and to TPJ, which is, er, closed...

-- RatArsed
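On the version 2 versus version 3 point, a minimal sketch of the two styles side by side may help. As far as the HTML::Parser 3 documentation describes it, a parser constructed with no arguments falls back to the version 2 behaviour of calling overridable methods such as start, end and text, while passing handlers selects the version 3 event API. The package name TagLister, the sample HTML and the variable names below are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3.00 ();

my $html = '<p>Hello <b>world</b></p>';   # hypothetical sample input

# Version 2 style: subclass HTML::Parser and override the callback methods.
{
    package TagLister;                    # hypothetical subclass name
    our @ISA = ('HTML::Parser');
    our @seen;

    # Called as start($self, $tagname, $attr, $attrseq, $origtext).
    sub start {
        my ($self, $tag) = @_;
        push @seen, $tag;
    }
}

my $v2 = TagLister->new;                  # no arguments: v2-compatible callbacks
$v2->parse($html);
$v2->eof;

# Version 3 style: no subclass, just a handler plus an argspec string.
my @v3_seen;
my $v3 = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { push @v3_seen, shift }, 'tagname' ],
);
$v3->parse($html);
$v3->eof;

print "v2 style saw: @TagLister::seen\n";
print "v3 style saw: @v3_seen\n";
```

Both parsers should report the same start tags; the difference is only in how the callbacks are wired up, so older version 2 examples can usually be adapted by moving each overridden method into a handler.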