    <HTML>
    <title>My Page</title>
    </head>
    <body>
    <center>
    <h1>Brand.com Production Instances</h1>
    <br>
    <table border=1>
    <tr><td></td><td><b> Service </b></td><td><b>Instance </b></td>
    <tr><td align="right">1</td><td> app2<br></td><td> prd-1</td><td> </td> </tr>
    <tr><td align="right">2</td><td> app2 <br></td><td> prd-2</td><td> </td></tr>
    <tr><td align="right">3</td><td> app3<br></td><td> prd-1</td><td>

etc. etc.
Suppose you want to print the text inside every <td> tag that has align="right" as an attribute.
This code will do that:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Parser;

    # Flag: true right after a <td align="right"> start tag.
    # Declared before parsing so the handlers see it initialised.
    my $get_next_text = 0;

    # Create the parser instance
    my $p = HTML::Parser->new(
        api_version     => 3,
        marked_sections => 1,
        unbroken_text   => 1,
        start_h         => [\&start, "tagname, attr"],
        text_h          => [\&text,  "text"],
    );

    # Parse the HTML file (parse_file returns undef if it can't open it)
    $p->parse_file("testpage.html")
        or die "Can't parse testpage.html: $!";

    sub start {
        # Called for every start tag
        my ($tagname, $attr) = @_;
        if ($tagname eq 'td' && exists $attr->{align} && $attr->{align} eq 'right') {
            $get_next_text = 1;
        }
        else {
            $get_next_text = 0;
        }
    }

    sub text {
        # Called for every run of text; print it only after a matching <td>
        my $text = shift;
        print "$text\n" if $get_next_text;
    }
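What it does is this: HTML::Parser calls start() each time it meets a start tag and text() for each run of text between tags. start() sets $get_next_text to 1 when the tag is a <td> whose align attribute is "right", and resets it to 0 for any other start tag. text() prints the text only while the flag is 1. Because unbroken_text is set, each run of text arrives in a single text() call instead of possibly being split into pieces.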
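To see which handler fires for which piece of markup, a small event trace can help. This sketch is only illustrative (the sample string and the @events array are made up for the demonstration): it registers anonymous handlers for start tags, end tags and text, parses a string with parse() and eof() instead of a file, and records each event in order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Record every event the parser reports, as "type:detail" strings.
my @events;
my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,
    start_h => [ sub { push @events, "start:$_[0]" }, 'tagname' ],
    end_h   => [ sub { push @events, "end:$_[0]" },   'tagname' ],
    text_h  => [ sub { push @events, "text:$_[0]" },  'text' ],
);

# Parse an in-memory string; eof() flushes anything still buffered.
$p->parse('<p>Hello <td align="right">1</td></p>');
$p->eof;

print "$_\n" for @events;
```

This prints start:p, text:Hello (with its trailing space), start:td, text:1, end:td and end:p, one per line. </td> and </p> only ever appear as end events, never as start events.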
Note that a start tag is ANY tag that doesn't begin with </, so <p> and <td> are start tags, but </p> is not. A "text" part is anything between tags (comments and <!DOCTYPE> declarations get their own event types).
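Since </td> is not a start tag, start() never sees it, so the flag stays set from the end of a matching cell until the next start tag. Any text in that gap (a newline between </td> and the next <td>, for example) would be printed too. A sketch that also clears the flag in an end handler, trims whitespace, and collects the cells into a list instead of printing them might look like the following. It parses a string, which suits HTML already held in memory (for instance, content fetched with LWP::UserAgent). The function name right_aligned_cells and the sample HTML are hypothetical, and the switch to 'dtext' (entity-decoded text) is my choice, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the trimmed text of every <td align="right">
# cell found in $html.
sub right_aligned_cells {
    my ($html) = @_;
    my @cells;
    my $in_cell = 0;    # true only between <td align="right"> and its </td>

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            # Match align="right" case-insensitively; any other start tag clears the flag.
            $in_cell = ($tagname eq 'td'
                        && defined $attr->{align}
                        && lc($attr->{align}) eq 'right') ? 1 : 0;
        }, 'tagname, attr' ],
        # Unlike the script above, also clear the flag when the cell closes.
        end_h  => [ sub { $in_cell = 0 if $_[0] eq 'td' }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @cells, $text if $in_cell && length $text;
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;
    return @cells;
}

my $html = '<table><tr><td align="right">1</td><td> app2</td></tr>'
         . '<tr><td align="right">2</td><td> app2</td></tr></table>';
print "$_\n" for right_aligned_cells($html);
```

Returning a list rather than printing inside the handler keeps the parsing separate from what you do with the results.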
Joost.