Ok, so based on your replies above, given:

<HTML>
<title>My Page</title>
</head>
<body>
<center>
<h1>Brand.com Production Instances</h1>
<br>
<table border=1>

<tr><td></td><td><b>&nbsp;Service &nbsp;&nbsp;</b></td><td><b>Instance&nbsp;</b></td>
<tr><td align="right">1</td><td>&nbsp;app2<br></td><td>&nbsp;prd-1</td><td>
</td>
</tr>
<tr><td align="right">2</td><td>&nbsp;app2 &nbsp;<br></td><td>&nbsp;prd-2</td><td>
</td></tr>
<tr><td align="right">3</td><td>&nbsp;app3<br></td><td>&nbsp;prd-1</td><td>

etc etc

you want to print out the text in the <td> tags that have align="right" as an attribute.

This code will do that:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;

# Create instance
my $p = HTML::Parser->new(api_version => 3,
        marked_sections    => 1,
        unbroken_text      => 1,
        start_h => [\&start, "tagname, attr"],
        text_h => [\&text, 'text'],
);
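# Set by start(): true when the most recent start tag was <td align="right">.
# Declared before parsing so the handlers never see an uninitialised flag.
my $get_next_text = 0;

# Start parsing the following HTML file
$p->parse_file("testpage.html")
    or die "Cannot open testpage.html: $!";
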
sub start{
# Execute when start tag is encountered
    my ($tagname,$attr) = @_;
    if ($tagname eq 'td' && exists $attr->{align} && $attr->{align} eq 'right'){
        $get_next_text = 1;
    }
    else {
        $get_next_text = 0;
    }
}
sub text {
    my $text = shift;
    print "$text\n" if $get_next_text;
}

What it does is this:

Set up HTML::Parser so that &start gets called for each start tag, with the tag name ("td" or something else) and the attributes (as a hash-ref) as arguments, and so that &text gets called for every text part, with the text as its argument.
Note that a start tag is ANY tag that doesn't begin with </ - so <p> is a start tag and <td> is a start tag, but </p> is not. A "text" part is anything that is not a tag.
In &start, test whether the current tag is a <td> with an align="right" attribute. If yes, set $get_next_text to true; if no, set it to false.
In &text, test whether the previous start tag was a <td align="right"> (via the $get_next_text variable). If yes, print the text; otherwise do nothing.
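
If you'd rather collect the values than print them, the same steps can be sketched as a small function. This is only an illustration, not a drop-in replacement: the name right_aligned_cells, the inline sample HTML and the extra end handler (which clears the flag at every end tag, so text after the closing </td> is never picked up) are my own assumptions, not part of the code above. It also asks for 'dtext' instead of 'text', so entities arrive already decoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the post above): returns the text of
# every <td align="right"> cell found in the HTML string passed in.
sub right_aligned_cells {
    my ($html) = @_;

    my @cells;          # collected cell texts
    my $in_cell = 0;    # true right after a <td align="right"> start tag

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,
        # Same test as in &start above: any other start tag clears the flag.
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            $in_cell = ( $tagname eq 'td'
                      && exists $attr->{align}
                      && $attr->{align} eq 'right' ) ? 1 : 0;
        }, 'tagname, attr' ],
        # Assumption: also clear the flag at every end tag.
        end_h => [ sub { $in_cell = 0 }, 'tagname' ],
        # Keep the text only while the flag is set; skip whitespace-only parts.
        text_h => [ sub {
            my ($text) = @_;
            push @cells, $text if $in_cell && $text =~ /\S/;
        }, 'dtext' ],
    );

    $p->parse($html);
    $p->eof;            # signal end of document and flush buffered text
    return @cells;
}

# Sample input, cut down from the table in the question.
my $html = <<'HTML';
<table border=1>
<tr><td align="right">1</td><td>&nbsp;app2<br></td><td>&nbsp;prd-1</td></tr>
<tr><td align="right">2</td><td>&nbsp;app2 &nbsp;<br></td><td>&nbsp;prd-2</td></tr>
</table>
HTML

print "$_\n" for right_aligned_cells($html);
```

With the sample above this prints 1 and 2, one per line. Like the original, it only catches text that follows the <td> directly: something like <td align="right"><b>1</b></td> would be missed, because the <b> start tag clears the flag.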

Hope this clears it up :-)

Joost.

"What should it profit a man, if he should win a flame war, yet lose his cool?"

In reply to Re: HTML Parser print text by Joost
in thread HTML Parser print text by Vanquish
