Hi-

I'm trying to parse this HTML and having no luck with LWP.

I just started learning it and I'm stuck on parsing this HTML code. (I decided to do this for fun as a learning project.)

<td><div class="banner"><span id="fulldescription" class="text11g"><b>Description:</b></span></div>
</td>
</tr>
</table>
<div align="center" style="padding-top:13">
<table width="98%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td>
<div class="text12">

Season one of the ladder contest runs from July 1, 2004 to September 30, 2004. During this time, all solo and 2v2 games on the Lordaeron, Azeroth, Kalimdor, and Northrend Gateways will be tracked by Blizzard. The players with the most experience in ladder play will then be matched against each other in a series of tournaments to determine the ultimate winner in both the Solo and 2v2 formats.

</div>
</td>

It took me about 10 minutes to find HTML like this on a gaming website.

Basically, I want my script to extract the description paragraph ("Season one of the ladder contest...", etc.). I've been playing with the LWP code and am having no luck.

Now, regular parsing code for the div class="text12" tag makes sense for retrieving the description. For example:

    if ($token->[0] eq 'S' and $token->[1] eq 'div' and
        ($token->[2]{'class'} || '') eq 'text12') {
        print $stream->get_trimmed_text('/div');
    }

But the thing is, there is more than one div class="text12" tag on the HTML page I'm retrieving with LWP, so I have to narrow my code so that it parses inside that area only.

I tried this but had no luck. Any ideas?

while (my $token = $stream->get_token) {
    if ($token->[0] eq 'S' and $token->[1] eq 'span' and
        ($token->[2]{'id'} || '') eq 'fulldescription') {

        # found the <span id="fulldescription"> tag
        if ($token->[0][0] eq 'S' and $token->[0][1] eq 'div' and
            ($token->[0][2]{'class'} || '') eq 'text12') {
            print $stream->get_trimmed_text('/div');
        }
    }
}

The above code prints nothing, and I know I'm doing the coding wrong. I'm not sure how to do the token sequence parsing so that it narrows things down more.

Bobby

In reply to Parsing this HTML description by tanger
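
One possible way to narrow the search, offered as a sketch rather than a definitive answer. In the attempt above, the inner test runs against the same token that just matched the span, so it can never also be the div (and `$token->[0][0]` treats the string 'S' as an array reference, which `use strict` would reject). The sketch below instead remembers that the `<span id="fulldescription">` marker has been seen and keeps reading tokens until the next `<div class="text12">`. It assumes the stream is an HTML::TokeParser object, which the `get_token` and `get_trimmed_text` calls in the question suggest. The helper name `description_after_marker` and the sample HTML, including the decoy div, are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): returns the trimmed text
# of the first <div class="text12"> that appears AFTER the
# <span id="fulldescription"> marker, or undef if no such div is found.
sub description_after_marker {
    my ($html) = @_;

    # A reference to a scalar tells HTML::TokeParser to parse the string itself.
    my $stream = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my $seen_marker = 0;
    while (my $token = $stream->get_token) {
        next unless $token->[0] eq 'S';          # only start tags matter here
        my ($tag, $attr) = @{$token}[1, 2];

        if ($tag eq 'span' and ($attr->{id} || '') eq 'fulldescription') {
            $seen_marker = 1;                    # remember it, keep reading
        }
        elsif ($seen_marker and $tag eq 'div'
               and ($attr->{class} || '') eq 'text12') {
            return $stream->get_trimmed_text('/div');
        }
    }
    return;
}

# Made-up sample page: the first text12 div is a decoy that comes before
# the marker and should be skipped.
my $html = <<'HTML';
<div class="text12">Some other block</div>
<table><tr>
<td><div class="banner"><span id="fulldescription" class="text11g"><b>Description:</b></span></div></td>
</tr></table>
<div align="center"><table><tr><td>
<div class="text12">

Season one of the ladder contest runs from July 1, 2004 to September 30, 2004.

</div>
</td></tr></table></div>
HTML

my $description = description_after_marker($html);
print defined $description ? "$description\n" : "not found\n";
```

With a page fetched through LWP, the same helper could be handed the response content instead of the sample string. A flag is used here because the div is not nested inside the span; it only follows it later in the document, so the two tags arrive as separate tokens.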
