It's a bit of a pain to figure out where to look, but the as_text method comes from HTML::Element. If you look at the docs, you'll see that in addition to as_text there is also an as_trimmed_text method. It looks like you could use it.
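To make the difference concrete, here is a minimal sketch. The HTML snippet is made up for illustration and is not taken from your page. as_trimmed_text strips leading and trailing whitespace and collapses internal runs of whitespace to a single space, while as_text returns the text as the parser stored it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical snippet with stray whitespace around and inside the text.
my $tree = HTML::TreeBuilder->new_from_content('<p>   Hello    world   </p>');

# In scalar context look_down returns the first matching element.
my $p = $tree->look_down(_tag => 'p');

my $raw     = $p->as_text;          # text as stored in the tree
my $trimmed = $p->as_trimmed_text;  # ends stripped, whitespace runs collapsed

# Brackets make any leftover leading/trailing whitespace visible.
print "as_text:         [$raw]\n";
print "as_trimmed_text: [$trimmed]\n";

$tree->delete;  # free the tree explicitly
```

Note that HTML::TreeBuilder already compacts whitespace while parsing by default, so how much the two results differ depends on your parser settings and on the markup.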
The second foreach loop comes from looking at the HTML source for the page. The data you want is in the p with a class of itinerari-info, in consecutive span elements. Some of the spans can be discarded: the ones with classes of note and strike. The XPath expression returns the spans that are left. Each span includes a b element with the title, which I get in $info_title, display, and then detach to get it out of the way. The rest of the span is the information itself.
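In case it helps to see that loop in isolation, here is a rough sketch of the idea. The sample HTML, the exact XPath expression, and the extract_info helper are my own assumptions for illustration; only the itinerari-info, note and strike class names and $info_title come from the discussion above. It assumes HTML::TreeBuilder::XPath is installed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup shaped like the page described above.
my $html = <<'HTML';
<p class="itinerari-info">
  <span><b>Distance:</b> 12 km</span>
  <span class="note"><b>Note:</b> ignore me</span>
  <span><b>Duration:</b> 4 h</span>
  <span class="strike"><b>Old:</b> outdated</span>
</p>
HTML

# extract_info is an illustrative helper name, not part of any module.
sub extract_info {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

    # Assumed XPath: spans in the paragraph, minus the note and strike ones.
    my $xpath = '//p[@class="itinerari-info"]'
              . '/span[not(@class="note") and not(@class="strike")]';

    my @pairs;
    for my $span ($tree->findnodes($xpath)) {
        my ($bold) = $span->look_down(_tag => 'b');  # the title element
        next unless $bold;

        my $info_title = $bold->as_trimmed_text;
        $bold->detach;  # remove the title so only the information remains

        push @pairs, [ $info_title, $span->as_trimmed_text ];
    }

    $tree->delete;
    return @pairs;
}

print "$_->[0] $_->[1]\n" for extract_info($html);
```

Detaching the b element before calling as_trimmed_text on the span is what keeps the title from showing up again in the information text.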
Does this help?
Re^4: Parsing HTML
by marcoss (Novice) on Jun 13, 2012 at 08:22 UTC