Re^3: Parsing HTML

It's a bit of a pain to figure out where to look, but the as_text method comes from HTML::Element. If you look at the docs, you'll see that in addition to as_text there is also an as_trimmed_text method. It looks like you could use it.
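
Here's a minimal sketch of the difference between the two methods. The markup is invented for illustration (it is not taken from your page), and the variable names are just placeholders:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical snippet with stray whitespace; the real page markup will differ.
my $tree = HTML::TreeBuilder->new_from_content(
    '<p>  Departure:   <b>Barcelona</b>  </p>'
);

# Grab the first <p> element in the parsed tree.
my $p = $tree->look_down( _tag => 'p' );

# as_text concatenates the text of the element and its descendants.
my $raw = $p->as_text;

# as_trimmed_text does the same, then strips leading/trailing whitespace
# and collapses internal runs of whitespace to a single space.
my $trimmed = $p->as_trimmed_text;

print "as_text:         [$raw]\n";
print "as_trimmed_text: [$trimmed]\n";

# Free the tree once we are done with it.
$tree->delete;
```

How much whitespace survives in as_text depends on the parser settings, so as_trimmed_text is the safer choice when you only care about the visible text.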

The second foreach loop comes from looking at the HTML source for the page. The data you want is in the p with a class of itinerari-info, in consecutive span elements. Some of the spans can be discarded: the ones with classes of note and strike. The XPath expression returns only the remaining spans. Each span includes a b element with the title, which I get in $info_title, display, then detach to get it out of the way. The rest of the span is the information itself.
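
To make that concrete, here is a self-contained sketch of the loop. The HTML is a guess at the general shape described above (the class "info", the titles and the values are all made up), and @rows is a hypothetical helper array, so treat it as an illustration rather than the actual script:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup modelled on the description; the real page will differ.
my $html = <<'HTML';
<div class="trip">
  <p class="itinerari-info">
    <span class="info"><b>Duration:</b> 5 days</span>
    <span class="note"><b>Note:</b> subject to change</span>
    <span class="strike"><b>Date:</b> 01/06/2012</span>
    <span class="info"><b>Date:</b> 15/06/2012</span>
  </p>
</div>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my @rows;    # collected [ title, information ] pairs
foreach my $info ( $tree->findnodes(
    '//p[@class="itinerari-info"]//span[@class != "note" and @class != "strike"]'
) )
{
    # The title lives in the <b> element inside the span.
    my $title_node = $info->look_down( _tag => 'b' );
    next unless $title_node;

    my $info_title = $title_node->as_trimmed_text;

    # Remove the <b> from the span so the title is not repeated below.
    $title_node->detach;

    # Whatever text is left in the span is the information itself.
    push @rows, [ $info_title, $info->as_trimmed_text ];
}

print "$_->[0] $_->[1]\n" for @rows;

$tree->delete;
```

One thing to keep in mind: @class != "strike" compares the whole attribute value, so it only excludes spans whose class attribute is exactly strike.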

Does this help?


Replies are listed 'Best First'.
Re^4: Parsing HTML by marcoss (Novice) on Jun 13, 2012 at 08:22 UTC
OK, this clarifies a lot. The `as_trimmed_text` method worked just fine. I tried commenting out the `detach` line and, like you said, it prints the title twice. But then, it seems you have seen something I completely overlooked. The strike class is only used for dates that have been removed, which is why I didn't see it before... but when I execute the script, the date still shows up. Is it a matter of using an `if` statement? It looks to me like the `foreach my $info ( $trip->findnodes( './/p[@class="itinerari-info"]//span[@class != "note" and @class != "strike"]'))` should take care of it. Hmm, I'm thinking of `unless`, but those are only assumptions... I'll let you know if I fix this, even though eventually I'll probably be crying out for help xD. Anyway, thank you very much for your time and your patience.

cheers!

marcos
