I would use XPath or CSS expressions, and look at HTML::TreeBuilder::XPath to run the expressions against the HTML. Or rather, I would use App::scrape, which puts that approach into a module, or Web::Scraper or Web::Magic.
With XPath expressions, you can specify the elements you want like paths to files in a directory. In your case, it looks like the following XPath expressions would work:
    # Each voyage
    //p[@class="itinerari-info"]

    # Itinerary within a voyage
    ./span[1]
    # Departure date
    ./span[2]
    # Ship
    ./span[3]
    ...
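A minimal sketch of the per-voyage approach with HTML::TreeBuilder::XPath. The sample markup, the field values and the hash keys are made up for illustration; the real page will likely differ, so adjust the expressions to what you actually find there.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, standing in for the real page
my $html = <<'HTML';
<html><body>
<p class="itinerari-info"><span>Barcelona - Genoa</span><span>2012-05-01</span><span>Ship A</span></p>
<p class="itinerari-info"><span>Venice - Athens</span><span>2012-06-15</span><span>Ship B</span></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my @voyages;
# First select each voyage node, then query relative to that node
for my $voyage ($tree->findnodes('//p[@class="itinerari-info"]')) {
    push @voyages, {
        itinerary => $voyage->findvalue('./span[1]'),
        departure => $voyage->findvalue('./span[2]'),
        ship      => $voyage->findvalue('./span[3]'),
    };
}

printf "%s | %s | %s\n", @{$_}{qw(itinerary departure ship)} for @voyages;

# HTML::TreeBuilder trees should be deleted explicitly when done
$tree->delete;
```

Keeping the inner expressions relative (starting with `./`) is what ties each date and ship to its own voyage, instead of collecting all spans on the page.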
If your target page lists only one such itinerary, you can roll the XPath expressions into one expression, instead of using them relative to the voyage nodes:
    # Itinerary
    //p[@class="itinerari-info"]/span[1]
    ...
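The rolled-up form could look roughly like this, again with HTML::TreeBuilder::XPath and made-up sample markup. I use `findvalues` here, which returns one value per matching node, so the same code also shows you when the page turns out to list more than one voyage.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup with a single voyage
my $html = <<'HTML';
<html><body>
<p class="itinerari-info"><span>Barcelona - Genoa</span><span>2012-05-01</span><span>Ship A</span></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# One absolute expression per field, no intermediate voyage node
my @itineraries = $tree->findvalues('//p[@class="itinerari-info"]/span[1]');
my @departures  = $tree->findvalues('//p[@class="itinerari-info"]/span[2]');

warn "More than one voyage found, consider the per-voyage approach\n"
    if @itineraries > 1;

print "Itinerary: $itineraries[0]\n";
print "Departure: $departures[0]\n";

$tree->delete;
```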
You can test out these queries in Firebug (I think), or with the scrape-ff tool in WWW::Mechanize::Firefox, or with the scrape tool in App::scrape. Likely, Mojolicious and the modules mentioned before also contain tools for easy command line testing of XPath expressions against URLs.
In reply to Re: Parsing HTML by Corion, in thread Parsing HTML by marcoss