Re: Screen scraping complex tables and divs (updated)

I'm confused because the thread you linked to is already very good.

You mostly use

CSS selector or
XPath

in live inspections (i.e. when you need a browser for JS), and as far as I remember WWW::Mechanize::Firefox and its various siblings support both.

The alternative is mirroring the DOM into a Perl/XML data structure and using that structure's query API (mostly XPath-like).
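
A minimal sketch of that alternative, assuming the CPAN module HTML::TreeBuilder::XPath is installed. The table id "settings" and the sample HTML are made up for illustration; in practice the content would come from whatever fetched the page. It collects every row of the named table into name/value pairs:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; the id "settings" is an assumption.
my $html = <<'HTML';
<table id="settings">
  <tr><td>host</td><td>example.org</td></tr>
  <tr><td>port</td><td>8080</td></tr>
  <tr><td>mode</td><td>test</td></tr>
</table>
HTML

# Mirror the document into a Perl tree that understands XPath.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my %pair;
for my $row ( $tree->findnodes('//table[@id="settings"]//tr') ) {
    # First cell is taken as the name, second as the value.
    my ( $name, $value ) = map { $_->as_text } $row->findnodes('./td');
    next unless defined $value;    # skip rows with fewer than two cells
    $pair{$name} = $value;
}
$tree->delete;    # free the tree explicitly

print "$_ = $pair{$_}\n" for sort keys %pair;
```

Using //tr rather than /tr keeps the query working whether or not the rows are wrapped in a tbody element.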

Maybe you should ask more precisely and show what you tried?

update

> 1) Find the 6th and 9th rows in a named table (given an id) and pull out the name and value pairs.

> 2) Slurp in every row in a named table and parse out the name value pairs.

See

$mech->xpath( $query, %options)

and alternatively

$mech->selector( $css_selector, %options )

Both methods support querying the child elements of an element with a given ID.
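
For the first task, a rough sketch of what such a query could look like. To keep it runnable without a browser it uses HTML::TreeBuilder::XPath on a generated table instead of a live $mech object; the id "data" and the cell contents are invented for the example. The same XPath string should be usable as the $query argument of the xpath method above, though I have not tested that here:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Generate a hypothetical 10-row table with the assumed id "data".
my $rows = join '',
    map { "<tr><td>name$_</td><td>value$_</td></tr>" } 1 .. 10;
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    qq{<table id="data">$rows</table>}
);

# position() counts tr elements among their siblings, starting at 1.
my $query = '//table[@id="data"]//tr[position() = 6 or position() = 9]';

my @picked;
for my $row ( $tree->findnodes($query) ) {
    my ( $name, $value ) = map { $_->as_text } $row->findnodes('./td');
    push @picked, [ $name, $value ];
}
$tree->delete;

print "$_->[0] => $_->[1]\n" for @picked;
```

Note that position() is evaluated per parent element, so this assumes all rows of the table share one parent (the table itself or a single tbody).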

Query syntax is not a Perl question, but there are plenty of good tutorials online.

Look out for browser features/addons allowing to play around with queries.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Replies are listed 'Best First'.
Re^2: Screen scraping complex tables and divs (updated) by parser (Acolyte) on Oct 13, 2017 at 21:42 UTC
Rolf, I am confused now too. Are you saying WWW::Mechanize supports CSS selectors and XPath? Or that WWW::Mechanize::Firefox does? If the latter, I also read that it is very difficult to build.

> Query syntax is not a Perl question, but there are plenty of good tutorials online.

I agree. However, determining how best to query HTML source via Perl is.

The option of mirroring the DOM into a Perl/XML data structure and using the query API sounds quite good. I'll give that a go and see how it works. Anything is better than parsing table tags with HTML::TokeParser.
Re^3: Screen scraping complex tables and divs (updated) by Corion (Patriarch) on Oct 14, 2017 at 06:44 UTC
As a very simplistic application of combining WWW::Mechanize with HTML::TreeBuilder and HTML::Selector::XPath, I wrote App::scrape. This module encapsulates extracting data from HTML either via CSS selectors or XPath queries. Maybe you can use that as a starting point.
Re^3: Screen scraping complex tables and divs by LanX (Saint) on Oct 13, 2017 at 21:53 UTC
WWW::Mechanize::Firefox does, and I took it as one example out of many because I have worked with it in the past. But it really depends on whether you need JS or not, so I don't want to go into details.

Querying HTML was your question, and something like XPath or CSS selectors is mostly the solution.

Regarding the Perl backend: it depends. Sorry, there is no generic answer; TIMTOWTDI.

Cheers Rolf
Re^3: Screen scraping complex tables and divs (Firepath) by LanX (Saint) on Oct 14, 2017 at 22:43 UTC
PS:

> Look out for browser features/addons allowing to play around with queries.

I had very good experience using Firepath to find the right CSS selectors / XPath expressions inside Firefox.

You can copy an auto-generated explicit expression by right-clicking on a DOM element and then change it interactively. Then simply copy the final path and/or selector into your Perl code.

HTH! :)

Cheers Rolf
Re^4: Screen scraping complex tables and divs (Firepath) by parser (Acolyte) on Oct 17, 2017 at 16:45 UTC
Good catch! Firepath is saving me a lot of time!