in reply to Screen scraping complex tables and divs

I'm confused because the thread you linked to is already very good.

You mostly use CSS selectors or XPath in live inspections (i.e. when you need a browser for JavaScript), and as far as I remember, WWW::Mechanize::Firefox and its various siblings support both.

The alternative is to mirror the DOM into a Perl/XML data structure and use its query API (mostly XPath-like).
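Here is a minimal sketch of that approach. It uses HTML::TreeBuilder::XPath, which is only one of several CPAN options and is assumed to be installed. The sample HTML and the table id `stats` are made up for illustration, and the code assumes that each row has the name in its first cell and the value in its second.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;    # CPAN module, assumed to be installed

# Hypothetical sample page; the id "stats" is illustrative only.
my $html = <<'HTML';
<html><body>
<table id="stats">
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
  <tr><td>gamma</td><td>3</td></tr>
</table>
</body></html>
HTML

# Parse the HTML into an in-memory element tree (the "mirrored DOM").
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Assumption: first cell = name, second cell = value.
my %pairs;
for my $row ($tree->findnodes('//table[@id="stats"]//tr')) {
    my ($name, $value) = map { $_->as_trimmed_text } $row->findnodes('./td');
    next unless defined $value;    # skip rows without at least two cells
    $pairs{$name} = $value;
}

$tree->delete;    # HTML::Element trees should be freed explicitly

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

Using `//tr` rather than `/tr` under the table sidesteps the question of whether a `<tbody>` element sits between the table and its rows.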

Maybe you could ask more precisely and show what you have tried?

Update

> 1) Find the 6th and 9th rows in a named table (given an id) and pull out the name and value pairs.

> 2) Slurp in every row in a named table and parse out the name value pairs.

See XPath and, alternatively, CSS selectors. Both methods support querying the child elements of an element with a given ID.
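For the two tasks quoted above, a CSS-selector sketch might look like the following. It assumes Mojo::DOM (part of the Mojolicious distribution) as one possible parser, a hypothetical table with id `stats` containing ten generated rows, and that the name is in the first cell of each row and the value in the second. The helper `row_pair` is a made-up name for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;    # from the Mojolicious distribution, assumed to be installed

# Hypothetical table with ten name/value rows; the id "stats" is illustrative.
my $html = '<table id="stats">'
  . join('', map { "<tr><td>name$_</td><td>value$_</td></tr>" } 1 .. 10)
  . '</table>';

my $dom  = Mojo::DOM->new($html);
my $rows = $dom->find('table#stats tr');    # Mojo::Collection of <tr> elements

# Hypothetical helper: turn one <tr> into a (name, value) pair
# taken from its first two <td> cells.
sub row_pair {
    my ($row) = @_;
    my @cells = $row->find('td')->map('text')->each;
    return @cells[0, 1];
}

# Task 1: only the 6th and 9th rows (1-based, hence indexes 5 and 8).
my %some = map { row_pair($rows->[$_]) } 5, 8;

# Task 2: every row in the table.
my %all = map { row_pair($_) } @$rows;

print "$_ => $some{$_}\n" for sort keys %some;
printf "%d rows total\n", scalar keys %all;
```

Picking rows by index in Perl keeps the selector simple; a pure-CSS alternative would be `:nth-child()`, which counts sibling elements and so depends more on the exact markup.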

Query syntax is not a Perl question, but there are plenty of good tutorials online.

Look out for browser features/add-ons that let you play around with queries.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re^2: Screen scraping complex tables and divs (updated)
by parser (Acolyte) on Oct 13, 2017 at 21:42 UTC
    Rolf,

    I am confused now too. Are you saying WWW::Mechanize supports CSS selectors and XPath? Or that WWW::Mechanize::Firefox does? If the latter, I have also read that it is very difficult to build.

    > Query syntax is not a Perl question, but there are plenty of good tutorials online.

    I agree. However, determining how best to query HTML source via Perl is.

    The option of mirroring the DOM into a Perl/XML data structure and using the query API sounds quite good. I'll give that a go and see how it works. Anything is better than parsing table tags with HTML::TokeParser.
      WWW::Mechanize::Firefox does, and I picked it as one example out of many because I have worked with it in the past.

      But it really depends on whether you need JavaScript or not, so I don't want to go into details.

      Querying HTML was your question, and something like XPath or CSS selectors is usually the solution.

      Regarding the Perl backend: it depends.

      Sorry, there is no generic answer; TIMTOWTDI.

      Cheers Rolf

      PS:

      > > Look out for browser features/add-ons that let you play around with queries.

      I have had very good experiences using Firepath to find the right CSS selectors/XPath expressions inside Firefox.

      You can copy an auto-generated, explicit expression by right-clicking on a DOM element and then modify it interactively.

      Then simply copy the final path and/or selector into your Perl code.

      HTH! :)

      Cheers Rolf

        Good catch! Firepath is saving me a lot of time!