shivanisai has asked for the wisdom of the Perl Monks concerning the following question:

Consider the following HTML source:
<div><p><a href="http://www.somesite.com.br/site/lojavirtual/produtos.asp?id=2507"><img alt="ESPELHO RETROVISOR - S00224 - SAFETY" src="http://www.somesite.com.br/site/lojavirtual/produtos/2507/peq.jpg" /> </a></div>
If I write the CSS selector for this HTML source as
$scraper2->select('div p a')->data;

With this we can extract the href value of the <a> tag. But I need a single CSS selector to extract both the href value and the <img> src value. How can we write such a selector? Or could you point me to any sites that explain how to write CSS selectors efficiently?


Replies are listed 'Best First'.
Re: How to write CSS selector to extract more than one value from html source using scrappy module?
by Corion (Patriarch) on May 16, 2011 at 12:03 UTC

    CSS selectors only select elements; they cannot extract attribute values by themselves.

    You can try to extract the node and the child node in two passes. It seems that Scrappy uses Web::Scraper, so maybe learning about how to do things using Web::Scraper will help you.

    I would guess that the ->focus method will allow you to select a node and its child nodes, and then you can select the link together with the img tag.

Re: How to write CSS selector to extract more than one value from html source using scrappy module?
by Anonymous Monk on May 16, 2011 at 12:05 UTC
    But I need a single CSS selector to extract both href

    No, you absolutely do not need a single CSS selector.

      Based on the Scrappy synopsis, you might use
      $scraper->crawl(
          'http://www.example.com/page',
          '/page' => {
              'div p a'   => sub { print $_[1]->{href}, "\n"; },
              'div p img' => sub { print $_[1]->{src},  "\n"; },
          },
      );
      The selectors are matched in turn, which is not that useful.

      Scrappy::Scraper::Parser further convinces me Scrappy has too much Pee.

      Pure Web::Scraper looks simpler to manage
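
For illustration, here is a minimal sketch of the pure Web::Scraper approach mentioned above. It assumes Web::Scraper is installed from CPAN and uses a tidied copy of the HTML from the question (with the <p> closed); the key names `href` and `src` and the variable names are made up for the example. It uses two `process` rules rather than one selector, since a selector only picks the element and the `@attr` part names the attribute to pull out.

```perl
use strict;
use warnings;
use Web::Scraper;

# Tidied copy of the snippet from the question (assumption: <p> is closed).
my $html = <<'HTML';
<div><p><a href="http://www.somesite.com.br/site/lojavirtual/produtos.asp?id=2507"><img alt="ESPELHO RETROVISOR - S00224 - SAFETY" src="http://www.somesite.com.br/site/lojavirtual/produtos/2507/peq.jpg" /></a></p></div>
HTML

# One rule per value: the CSS selector finds the element,
# '@href' / '@src' say which attribute to store under the given key.
my $product = scraper {
    process 'div p a',     href => '@href';
    process 'div p a img', src  => '@src';
};

# scrape() accepts an HTML string and returns a hash reference.
my $res = $product->scrape($html);

print "href: $res->{href}\n";
print "src:  $res->{src}\n";
```

Both values end up in the same result hash, which is usually what people mean when they ask for "a single selector".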