in reply to Re: Crawler in perl
in thread Crawler in perl

Well, I have already written a crawler that works with three websites, but only because I told it where to look for the products on each of them (so it won't work with any other website). What I want to know is whether there is a way to find a website's template by comparing the pages within that site (for example, finding repetitions inside <td> tags, classifying them as template, and disregarding them). That way, no matter what website you give the crawler, it could find the products and prices. I honestly don't know if such a thing exists, but people have been asking me whether I can write a crawler that works with most websites, and I wonder how big websites such as Kelkoo work.
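
To make the comparison idea concrete, here is a rough sketch using only core Perl. It is an illustration under assumptions, not a tested crawler: the sample pages, the helper names (text_blocks, template_blocks, page_content) and the "appears on every page" threshold are all made up, and a real version would use a proper HTML parser instead of a regex split.

```perl
use strict;
use warnings;

# Split a page into text chunks by cutting at every tag.
# Crude on purpose: no handling of scripts, comments or entities.
sub text_blocks {
    my ($html) = @_;
    my @blocks;
    for my $chunk (split /<[^>]*>/, $html) {
        $chunk =~ s/\s+/ /g;      # collapse runs of whitespace
        $chunk =~ s/^ | $//g;     # trim the ends
        push @blocks, $chunk if length $chunk;
    }
    return @blocks;
}

# A block seen on at least $threshold (a fraction) of the pages
# is assumed to be part of the site template.
sub template_blocks {
    my ($threshold, @pages) = @_;
    my $page_count = @pages;
    my %seen_on;
    for my $html (@pages) {
        my %on_this_page = map { $_ => 1 } text_blocks($html);
        $seen_on{$_}++ for keys %on_this_page;
    }
    my %template = map { $_ => 1 }
                   grep { $seen_on{$_} / $page_count >= $threshold }
                   keys %seen_on;
    return \%template;
}

# Whatever is left after removing template blocks, in page order.
sub page_content {
    my ($html, $template) = @_;
    return grep { !$template->{$_} } text_blocks($html);
}

# Hypothetical pages from one made-up shop.
my @pages = (
    '<table><tr><td>Acme Shop</td><td>Red kettle</td><td>19.99</td><td>Contact us</td></tr></table>',
    '<table><tr><td>Acme Shop</td><td>Blue teapot</td><td>24.50</td><td>Contact us</td></tr></table>',
    '<table><tr><td>Acme Shop</td><td>Green mug</td><td>4.25</td><td>Contact us</td></tr></table>',
);

my $template = template_blocks(1.0, @pages);
print join(' | ', page_content($_, $template)), "\n" for @pages;
# prints:
# Red kettle | 19.99
# Blue teapot | 24.50
# Green mug | 4.25
```

This only separates repeated text from page-specific text. It does not say which of the remaining blocks is a product name and which is a price, and that is the part that still seems to need per-site knowledge.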

Re^3: Crawler in perl
by Fletch (Bishop) on Apr 22, 2007 at 13:39 UTC

    Never heard of this "Kelkoo" thing before, but three clicks through their "About Us" page to their FAQ brings up the answer:

    Q: Does Kelkoo search all shops on the web?
    A: No, efficiently comparing prices from all shops on the web would be extremely difficult because there are far too many of them. Instead, we select a wide group of shops including big high street names and specialist internet shops. We are constantly looking for shops to add to our affiliate programme, and if we find a shop that has better offers than our current set, we contact them and try to include them on Kelkoo. If you can find a better price elsewhere, we'd love to hear it!

    So they're more than likely writing scrapers for the sites they're specifically interested in, or they're probably big enough (as part of Yahoo) to have worked out some sort of arrangement with the source site to provide raw data.

    Now, there are approaches such as this Ruby work that provide a DSL (domain-specific language) for describing scrapers in DOM/CSS terms, which makes it easier to build scrapers for new sites. I'm not aware of any Perl implementations of this idea, but that might steer you in the right direction.

      I'm not aware of any Perl implementations of this idea

      Web::Scraper - Web Scraping Toolkit inspired by Scrapi
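
      A minimal sketch of what a Web::Scraper description looks like, assuming the module is installed from CPAN. The HTML snippet, the CSS selectors and the key names (products, name, price) are invented for illustration; each real site would need its own selectors.

```perl
use strict;
use warnings;
use Web::Scraper;

# Made-up product listing standing in for a fetched page.
my $html = <<'HTML';
<ul>
  <li class="product"><span class="name">Red kettle</span> <span class="price">19.99</span></li>
  <li class="product"><span class="name">Blue teapot</span> <span class="price">24.50</span></li>
</ul>
HTML

# Describe the scraper in CSS terms: one entry per li.product,
# each with the text of its name and price spans.
my $products = scraper {
    process 'li.product', 'products[]' => scraper {
        process 'span.name',  name  => 'TEXT';
        process 'span.price', price => 'TEXT';
    };
};

# scrape() also accepts a URI object instead of an HTML string.
my $result = $products->scrape($html);

printf "%s costs %s\n", $_->{name}, $_->{price}
    for @{ $result->{products} };
```

      Supporting a new shop then comes down to writing another small scraper block with that shop's selectors, rather than new parsing code.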


      "Half of all adults in the United States say they have registered as an organ donor, although only some have purchased a motorcycle to show that they're really serious about it."
Re^3: Crawler in perl
by derby (Abbot) on Apr 22, 2007 at 12:00 UTC

    No ... not at this moment. Maybe if the Semantic Web ever takes off but I'm not holding my breath on that happening.

    -derby