trample666 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a newbie.

Please forgive my ignorance.

I would like to know the advantages and disadvantages of web scraping through an API compared with regular web scraping.

Replies are listed 'Best First'.
Re: web scrape using API and the regular web scrape
by Corion (Patriarch) on Mar 02, 2010 at 13:33 UTC

    If the service you want to scrape provides an API, using that API has the benefit of giving you clean data and shielding you from a redesign of the website or a change in language or layout.
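
    To illustrate the point about clean data and redesigns, here is a minimal sketch in Python using only the standard library. The JSON body, the two HTML snippets, and the function names are all made up for the example; they do not come from any real service.

```python
import json
import re

# Hypothetical API response: structured data with named fields.
API_RESPONSE = '{"title": "Perl Monks", "votes": 42}'

# Hypothetical HTML for the same item, before and after a site redesign.
HTML_OLD = '<div class="post"><h1>Perl Monks</h1><span class="votes">42</span></div>'
HTML_NEW = '<article><h2 class="headline">Perl Monks</h2><b data-votes="42"></b></article>'


def votes_from_api(body):
    """Read the vote count from the JSON API response."""
    return json.loads(body)["votes"]


def votes_from_html(html):
    """Scrape the vote count by matching the page markup.

    Returns None when the expected markup is not found.
    """
    match = re.search(r'<span class="votes">(\d+)</span>', html)
    return int(match.group(1)) if match else None


print(votes_from_api(API_RESPONSE))  # 42
print(votes_from_html(HTML_OLD))     # 42
print(votes_from_html(HTML_NEW))     # None: the redesign broke the scraper
```

    The API call depends only on the field name, while the scraper depends on the exact markup, so the same data becomes unreachable as soon as the layout changes.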

    Using an API often also means that you need to register with the website, and that there are limits on how often you may use the service.
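
    A client can stay within such limits by spacing out its requests. The following is only a sketch: the `Throttle` class is a hypothetical helper, and the actual interval is whatever the service's terms specify.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call time.

        Returns the number of seconds slept.
        """
        delay = 0.0
        now = self.clock()
        if self.last_call is not None:
            delay = max(0.0, self.min_interval - (now - self.last_call))
            if delay > 0:
                self.sleep(delay)
        self.last_call = self.clock()
        return delay
```

    Calling `wait()` on a shared `Throttle(2.0)` before every request would keep the client to at most one request every two seconds; the value 2.0 is an arbitrary example, not a limit taken from any particular API.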

Re: web scrape using API and the regular web scrape
by derby (Abbot) on Mar 02, 2010 at 13:42 UTC

    ++Corion's answer, but I'm not sure that's what the OP was talking about. If you're using the API of a website, then I would not call it a 'scrape.' trample666, were you asking about a site's API or about using a module such as WWW::Mechanize? The way you phrased the question leaves a lot open to interpretation.

    -derby

      I mean the difference between scraping a website using an API, for example the APIs provided by Google, and scraping a website without using an API (perhaps using Mech, POST or GET, as you said).

      Also, should the website that I am scraping be API-enabled (I mean, should it also provide API services) when I use APIs to scrape?

        Can you translate that into English, please?
Re: web scrape using API and the regular web scrape
by amir_e_a (Hermit) on Mar 02, 2010 at 16:44 UTC

    Writing a scraping program is harder in the first place. Besides, if you use the scraping method, it may stop working when the design of the website changes, unless by pure luck it keeps working.

    So if you have the option to use an API, do use it, unless you actually see a problem with it.

    Sometimes the website doesn't provide an API, though, and then you don't have a choice.