in reply to legality of extracting content from websites

It's probably not illegal (for a definitive answer you'd need the advice of a lawyer), but I am certain that it is against their Terms of Service. For example:

Yahoo! grants you a personal, non-transferable and non-exclusive right and license to use the object code of its Software on a single computer; provided that you do not (and do not allow any third party to) copy, modify, create a derivative work of, reverse engineer, reverse assemble or otherwise attempt to discover any source code, sell, assign, sublicense, grant a security interest in or otherwise transfer any right in the Software. You agree not to modify the Software in any manner or form, or to use modified versions of the Software, including (without limitation) for the purpose of obtaining unauthorized access to the Service. You agree not to access the Service by any means other than through the interface that is provided by Yahoo! for use in accessing the Service.

Despite this, the German computer magazine c't just published a how-to on retrieving your e-mail from webmail services using Perl and LWP. You might be able to glean some useful information from it (even if you don't speak German).

--
Allolex

Re: Re: legality of extracting content from websites
by dbp (Pilgrim) on Jul 15, 2003 at 10:16 UTC

    To me, the important line is: You agree not to access the Service by any means other than through the interface that is provided by Yahoo! for use in accessing the Service.

    Is writing a piece of software that scrapes the Yahoo web page and presents you, the user, with a different interface a violation of this? In a sense, you are still using their interface; you have simply added a proxy. It isn't technically much different from a blind person using software that reads the contents of the web site to her.

      "Is writing a piece of software that scrapes the Yahoo web page and presents you, the user, with a different interface a violation of this?"

      Yes, that is the relevant line, and the clause about reverse engineering applies as well. They clearly want their customers to use only the interface that lets them finance the service through advertising.

      "In a sense, you are still using their interface; you have simply added a proxy. It isn't technically much different from a blind person using software that reads the contents of the web site to her."

      Except that the person involved is not blind and is trying to bypass Yahoo's interface :) I understand the point you are trying to make, but the TOS seem pretty clear to me. I'm sure they put that last sentence in to specifically address the issue of web scraping.

      --
      Allolex

        I agree that, as far as Yahoo is concerned, the TOS rules out web scraping. Nonetheless, I think it is very difficult to decide whether scraping constitutes using an interface other than Yahoo's. You could argue that the TOS only means the user isn't allowed to delve into Yahoo's system, thus bypassing their interface. It is really moot, though, since I doubt the OP has the resources to fight Yahoo over the semantics should he end up in civil court.

        So using a text-mode browser is a violation of their terms of service?

        --

        flounder

Re: Re: legality of extracting content from websites
by sauoq (Abbot) on Jul 16, 2003 at 01:42 UTC
    I am certain that it is against their Terms of Service.

    I'm not so sure. I suppose it'll take a representative of the company to interpret what was really meant, but agreeing "not to access the service by any means other than through the interface that is provided" doesn't seem to me to be a promise to use an interactive browser.

    After all, they don't provide a browser, so that must not be what they mean by "the interface that is provided."

    I think the interface they provide is defined by their web servers, not by the various clients that may be used to access them.

    Besides, I really doubt that Yahoo cares whether you suck down their pages using Mozilla or something hacked up with Perl and LWP. They are not going to lose much revenue anyway: the number of people who do this sort of thing is relatively small; the lower page views probably don't translate into many fewer click-throughs; and your mail account with them has other value to them (e.g., you agree to receive their spam).

    -sauoq
    "My two cents aren't worth a dime.";