Re: legality of extracting content from websites
by allolex (Curate) on Jul 15, 2003 at 09:30 UTC
|
It's probably not illegal (for that you'd have to get the advice of a lawyer), but I am certain that it is against their Terms of Service. For example:
Yahoo! grants you a personal, non-transferable and non-exclusive right and license to use the object code of its Software on a single computer; provided that you do not (and do not allow any third party to) copy, modify, create a derivative work of, reverse engineer, reverse assemble or otherwise attempt to discover any source code, sell, assign, sublicense, grant a security interest in or otherwise transfer any right in the Software. You agree not to modify the Software in any manner or form, or to use modified versions of the Software, including (without limitation) for the purpose of obtaining unauthorized access to the Service. You agree not to access the Service by any means other than through the interface that is provided by Yahoo! for use in accessing the Service.
Despite this, the German computer magazine c't just published a howto on retrieving your e-mail from webmail services using Perl and LWP. You might be able to glean some useful information from it (even if you don't speak German).
--
Allolex
|
To me, the important line is: You agree not to access the Service by any means other than through the interface that is provided by Yahoo! for use in accessing the Service.
Is writing a piece of software that scrapes the Yahoo web page and provides you as the user with a different interface in violation of this? In a sense, you are still using their interface; you have simply added a proxy. It isn't technically much different from a blind person using software that reads the contents of the web site to her.
|
"Is writing a piece of software that scrapes the Yahoo web page and provides you as the user with a different interface in violation of this?"
Yes, that is the right line and the talk about reverse engineering applies as well. They clearly want their customers using only that interface which allows them to finance their service through advertising.
"In a sense, you are still using their interface, you simply added a proxy. It isn't technically much different from a blind person using software that reads the contents of the web site to her."
Except that the person involved is not blind and is trying to bypass Yahoo's interface :) I understand the point you are trying to make, but the TOS seem pretty clear to me. I'm sure they put that last sentence in to specifically address the issue of web scraping.
--
Allolex
|
I am certain that it is against their Terms of Service.
I'm not so sure. I suppose it'll take a representative of the company to interpret what was really meant, but agreeing "not to access the service by any means other than through the interface that is provided" doesn't seem to me to be a promise to use an interactive browser.
After all, they don't provide a browser, so that must not be what they mean by "the interface that is provided."
I think the interface they provide is defined by their web servers, not by the various clients that may be used to access them.
Besides, I really doubt that Yahoo cares whether you suck down their pages using Mozilla or something hacked up with Perl and LWP. The fact is that they are not going to lose much revenue anyway. The number of people who do this sort of thing is relatively small; the lower page views probably don't translate to much lower click-throughs; and your mail accounts with them have other value (e.g. you agree to receive their spam).
-sauoq
"My two cents aren't worth a dime.";
Re: legality of extracting content from websites
by dreadpiratepeter (Priest) on Jul 15, 2003 at 13:25 UTC
|
This is an issue that really bothers me: the idea that theft is all right as long as you are stealing from a big company, and the rampant, convoluted justifications that go along with it.
By using Yahoo Mail, you have entered into a contract with them, a contract that says what they will provide and what you will provide. Violating the terms of that contract and still using the product is theft, plain and simple, black and white.
If you don't want the ads, don't use their service. Pay for a service that makes its money from your contribution, not the contributions of advertisers.
The same applies to stealing cable, shoplifting, software piracy, hiding income from the government, etc. Theft is theft, no matter who you are stealing from.
The implication that I hear from some people that they are modern-day Robin Hoods (although stealing from the rich and giving to yourself is suspect) bothers me.
The outrageous justifications that people use to pretend that it isn’t theft bother me. Particularly the justification that the big companies deserve to lose the money because of their bad business practices. That's crap.
If you have a problem with bad business practices, then fix it the right way. Vote in every election from your local school board up to the Presidential Election and write to your elected officials often, pointing out that you vote and that your vote is based on this issue (and any other issues that are important to you) [1]. Two wrongs don't cancel each other out, especially when one of the wrongs is just a conscience-soothing justification for getting something for free.
BTW, I don't work for Yahoo or any other big company. I just feel strongly about the issue. Feel free to downvote me, but I'm tired of this behavior.
[1] "The condition upon which God hath given liberty to man is eternal vigilance; which condition if he break, servitude is at once the consequence of his crime and the punishment of his guilt." -- John Philpot Curran
-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
|
I have to laugh out loud here... A guy styling himself as a pirate (dread pirate, no less) is lecturing that it's wrong to steal? LOL!
First off, scraping the e-mail without the ads is no more illegal than reading the e-mail without looking at the ads. So far as I know, no law (or portion of the TOS) requires that we actually read the ad.
And your suggestion that we change bad business practices by voting???!? Last time I checked, "president of Yahoo" wasn't an elected position. Politicians cannot stop businesses from having bad business practices. The only way to change bad business practices is to make the company lose money (which shouldn't be hard to do if the practices are really bad, right?)
Now as far as TOS goes, I can use a browser to go check my mail and scrape the message from the browser, and I haven't violated the TOS. I'm using their interface the way it was intended to be used. I'm just not physically at my computer when I do so.
Believe nothing, no matter where you read it, or who said it - even if
I have said it - unless it agrees with your own reason and your own
common sense.
-- Buddha
Re: legality of extracting content from websites
by Anonymous Monk on Jul 15, 2003 at 09:08 UTC
|
- Ask a lawyer.
- I am not a lawyer, this is not legal advice, if you get sued into oblivion I am not responsible. I am not a lawyer...
That said:
Legally, there isn't anything wrong with creating a client that used the standard HTTP protocol to retrieve and send information, is there?
First off, with the state of most judicial systems, it doesn't matter. You piss off a large company bad enough, chances are you're going to run into financial trouble, whether or not what you did was illegal (see any of about 50 000 recent cases for more details). If this bothers you I suggest you do something to improve it.
Secondly, you have to consider a few other interpretations of your actions. What's the difference between a valid request and a request that compromises a server? In many cases, nothing. Just because a service is made available by a company doesn't necessarily mean you can take advantage of it any way you like.
Finally, consider your actions from the perspective of the target company. Is it profitable to sue you for grabbing your email via a method that subverts their advertising? Not unless they want to make an example out of you to curb a wider trend. Making a tool that automates it publicly available is another issue altogether, as it provides a very large target for a company that doesn't approve.
So to sum up (remembering I'm not a lawyer), it doesn't matter whether it's illegal: with the current state of affairs, even if you're right you could still get burned. Proceed at your own risk.
-- Some guy who isn't a lawyer.
Re: legality of extracting content from websites
by derby (Abbot) on Jul 15, 2003 at 12:35 UTC
|
Hmmm ... there really is no difference between:
- A scraper that retrieves the whole page and just displays what you want
- A browser that blocks ads/graphics
- A person who has trained themselves to not view ads
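For what it's worth, the first option takes very little code. Below is a minimal sketch in Python, not taken from any real tool: the page layout, the `id="message"` marker, and the `class="ad"` blocks are all made up for illustration, and a real webmail page will be structured differently. It parses the whole page and keeps only the text inside the message element.

```python
from html.parser import HTMLParser

# Hypothetical page layout: the id "message" and the class "ad" are
# invented for this sketch and do not describe any real webmail page.
SAMPLE_PAGE = """
<html><body>
  <div class="ad">Buy one, get one free!</div>
  <div id="message">Hi, lunch at noon?</div>
  <div class="ad">Refinance today!</div>
</body></html>
"""

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MessageExtractor(HTMLParser):
    """Collect the text found inside the element whose id is 'message'."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside the message element
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1          # nested element inside the message
        elif dict(attrs).get("id") == "message":
            self.depth = 1           # entering the message element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1          # leaving a nested element or the message

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_message(page):
    """Return only the message text from the full page source."""
    parser = MessageExtractor()
    parser.feed(page)
    return "".join(parser.chunks).strip()


print(extract_message(SAMPLE_PAGE))
```

Note that the whole page, ads included, is still downloaded; the only difference from the other two options is where the filtering happens.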
I think this is part of the risk associated with any advertising, whether
online or off. You think it's just accidental that those little cards fall out
of magazines? Those cards fall out because that
forces you to pick them up and look at them. Bingo, the advertiser now has your
attention.
Advertisers are well aware of the risks associated with online and offline
campaigns. They're aware that not everyone who is delivered an ad is
actually going to see an ad. Sure, they're not happy about it (à la TiVo) but
they're going to have to change.
So my $0.02USD is go ahead and scrape ... it's not copyrighted material, it's
your email (it is yours, right?). Let Yahoo and its advertisers fight over
the biz implications of a non-effective campaign. At worst, all they can do really is deny you service. Big deal, get another one.
Or just bypass those services altogether and buy yourself a web-hosting package. Those can be had for as little as $50USD a year and include email addresses. I use one of those so I can more easily switch ISPs (no need
to worry about lock-in due to ISP-controlled addresses).
-derby
|
Well, the email is copyrighted, but neither Yahoo, nor the
receiver of the email has the copyright - that right belongs
to the sender. So, you can't use copyright as an
argument to bypass Yahoo's or Hotmail's interface.
I'm not sure whether PerlMonks is the appropriate place to
seek legal advice. This is a Perl site, mostly populated by
people who only know how they would like the law to be, and
not by people with legal training. If you plan to do something
that you suspect might break a law, consult a lawyer.
His/her professional advice is far more useful than anything
you find here.
Having said that, IMO, 'screen scraping' falls in the same
category as 'framing someone else's pages'. And I know of lawsuits
in which people who framed other people's pages lost.
What I don't understand is why someone who's technically
capable of screen scraping uses Yahoo and Hotmail addresses.
Abigail
Re: legality of extracting content from websites
by perrin (Chancellor) on Jul 15, 2003 at 14:09 UTC
|
There are already several programs out there that retrieve the body of the e-mail (and some can reply, etc.), and they don't appear to have been sued out of existence. Looks like you have your answer.
|
Just because they have not been sued yet doesn't make the programs legal. To draw an analogy: if I don't get caught knocking over the liquor store, does that make it legal? No.
|
That's quite a stretch ... going from web scraping email
contents to holding up cash registers. How about the
analogy of not coming to a complete stop at a stop sign
instead? Regardless of how you like your analogies, if the
scraper uses the public interface provided by said provider,
then how can it be illegal? Whether you use that interface
through a browser or a web bot shouldn't matter as long as
that interface is publicly available. I can use the front
door to a public library if the library is open. Can I not
also teleport into the library lobby as long as it is open
to the public? What? You can't teleport? ;)
jeffa
*BAMF*
|
Perhaps you missed the part of the question that asked if Yahoo would get upset. They do not seem to have gotten upset, and there are many of these out there on freshmeat.net. For the truly curious, a few e-mails to the authors of these tools would probably be a good next step, asking if they have encountered any problems.
|
Perhaps we're confusing the terms "illegal" and "against the terms of service", which correspond to "criminal" (penal code) and "civil" law, respectively. If you hold up a store, it is illegal because there are laws against armed robbery: that is a criminal case. If you break the terms of a contract, it is not illegal (i.e., not a criminal case). Your contract partner can take you to court to enforce the terms of the contract, but it is a civil case, not a criminal one, hence not illegal.
--
Allolex
Re: legality of extracting content from websites
by chunlou (Curate) on Jul 16, 2003 at 01:53 UTC
|
You can ask Yahoo or Hotmail customer support directly, though their contact pages are not easy to find.
Besides legality, you should also be concerned about whether your client application could be mistaken for a "robot" or "worm" by the sites, which might then block your IP addresses. They do that for security reasons.
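One way to lower the odds of looking like a worm is to identify the client openly and to space out its requests. The sketch below (in Python) only illustrates that idea; the client name, contact address, and five-second delay are invented values, and nothing here guarantees that a site will not block you anyway. No request is actually sent.

```python
import time
import urllib.request

# Hypothetical values chosen for illustration only; no provider is known
# to require or honor this particular name, address, or delay.
USER_AGENT = "my-mail-fetcher/0.1 (contact: me@example.com)"
MIN_DELAY_SECONDS = 5.0


def build_request(url):
    """Build (but do not send) a request that openly identifies the client."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


class Throttle:
    """Make sure at least min_delay seconds pass between successive calls."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep    # without really waiting
        self.last = None      # time of the previous call, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


# Usage: call throttle.wait() before each fetch.
throttle = Throttle(MIN_DELAY_SECONDS)
throttle.wait()  # first call returns immediately
request = build_request("http://example.com/")
print(request.get_header("User-agent"))
```

A client that hammers the server with anonymous requests is far easier to mistake for an attack than one that fetches a page every few seconds under a recognizable name.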
Re: legality of extracting content from websites
by PodMaster (Abbot) on Jul 16, 2003 at 07:31 UTC
|
I agree with what derby and sauoq say (274380 and 274659 respectively).
Besides tagging each email Yahoo sends, Yahoo could also tag every email it receives, and I doubt anyone could do anything about it without paying for the premium Yahoo Mail service.
Now, Yahoo can go and amend its TOS to say that you may only access Yahoo Mail (or whatever) using a web browser like Mozilla or Internet Explorer ... or anything else. Anyway, see http://yahoopops.sourceforge.net/.
| MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!" | | I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README). | | ** The third rule of perl club is a statement of fact: pod is sexy. |