It's hard to say. Did you ask the owners of the site?
You also shouldn't forget that nowadays almost all sites
carry advertising, which is either the main reason for
the site or what helps the site sustain itself.
Sure, your 12 hits a day won't have much impact. But what
if it becomes commonplace? What if the majority of the
people here used LWP to access perlmonks, decimating the
ad hits?
I've written my share of LWP scripts, and I've written
ad-busting proxies. But I'm not convinced everything I did was
ethical. I'd say it depends on the site, and the views of the
owners of the site.
I don't think it's wrong for your script to pretend it's
Netscape, but I do think it's wrong to ignore a robots.txt
file. If the site has a policy against automated
harvesting, bypassing the policy would certainly be "wrong".
Abigail
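For anyone who wants to see what honouring robots.txt looks like in a script, here is a minimal sketch. It is written in Python with the standard library's `urllib.robotparser` rather than LWP, purely so it can be run offline. The robots.txt content, the `my-script/0.1` agent name, the `example.com` URLs, and the `is_allowed` helper are all invented for illustration; a real script would fetch the site's actual robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/data.html"):
        verdict = "fetch" if is_allowed(SAMPLE_ROBOTS_TXT, "my-script/0.1", url) else "skip"
        print(verdict, url)
```

The point is only that the check happens before each request; the script skips anything the file disallows instead of fetching it anyway.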
If you have a valid business requirement for this data, this is (IMO) a dirty method of obtaining it.
In an ideal world, you'd ask the developers of the website to make the source of the data available to you so you can get it yourself.
If this is not viable, then I see no major problem with it. In fact, the frequency of your "get" is entirely dependent on your requirement.
Again, if it's a valid business requirement, why not get it from the source?
Update: My reply is from the POV of you being internal to the corporate site.
On the rare occasions I've done something like this I've wrestled with my conscience a bit and then salved it by making my script strip out the addresses behind any banners on the site, and then use LWP to get the data from these addresses. Obviously, although this is nice for the site owner I'm now ripping off the advertisers, who think my LWP clickthrough is something to celebrate, and I'm not sure this is any less reprehensible; but it pushes the reprehensibility a little further away. Perhaps what you really need to do is write a script that clicks through the advertisers and then randomly buys stuff from them. But that way madness lies.
§ George Sherston
I think the key here is the use to which you put the extracted information. If it's for personal use then I wouldn't worry too much about it.
I wrestled a bit with a similar question -- I was distributing some modules that yanked information off of sites (historical stock quotes, to be precise). After some constructive conversations with pjf, I came to realize that scripts such as these are nothing more than a browser. The terms of service for a site apply to the user of the browser, not the author. So in this sense I passed the buck -- here's a tool, read the TOS of each site involved and see if it applies to *you* -- the TOS for the site does not apply to the tool in hand.
After all, what if you use mozilla and banish images from certain advert servers? That's not the fault of the browser's authors -- they merely provide a useful tool. The TOS of the site applies to the user of the browser.
As the user of your tool, you will have to examine how you are using the data you are fetching. If you're repackaging it or selling it as is, that's a problem ethically and potentially financially as well, if the information source comes after you. If, on the other hand, you are selling analytical work derived from the data, well, that's not so cut and dried since you're adding value -- but as several people have pointed out, you should cut out the middle man and buy the information directly. Do this not merely to salve your conscience but to limit your legal liability.
But if it's for personal use then I think you're just fine and I wouldn't worry about it. You're using a modified browser, end of story.
Matt
Like many are saying, I definitely believe it depends on the site you are retrieving the data from. If they don't have advertisements on their site, you hitting it every few hours with LWP is less stressful on their server than somebody hitting it more frequently with an interactive browser. Even better, you could set up your script to only run at night, when few people would be there.
However, many sites have policies on this, which can often be found at the bottom of their site. For example, WhoWhere.com states the following in their terms of service:
(You agree not to) Sell, distribute, or make any commercial use of data obtained from any Lycos database or make any other use of data from any Lycos database in a manner which could be expected to offend the person for whom the data is relevant
-and-
Use automated means, including spiders, robots, crawlers, or the like to download data from any Lycos Network database.
Also, the terms of service for people.yahoo.com state:
You agree not to reproduce, duplicate, copy, sell, resell or exploit for any commercial purposes, any portion of the Service, use of the Service, or access to the Service.
The above statements make it sound like retrieving any data from either of those sites for any commercial purpose may be breaking their terms of service. So, I'd just make sure you read the terms of service and such for the site you're looking into. You may want to email them, and explicitly ask their permission -- they may let you do it, particularly if you tell them it'd only be once an hour throughout the night.
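As a rough illustration of the "once an hour throughout the night" idea, the following Python sketch gates each fetch on a time window and a minimum interval. The 01:00 to 05:00 window, the one-hour interval, and the `may_fetch` helper are assumptions made up for this example; the right values depend on the site's own quiet hours and on whatever its owners agree to.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumed quiet window and spacing; adjust to what the site owner permits.
NIGHT_START = time(1, 0)
NIGHT_END = time(5, 0)
MIN_INTERVAL = timedelta(hours=1)


def may_fetch(now: datetime, last_fetch: Optional[datetime]) -> bool:
    """Allow a fetch only inside the night window and at most once per interval."""
    in_window = NIGHT_START <= now.time() < NIGHT_END
    waited = last_fetch is None or now - last_fetch >= MIN_INTERVAL
    return in_window and waited


if __name__ == "__main__":
    # Example: decide whether a fetch is permitted right now, with no prior fetch.
    print(may_fetch(datetime.now(), None))
```

A script run from cron could call this before every request and simply exit when it returns False.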
Good luck!
-Eric
--
Lucy: "What happens if you practice the piano for 20 years and then end up not being rich and famous?"
Schroeder: "The joy is in the playing."