Re^3: [OT] Ethical and Legal Screen Scraping

Caching has traditionally been considered acceptable, legal, and ethical. Most web browsers do it on a small scale, and Google does it on a large scale. If the purpose of a scraper is to enable, for example, an "offline reader", there is no ethical dilemma.

Even long-term caches that are not shared with others pose no ethical dilemma, so long as the material being cached remains publicly available on the web site. Once it is withdrawn, there is a dilemma: one has to weigh the good of continued access to that data against the good of complying with the copyright holder's wishes. I'd take that on a case-by-case basis.

<-radiant.matrix->
Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
The Code that can be seen is not the true Code
"In any sufficiently large group of people, most are idiots" - Kaa's Law

Re^4: [OT] Ethical and Legal Screen Scraping
by diotalevi (Canon) on Jul 25, 2005 at 22:09 UTC
    Magnetic or optical caches on disk are just a backup for my memory. It is not a lapse to remember published information after its author would prefer I forget it, and it is not a lapse to remember it with the aid of a recording either.

    Keeping a copy on my hard drive, on a CD, on a printout, or in handwritten notes doesn't obligate me, under any ethical system I'm aware of, to destroy it just because my original source stopped publishing. The copyright holder's wishes are irrelevant.

      I happen to agree with your evaluation. However, if I try to evaluate things from a neutral ethical standpoint, I can see a potential dilemma in caching when material has been retracted from publication.

      Still, I feel that the value of memory augmentation, combined with the value of preserving information in case an author is pressured (legally or otherwise) into removing it from publication, far outweighs any potential harm that copyright holders might perceive.

      I also see it like ripping CDs I own to my hard disk: I am making an accessible copy of something I have a right to access. If my CD is destroyed, it is still ethically correct to keep my Vorbis files. Likewise, if content disappears from the 'net, I see no issue with keeping my cache of it.

      <-radiant.matrix->
Re^4: [OT] Ethical and Legal Screen Scraping
by Your Mother (Archbishop) on Jul 25, 2005 at 20:09 UTC

    Yep, I agree. Google in particular does exactly the right thing, I think. They allow you to control how they handle your content almost completely. You can exclude your pages from their caches while still including them in their search results.
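Google documents a `noarchive` robots directive for this: a page that carries `<meta name="robots" content="noarchive">` can still be indexed but asks not to be cached. An "offline reader" style scraper that wants to extend the same courtesy could check for that directive before storing a page. What follows is a minimal Python sketch under those assumptions, not a complete policy check: the names `RobotsMetaParser` and `allows_caching` are hypothetical, it only looks at `robots` and `googlebot` meta tags, and it ignores the `X-Robots-Tag` HTTP header and robots.txt, which a real scraper would also need to consult.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attribute values may be None, so normalise them to "".
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noarchive, nosnippet"
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def allows_caching(html):
    """Hypothetical check: False if the page carries a noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head><body>Hi</body></html>'
    print(allows_caching(page))  # False
```

A cache built this way would still fetch the page (so it can read the tag) but simply not store it when `allows_caching` returns `False`, which mirrors the "searchable but not cached" behaviour described above.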