in reply to Framework for News Articles

Sounds like a cool project. One potential pitfall is legal: some sites don't like robots gathering information from their pages automatically, because (1) the ads are never seen and (2) automatic collection of information can violate copyright, depending on the use to which it is put.

The solution is to check the terms of use on the website, or to ask the webmaster.

-Mark

Re: Re: Framework for News Articles
by CountZero (Bishop) on Mar 24, 2004 at 22:42 UTC
    Then they should put a robots.txt file on their website, and of course all well-behaved robots check that and apply it.
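
    In Perl, a minimal sketch of such a well-behaved robot can use LWP::RobotUA from libwww-perl, which fetches each site's robots.txt and refuses to request anything it disallows (the agent name, contact address, and URL below are placeholders):

        use strict;
        use warnings;
        use LWP::RobotUA;

        # LWP::RobotUA fetches /robots.txt once per host and refuses
        # to request any page those rules disallow for our agent name.
        my $ua = LWP::RobotUA->new(
            agent => 'NewsBot/0.1',            # hypothetical robot name
            from  => 'webmaster@example.com',  # contact address, required
        );
        $ua->delay(1);    # wait at least 1 minute between requests per host

        my $res = $ua->get('http://www.example.com/news.html');
        if ($res->is_success) {
            print $res->decoded_content;
        }
        else {
            # A robots.txt disallow shows up here as a 403 response
            # with the message "Forbidden by robots.txt".
            print "Not fetched: ", $res->status_line, "\n";
        }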

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      While robots.txt is an established standard for regulating the behavior of robots, the absence of a robots.txt file is not a license to violate copyright. Many sites want robots from Google and other search engines to index their pages, but they don't want some random person scraping their content and putting it up on another site. You could make a case for downloading content for personal use, but there are definitely gray areas out there.
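
      And robots.txt rules are per-agent, so a site can welcome Googlebot while disallowing everyone else. In Perl you could check a given URL against a site's rules with WWW::RobotRules (again just a sketch, with placeholder agent name and URLs):

          use strict;
          use warnings;
          use WWW::RobotRules;
          use LWP::Simple qw(get);

          # Parse the site's robots.txt on behalf of our (hypothetical) agent.
          my $rules = WWW::RobotRules->new('NewsBot/0.1');
          my $robots_url = 'http://www.example.com/robots.txt';
          my $robots_txt = get($robots_url);
          $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

          # allowed() applies the User-agent sections that match our name.
          my $page = 'http://www.example.com/news.html';
          print $rules->allowed($page) ? "allowed\n" : "disallowed\n";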

      The moral of the story is that the legality definitely depends on what you do with the downloaded information.

        I entirely agree.

        It raises some interesting questions, though: what is the legal status of the data held by a proxy server or cache? Does it violate copyright?

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law