in reply to CB history and Google's cache

For the record: when this came up yesterday in the CB, I /msg'ed blakem, the owner of thepen. It seems that somehow (I'm not sure exactly how) the nodelets were turned on on thepen without blakem's knowledge.

Anyway, I asked him to turn them off, and he said that he has. How long it'll take before that actually affects Google results is another question. Also, I don't know if this will totally resolve the problem: AFAIK googlebots are allowed on the front page, and since they log in as AM they get the CB nodelet automatically, so anything there will be indexed. I think, however, that this is a lot less dramatic than what would have been available through thepen's mirror.


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi

    Flux8


Re^2: CB history and Google's cache (fixed)
by Aristotle (Chancellor) on Sep 14, 2004 at 21:00 UTC

    Remember that Google will soon re-index the front page and forget the snippet of CB conversation it had previously captured (in favour of a new one…)

    Unless prohibited by robot rules, the GoogleBot will also crawl into the site up to a certain depth, despite the parametrized GETs.

    Makeshifts last the longest.

Re^2: CB history and Google's cache (fixed)
by tye (Sage) on Sep 14, 2004 at 21:32 UTC

    PM's /robots.txt says:

    # sorry, but misbehaved robots have ruined it for all of you.
    User-agent: *
    Disallow: /
    so Google shouldn't be indexing PM directly at all. But I believe that Google still does index PM to some extent, perhaps when prompted by people giving specific URLs to less-obvious Google tools (which could be construed as not being "robot" traffic), though I'm just guessing wildly here.
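
For anyone who wants to check what those rules mean for a well-behaved crawler, here is a minimal sketch using Python's standard urllib.robotparser. The rules are copied from the quote above; the helper name `allowed` and the sample URLs are only assumptions for illustration, and this says nothing about what Google's own crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in the post above.
ROBOTS_TXT = """\
# sorry, but misbehaved robots have ruined it for all of you.
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if these rules permit user_agent to fetch url.

    `allowed` is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Sample URLs are assumptions chosen for illustration only.
    for url in ("http://www.perlmonks.org/",
                "http://www.perlmonks.org/index.pl?node=Newest%20Nodes"):
        print(url, allowed("Googlebot", url))
```

With a blanket `Disallow: /` under `User-agent: *`, every path is off limits to any robot that honours the file, so anything that still shows up in an index got there some other way.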

    Corion is working on building perlmonks.org/robot/... Once that exists, we'll tell robots (via robots.txt) to scan only that part of the site, and we'll also detect common robots (via the user agent string) and redirect them there.
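
The user-agent detection described above could look roughly like the sketch below. This is not the actual PerlMonks code: the marker list, the function names, and the use of the bare `/robot/` prefix as the redirect target are all assumptions made for illustration (the real path under `/robot/` is not given in the post).

```python
from typing import Optional

# Hypothetical substrings that identify common robots; a real list
# would need to be maintained and would be longer.
ROBOT_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")

# Assumed prefix for the robot-friendly part of the site. The post
# elides the rest of the path, so only the prefix is used here.
ROBOT_PREFIX = "/robot/"


def is_robot(user_agent: str) -> bool:
    """Guess whether a request comes from a robot, by user agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def redirect_target(user_agent: str, path: str) -> Optional[str]:
    """Return where to redirect the request, or None to serve it as-is."""
    if not is_robot(user_agent):
        return None  # ordinary visitors are left alone
    if path.startswith(ROBOT_PREFIX):
        return None  # robot is already in the permitted area
    return ROBOT_PREFIX
```

Substring matching on the user agent only catches robots that identify themselves honestly, which is why it is paired with the robots.txt rule rather than used instead of it.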

    - tye