in reply to Re^2: Is there a simple way to archive/download all of PerlMonks?
in thread Is there a simple way to archive/download all of PerlMonks?

> thread URLs, most didn't exist, those that did had snapshots that were years out of date.

Did you check all 6+ domains? :)

I'm wondering how likely it is for an old thread to be out of date. Do you expect many monks to update what they wrote around 9/11?

Edit (answering myself)

Hmmm wait, besides editing, we do indeed have necroposts resurrecting old threads.

True, a backup service would need to check RAT or newest nodes regularly. (Or restrict itself to mirroring only single posts.)

And editing isn't recorded anywhere... :/

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Re^4: Is there a simple way to archive/download all of PerlMonks?
by marto (Cardinal) on Apr 29, 2024 at 09:36 UTC

    I didn't check all of the domains, no. I think the only valid archive would be an up-to-date database extract of node content (and some of the other metadata), rather than partial snapshots of page impressions from a moment in time.

    Update: I seem to recall different domains having different robots.txt rules that affect indexing.

      Otherwise I'd poll the XML per node.

      If edits are/were reflected in the timestamps of the HTTP headers, this could also make fetching updates quite efficient.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery
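
The idea of polling per node and relying on HTTP header timestamps could look roughly like the sketch below. It is only an illustration under assumptions: nothing in this thread confirms that the site sends a `Last-Modified` header or honours `If-Modified-Since`, and the function names and the `url` parameter are hypothetical. The technique shown is a standard HTTP conditional GET.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
import urllib.error
import urllib.request


def conditional_headers(last_seen):
    """Build request headers asking the server to skip unchanged content.

    last_seen must be a timezone-aware datetime in UTC, or None if the
    node has never been fetched.
    """
    if last_seen is None:
        return {}
    return {"If-Modified-Since": format_datetime(last_seen, usegmt=True)}


def needs_refetch(last_modified_header, last_seen):
    """Decide from a Last-Modified header value whether to refetch a node."""
    if last_modified_header is None or last_seen is None:
        return True  # nothing to compare against, so refetch to be safe
    return parsedate_to_datetime(last_modified_header) > last_seen


def fetch_if_changed(url, last_seen):
    """Return the body bytes, or None if the server answers 304 Not Modified.

    Assumption: the server actually evaluates If-Modified-Since. If it
    ignores the header, this simply downloads the full body every time.
    """
    request = urllib.request.Request(url, headers=conditional_headers(last_seen))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None
        raise
```

A mirror would store the time of its last successful fetch per node and pass it back in as `last_seen` on the next poll. If the timestamps turn out not to track edits, comparing a hash of the fetched body against the stored copy would be the fallback.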