Is there a way to get my own copy of Perl Monks for offline enjoyment and data hoarding?

I know I can mirror or recursively fetch with wget or curl, but I'd like to do this in the least impactful and most efficient way, if at all possible.

Also, if there's a way to (r)sync said export periodically that'd be extra snazzy.

Scope: let's just say any page/asset that's viewable to me, anon.

Format: HTML preferred, I suppose, of top-level nodes, but I'd be happy to get my hands on whatever's available.
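
For the "less impactful" part of the question, one possible shape is a throttled, resumable fetch loop. This is only a sketch under assumptions: `polite_fetch`, `fetch`, `already_saved` and the 5-second default delay are hypothetical names and values, not anything the site documents, and the real HTTP call is left as a pluggable function so that nothing here presumes a particular PerlMonks URL scheme.

```python
import time
from typing import Callable, Dict, Iterable


def polite_fetch(
    node_ids: Iterable[int],
    fetch: Callable[[int], str],
    already_saved: Dict[int, str],
    delay_seconds: float = 5.0,  # assumed value; pick whatever the site tolerates
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, str]:
    """Fetch nodes one at a time, pausing between requests.

    Nodes already present in `already_saved` are skipped, so an
    interrupted run can resume without re-requesting anything.
    """
    saved = dict(already_saved)
    first_request = True
    for node_id in node_ids:
        if node_id in saved:
            continue  # already on disk: no request is made
        if not first_request:
            sleep(delay_seconds)  # throttle between consecutive requests
        saved[node_id] = fetch(node_id)
        first_request = False
    return saved


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs offline; a real one would issue
    # an HTTP GET for the node and return the response body.
    def fake_fetch(node_id: int) -> str:
        return f"<html>node {node_id}</html>"

    result = polite_fetch([1, 2, 3], fake_fetch, {2: "cached"}, delay_seconds=0.0)
    print(sorted(result))
```

Keeping the delay and the "already saved" check in one loop means a periodic re-run only costs requests for nodes that are actually missing locally.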

Replies are listed 'Best First'.
Re: Is there a simple way to archive/download all of PerlMonks?
by jdporter (Paladin) on Apr 30, 2024 at 22:28 UTC
Re: Is there a simple way to archive/download all of PerlMonks?
by LanX (Saint) on Apr 28, 2024 at 07:27 UTC
Re: Is there a simple way to archive/download all of PerlMonks?
by Anonymous Monk on Apr 28, 2024 at 22:07 UTC

    Probably better to (cr)use the Wayback Machine for this purpose. They already have an archived copy, after all.

      "They already have an archived copy, after all."

      They don't. I've checked several thread URLs: most didn't exist, and those that did had snapshots that were years out of date. Besides, it doesn't meet the question's criterion of an 'offline' copy.

        > thread URLs: most didn't exist, and those that did had snapshots that were years out of date.

        Did you check all 6+ domains? :)

        I wonder how likely it is for an old thread to be out of date. Do you expect many monks to update what they wrote around 9/11?

        Edit (answering myself)

        Hmmm, wait: besides editing, we do indeed have necroposts resurrecting old threads.

        True, a backup service would need to check RAT (Recently Active Threads) or Newest Nodes regularly. (Or restrict itself to mirroring only single posts.)

        And editing isn't recorded anywhere... :/

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery
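
Since edits aren't recorded anywhere, a mirror could detect them itself by hashing each node's content on every pass and comparing against the digest stored from the previous pass. A minimal sketch, assuming a hypothetical `manifest` that maps node id to the last seen SHA-256 digest (the function names are illustrative only):

```python
import hashlib
from typing import Dict, List


def content_digest(body: str) -> str:
    """Return the SHA-256 hex digest of a node body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def changed_nodes(fetched: Dict[int, str], manifest: Dict[int, str]) -> List[int]:
    """Return ids whose fetched body is new or differs from the stored digest."""
    return sorted(
        node_id
        for node_id, body in fetched.items()
        if manifest.get(node_id) != content_digest(body)
    )
```

This only notices edits to nodes that are re-fetched, so it would still need something like the regular check of recently active threads mentioned above to decide which nodes to look at again.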

Re: Is there a simple way to archive/download all of PerlMonks?
by afoken (Chancellor) on Apr 30, 2024 at 14:07 UTC

    Wikipedia once had a way to download a big tar(?) archive of their articles, IIRC in wiki syntax. I don't know if this function is still active.

    Having a cron job export the source of all nodes readable to Anonymous Monk every day, week or month to an archive might be an idea. That job could run on a dedicated server, perhaps on a snapshot of the live database, so it would not put extra load on the normal servers.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
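
The export job suggested above could look roughly like the following. This is a sketch under assumptions, not how the site works: `write_archive`, the `nodes-<date>.tar.gz` file name and the one-`.txt`-member-per-node layout are all hypothetical, and how the node sources are read from the database snapshot is left out entirely.

```python
import io
import tarfile
from datetime import date
from pathlib import Path
from typing import Dict


def write_archive(nodes: Dict[int, str], out_dir: Path, day: date) -> Path:
    """Write one .txt member per node into a dated .tar.gz and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive_path = out_dir / f"nodes-{day.isoformat()}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        for node_id in sorted(nodes):
            data = nodes[node_id].encode("utf-8")
            info = tarfile.TarInfo(name=f"{node_id}.txt")
            info.size = len(data)  # must be set before addfile()
            tar.addfile(info, io.BytesIO(data))
    return archive_path
```

A scheduler such as cron would then only need to call this once per period with the nodes visible to Anonymous Monk, and downloaders would fetch a single static file instead of crawling.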
Re: Is there a simple way to archive/download all of PerlMonks?
by Anonymous Monk on Apr 28, 2024 at 00:28 UTC

      On the other hand, that limitation is there for a reason. If someone wants to make a useful copy, they should exclude those nodes not accessible to AnonyMonk.

Re: Is there a simple way to archive/download all of PerlMonks?
by nikosv (Deacon) on Apr 28, 2024 at 09:10 UTC
    Actually, that could be useful for training/fine-tuning a local LLM on the collective PerlMonks threads/data, so you can ask free-style, ChatGPT-like questions of it.

      This will probably be illegal in future, cruelty to AI.

        I think otherwise: compared to SO, it would be a merciful treatment ;-)

      I'm very far from being a "prompt engineer", but I suppose it should already be possible to tell an AI search to consider only PerlMonks.

      The harder part is ignoring everything from certain "BS monks" ;)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

        This only works if the PerlMonks text was swept up in the training data. It's probably in our best interest to make PerlMonks more downloadable so that this body of information is available to LLM tools. People might actually decide whether or not to use Perl for a task based on how well ChatGPT can answer questions about it. Lately I've been asking it a bunch of questions about Vue3, and it amazes me how useful the answers are (as a search engine; it still doesn't write accurate code).
Re: Is there a simple way to archive/download all of PerlMonks?
by harangzsolt33 (Deacon) on Apr 27, 2024 at 16:46 UTC
    Now, this is a great question!!! I would love to download it or get it somehow if that were possible. Even pure text format would be fine. It doesn't have to be in HTML, although that would be nicer.

    I think it's just a matter of time before someone says, "Why don't you just scrape it?" Lol :D