
Wasn't there recently a big round of blocking bots that were scraping PerlMonks for AI? That would probably block the search-engine crawlers too, so I wouldn't expect PerlMonks to be indexed after that.

Re^3: [RFC] Discipulus's step by step tutorial on module creation with tests and git
by LanX (Saint) on May 22, 2025 at 22:00 UTC
    Googlebot is obviously not blocked.

    AFAIK, DDG depends on the results from Bingbot.

    If the Monastery decides to discard DDG (and Bing), it should probably not direct anonymous monks there when they try to use Super Search.
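    Whether a given crawler is allowed can be checked against a robots.txt file. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `SAMPLE_ROBOTS_TXT` content and the helper name `crawler_access` are hypothetical and do not reflect the Monastery's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT the real PerlMonks file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow: /

User-agent: *
Disallow: /
"""

def crawler_access(robots_txt, url, agents):
    """Return {agent: allowed} for each user agent, per the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

result = crawler_access(
    SAMPLE_ROBOTS_TXT,
    "https://www.perlmonks.org/?node_id=1",
    ["Googlebot", "bingbot", "DuckDuckBot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

    To check the live rules instead, `RobotFileParser` also offers `set_url()` and `read()`. Note that robots.txt only covers well-behaved crawlers; server-side blocking would not show up here.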

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      This is definitely a concern.

      I tried the DDG query site:perlmonks.org "Key bindings in the Debugger" (as that node is, as I write this, a very recent top-level post) and it found nothing.
      That is as expected, if we've blocked the crawlers feeding the index consulted by DDG.

      However, I got the same result from the same query on Google (and ditto Bing).

      Now, it could be that that node is simply too new, and hasn't been picked up by the crawlers yet.
      Am I doing something wrong?

      Today's latest and greatest software contains tomorrow's zero day exploits.
        I can find a 7-day-old thread with Google:

        https://www.google.com/search?q=%22Largest+integer+in+64-bit+perl+%22++site%3Awww.perlmonks.org

        FWIW: It doesn't change the outcome here, but from experience I prefer

        • putting the site: last
        • searching www.perlmonks.org, since perlmonks.org is blocked by the robots.txt
        For a general approach I'd recommend testing the search engines automatically.
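
        As a starting point for such automated testing, here is a minimal sketch that only builds the query URLs, with the phrase quoted and site: placed last as preferred above. The `ENGINES` table and the function name `build_search_url` are assumptions made for illustration. Fetching and parsing the result pages is deliberately left out, since search engines may throttle or disallow automated queries.

```python
from urllib.parse import urlencode

# Search endpoints of the three engines discussed; each takes the query in `q`.
ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "ddg": "https://duckduckgo.com/",
}

def build_search_url(engine, phrase, site="www.perlmonks.org"):
    """Build a search URL for an exact phrase, with the site: operator last."""
    query = f'"{phrase}" site:{site}'
    return f"{ENGINES[engine]}?{urlencode({'q': query})}"

if __name__ == "__main__":
    for name in ENGINES:
        print(build_search_url(name, "Key bindings in the Debugger"))
```

        The printed URLs can be opened by hand, or fed to whatever checking mechanism the site maintainers consider acceptable.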

        On a side note, the above search also shows XML generator output; I suppose that should be blocked too.

        Edit

        Ehm... actually, that XML result has one of those weird extra paths which shouldn't work!

        • https://www.perlmonks.org/index.pl/perldata.html?node=Newest Nodes XML Generator
        Looks like a bug.

        Cheers Rolf