PerlMonks  

Searching the monastery with duckduckgo leads to ugly results

by LanX (Sage)
on Oct 31, 2021 at 21:37 UTC (#11138280=monkdiscuss)

DDG is my default search engine, but when I use it to search the monastery I regularly end up on the bare or even mobile versions of the nodes.

ddg: schwartzian transform perlmonks

Google doesn't seem to have this problem (anymore). Any chance of adjusting the robots.txt?

google: schwartzian transform perlmonks

This node in

update

Hmm ... these results with my FF searchbar are very different from the monastery markup [ddg://...]

https://duckduckgo.com/?t=ffsb&q=schwartzian+transform++perlmonks&ia=web

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Replies are listed 'Best First'.
Re: Searching the monastery with duckduckgo leads to ugly results
by kcott (Bishop) on Nov 01, 2021 at 05:12 UTC

    G'day Rolf,

    DDG is also my default search engine. I followed your "ddg:" link. Of the first six results, 1-4 & 6 had www.perlmonks.org/bare/...; the 5th was an SO link; there were no further PM links on the first page of results; I didn't look any further.

    Under the search query text field (correctly showing "schwartzian transform perlmonks") I see: "All Regions" and "Any Time".

    I did notice a minor redirection. Your link has https://duckduckgo.com/html/?q=schwartzian transform perlmonks; my address bar was showing https://html.duckduckgo.com/html/?q=schwartzian transform perlmonks.

    I'm using Firefox 93.0 (64-bit) on MS Windows 10 (with latest updates as of three days ago). I don't have any special DDG settings configured.

    Being unable to reproduce your results, I can't really comment further; however, sometimes a null result can be useful (e.g. how do your browser, platform, versions and settings differ from mine).

    — Ken

      Thanks.

      For the record:

      I took a look at the robots.txt and it doesn't seem to do much except block any combination of www.com | www.net | .org, so that only www.perlmonks.org is allowed.

      # Please only spider http://www.perlmonks.org not http://perlmonks.org
      User-agent: *
      Disallow: /

      I couldn't find any rules blocking /bare or /mobile
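A quick way to double-check which paths a given rule set blocks is Python's standard `urllib.robotparser`. This is only a sketch: `SAMPLE_RULES` is a hypothetical file (not the monastery's real robots.txt), chosen so that it contains no /bare or /mobile entries, and `blocked_paths` is a helper name made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set (an assumption, NOT the real www.perlmonks.org file).
# It deliberately has no entry for /bare/ or /mobile/.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

def blocked_paths(rules_text, paths, agent="ExampleBot"):
    """Return the subset of paths that the rules forbid for the given agent."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

if __name__ == "__main__":
    # Paths without a matching Disallow prefix are reported as crawlable.
    print(blocked_paths(SAMPLE_RULES, ["/bare/", "/mobile/", "/private/x"]))
```

Pasting the real file's text into `SAMPLE_RULES` would show whether any rule already covers /bare/ or /mobile/. Note that `urllib.robotparser` does plain prefix matching, so it says nothing about wildcard patterns.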

      There are also at least two other pair domains m/qs\d+.pair.com/ showing up, which seem to be (have been?) used for development and have no robots.txt at all to block them.

      edit

      FWIW: there is also the separate issue of blocking ?displaytype= values like xml or print, but I'm not sure whether there is an accepted standard for blocking on /?searchstring patterns.

      Update

      https://en.wikipedia.org/wiki/Robots_exclusion_standard

      We could also use meta-tags to disallow print versions
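To make the meta-tag idea concrete, here is a minimal sketch of the decision a page template would have to make. The function name `robots_meta_for` and the set `NOINDEX_DISPLAYTYPES` are hypothetical, and the sketch assumes `&`-separated query strings; it is not how the monastery's templates actually work.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed list of alternate renderings, taken from the ones named in this thread.
NOINDEX_DISPLAYTYPES = {"print", "xml"}

def robots_meta_for(url):
    """Return a robots meta tag for alternate renderings, or "" for normal pages."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each parameter to a list of values.
    if NOINDEX_DISPLAYTYPES.intersection(query.get("displaytype", [])):
        return '<meta name="robots" content="noindex">'
    return ""

if __name__ == "__main__":
    print(robots_meta_for("https://www.perlmonks.org/?node_id=11138280&displaytype=print"))
    print(repr(robots_meta_for("https://www.perlmonks.org/?node_id=11138280")))
```

The advantage over robots.txt is that the decision depends on a query parameter, which plain prefix rules cannot express.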

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Searching the monastery with duckduckgo leads to ugly results
by Corion (Patriarch) on Nov 01, 2021 at 13:58 UTC

    I don't really understand the findings.

    Is there anything actionable here? Should we block certain? known? bots from visiting certain pages?

      Hi

      I'm still hoping/waiting for input from someone experienced.

      From what I've read so far:

      On www.perlmonks.org (the main target)
      • disallow /bare/
      • disallow /mobile/
      On qs\d+.pair.com domains
      • disallow /
      Or
      • disallow /~perl2/
      (Update: Same for vps\d+.pairvpn.com/~monkads/? and other mirror domains (???))

      In the templates for displaytype=print etc
      • Add a meta tag for robots disallow
      I guess that should do it
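The per-host proposal above can be sketched as a small function that picks the robots.txt body for a hostname. Everything here is illustrative: `robots_txt_for` is a made-up name, the mirror pattern is the one from the thread with its dots escaped, and the uncertain vps/pairvpn hosts are left out on purpose.

```python
import re

# Pattern from the thread, with the dots escaped so they match literally.
MIRROR_HOST = re.compile(r"qs\d+\.pair\.com")

def robots_txt_for(host):
    """Return a robots.txt body for the given hostname (sketch of the proposal)."""
    if MIRROR_HOST.fullmatch(host):
        rules = ["Disallow: /"]            # keep spiders off the mirrors entirely
    elif host == "www.perlmonks.org":
        rules = ["Disallow: /bare/", "Disallow: /mobile/"]
    else:
        rules = ["Disallow:"]              # empty Disallow: no restriction (assumption)
    return "\n".join(["User-agent: *", *rules]) + "\n"

if __name__ == "__main__":
    # qs321 is a hypothetical host number used only for the demo.
    for name in ("www.perlmonks.org", "qs321.pair.com"):
        print(f"--- {name}\n{robots_txt_for(name)}")
```

The rules are emitted under `User-agent: *`, so they apply to every crawler rather than to a list of known bots.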

      AFAICS there are also possibilities for settings in .htaccess for non-HTML file types.

      YMMV

      Update

      I'd say the settings should be for all user agents.

      I suppose Google is using some helpful heuristics already.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Searching the monastery with duckduckgo leads to ugly results (noindex)
by LanX (Sage) on Nov 01, 2021 at 16:15 UTC
Re: Searching the monastery with duckduckgo leads to ugly results
by Bod (Curate) on Nov 01, 2021 at 19:40 UTC

    I use Google for searching and regularly it returns the bare version of pages for Perl Monks

      OK I provided a patch to the "bare general container" and added a noindex meta-tag in the head.

      After application, this should be effective within days. If successful this could be extended to other page paths like mobile/ and displaytypes like print
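To check whether a served page really carries the tag, something like the following could be run against the page source. It is a sketch using only the standard `html.parser` module; the class and function names are invented for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list such as "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def has_noindex(html):
    """Return True if the HTML text contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives

if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(sample))
```

Feeding it the source of a /bare/ page before and after the patch would confirm that the tag is actually being served.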

      The gods are still free to adjust the main robots.txt then, to additionally reduce the load by spiders. :)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

      Applied 2021-11-02.

        > noindex meta-tag in the head

        Unfortunately this didn't have the expected outcome.

        The bare pages are still shown in the search results; the only difference is that the search engines no longer cache them.

        We need to adjust the robots.txt, and this can only be done by a god.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        According to DDG's website, they get most of their results from Bing, which in turn respects noindex and robots.txt.

        The search results do indeed look similar (lots of bare ATM).

        -> https://www.bing.com/search?q=schwartzian+transform+perlmonks

        So solving the problem for Bing should solve it for DDG too.

        The top result was cached by Bing on 2021-10-05, so updating may take a while.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Node Type: monkdiscuss [id://11138280]
Approved by davies