Beefy Boxes and Bandwidth Generously Provided by pair Networks
PerlMonks  

Re^2: Searching the monastery with duckduckgo leads to ugly results

by LanX (Saint)
on Nov 01, 2021 at 11:59 UTC ( [id://11138291]=note )


in reply to Re: Searching the monastery with duckduckgo leads to ugly results
in thread Searching the monastery with duckduckgo leads to ugly results

Thanks.

For the record:

I took a look at the robots.txt files. They don't seem to do much except block every combination of perlmonks.com, perlmonks.net and perlmonks.org (with or without www.), so that only www.perlmonks.org is allowed:

# Please only spider http://www.perlmonks.org not http://perlmonks.org
User-agent: *
Disallow: /
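A quick way to confirm what this file does is to feed it to a robots.txt parser. This is a minimal sketch using Python's standard `urllib.robotparser` (not anything PerlMonks itself runs); the crawler names and the node URL are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, as served on the non-canonical hosts (e.g. perlmonks.org).
NON_WWW_ROBOTS = """\
# Please only spider http://www.perlmonks.org not http://perlmonks.org
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(NON_WWW_ROBOTS)

# "User-agent: *" applies to every crawler, and "Disallow: /" is a prefix
# of every path, so nothing on such a host may be fetched.
for agent in ("DuckDuckBot", "Googlebot", "*"):
    allowed = parser.can_fetch(agent, "http://perlmonks.org/?node_id=11138291")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```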

I couldn't find any rules blocking the /bare or /mobile views.
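Keeping those views out of search results would take extra Disallow lines on the www host. The rules below are hypothetical (they are not in the live file) and assume the views live under /bare/ and /mobile path prefixes; they are again checked with Python's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for www.perlmonks.org; these lines are NOT in the live robots.txt.
WWW_ROBOTS = """\
User-agent: *
Disallow: /bare/
Disallow: /mobile
""".splitlines()

parser = RobotFileParser()
parser.parse(WWW_ROBOTS)

base = "https://www.perlmonks.org"
# Disallow rules are prefix matches on the path, so only the two views are blocked.
for path in ("/?node_id=11138291", "/bare/?node_id=11138291", "/mobile?node_id=11138291"):
    verdict = "allowed" if parser.can_fetch("DuckDuckBot", base + path) else "blocked"
    print(f"{path}: {verdict}")
```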

There are also at least two other pair domains matching m/qs\d+\.pair\.com/ showing up in the results. They seem to be (or to have been?) used for development, and they have no robots.txt at all to block them.
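When sorting search-result URLs by host, the dots in that pattern need escaping, since an unescaped `.` matches any character. A small sketch; `classify_host` is a hypothetical helper and `qs123` is a made-up example number, not one of the actual hosts.

```python
import re
from urllib.parse import urlsplit

# Escaped dots: "." must match a literal dot, not any character.
DEV_HOST = re.compile(r"qs\d+\.pair\.com")

def classify_host(url: str) -> str:
    """Hypothetical helper: label a search-result URL by its host name."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "www.perlmonks.org":
        return "canonical"
    if DEV_HOST.fullmatch(host):
        return "dev mirror (no robots.txt)"
    return "other"

print(classify_host("https://www.perlmonks.org/?node_id=11138291"))
print(classify_host("http://qs123.pair.com/?node_id=11138291"))
# The unescaped version also accepts hosts like "qs123xpair.com".
print(bool(re.fullmatch(r"qs\d+.pair.com", "qs123xpair.com")))
```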

edit

FWIW: there is also the separate issue of blocking ?displaytype= variants like xml or print, but I'm not sure there is an accepted standard for blocking on /?querystring patterns.
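The original robots.txt convention only does prefix matching on the path, so a rule can't target `displaytype=` when it appears after `node_id=` in the query string. Major crawlers such as Googlebot accept `*` (any run of characters) and a trailing `$` (end of URL) as an extension. Below is a hypothetical matcher for that extension, a sketch rather than any library's behaviour; as far as I know, Python's `urllib.robotparser` still matches plain prefixes only.

```python
import re

def rule_matches(rule_path: str, url_path: str) -> bool:
    """Match a robots.txt path rule using the '*' / '$' wildcard extension.

    '*' matches any run of characters; a trailing '$' anchors the rule to the
    end of the URL. Without '$' the rule is a prefix match, as in plain robots.txt.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start of the string, giving prefix semantics.
    return re.match(pattern, url_path) is not None

print_view = "/?node_id=11138291;displaytype=print"
print(rule_matches("/?displaytype=", print_view))          # False: plain prefix misses it
print(rule_matches("/*displaytype=", print_view))          # True: wildcard finds it
print(rule_matches("/*displaytype=", "/?node_id=11138291"))  # False: normal view untouched
```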

Update

https://en.wikipedia.org/wiki/Robots_exclusion_standard

We could also use robots meta tags to keep the print versions out of search indexes.
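A meta tag works per page instead of per URL pattern, which sidesteps the query-string problem: the page itself says whether it wants to be indexed. A minimal sketch, assuming the renderer knows the requested `displaytype`; the function name and the set of excluded display types are hypothetical. XML output has no `<head>` for a `<meta>` tag, so the `X-Robots-Tag` HTTP response header, supported by major search engines such as Google, is the usual equivalent there.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: display types that should stay out of search indexes.
NOINDEX_DISPLAYTYPES = {"print", "xml"}

def robots_directives(displaytype: Optional[str]) -> Tuple[str, Dict[str, str]]:
    """Return (html_meta_tag, extra_http_headers) for a requested display type.

    HTML variants get a <meta name="robots"> tag; XML can't carry one, so the
    same instruction is sent as an X-Robots-Tag response header instead.
    """
    if displaytype not in NOINDEX_DISPLAYTYPES:
        return "", {}  # normal page: nothing added, so it may be indexed
    if displaytype == "xml":
        return "", {"X-Robots-Tag": "noindex"}
    return '<meta name="robots" content="noindex">', {}

for dt in (None, "print", "xml"):
    print(dt, robots_directives(dt))
```

One caveat with this approach: a crawler only sees the noindex if robots.txt lets it fetch the page in the first place.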

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^3: Searching the monastery with duckduckgo leads to ugly results (WWW::RobotRules)
by LanX (Saint) on Nov 01, 2021 at 12:25 UTC