Intrepid has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks and Nuns. I was working with cpan (the installer) on CygPerl today and tried the command recent; it didn't work and this is what I saw:


cpan2> recent
CPAN: XML::LibXML loaded ok (v2.0210)
Fetching 'http://search.cpan.org/uploads.rdf'
CPAN: LWP loaded ok (v6.80)
DONE

Catching error: "XML::LibXML::Error=HASH(0xa01ebe148)" at /usr/local/share/perl5/site_perl/5.40/CPAN.pm line 397.
        CPAN::shell() called at -e line 1

Could anyone else try this and see what happens? I'd appreciate it.

My system setup is:

This is perl 5, version 40, subversion 3 (v5.40.3) built for x86_64-cygwin-threads-multi
/usr/lib/perl5/vendor_perl/5.40/x86_64-cygwin-threads/XML/LibXML.pm
XML::LibXML version 2.0210
CPAN        version 2.38
@INC:
/usr/local/lib/perl5/site_perl/5.40/x86_64-cygwin-threads
/usr/local/share/perl5/site_perl/5.40
/usr/lib/perl5/vendor_perl/5.40/x86_64-cygwin-threads
/usr/share/perl5/vendor_perl/5.40
/usr/lib/perl5/5.40/x86_64-cygwin-threads
/usr/share/perl5/5.40
Nov 05, 2025 at 18:08 UTC

A just machine to make big decisions
Programmed by fellows (and gals) with compassion and vision
We'll be clean when their work is done
We'll be eternally free yes, and eternally young
Donald Fagen —> I.G.Y.
(Slightly modified for inclusiveness)


Replies are listed 'Best First'.
Re: Fetching 'http://search.cpan.org/uploads.rdf' (from cpan) LibXML error is triggered
by Haarg (Priest) on Nov 06, 2025 at 12:47 UTC
    I've updated MetaCPAN to allow access to https://metacpan.org/recent.rdf using mechanized clients without needing a JavaScript challenge. This should fix the CPAN client's recent command.
Re: Fetching 'http://search.cpan.org/uploads.rdf' (from cpan) LibXML error is triggered
by choroba (Cardinal) on Nov 05, 2025 at 18:42 UTC
    CPAN is not ready for exception objects.

    I modified the source of CPAN.pm by adding

    use Data::Dumper; print STDERR Dumper($err);
    right before line 397.

    This was the output:

    $VAR1 = bless( {
        'context' => ' />',
        'message' => 'Specification mandates value for attribute crossorigin ',
        'num2' => 5,
        '_prev' => undef,
        'code' => 41,
        'str1' => 'crossorigin',
        'level' => 3,
        'str2' => undef,
        'str3' => undef,
        'file' => '',
        'num1' => 0,
        'domain' => 1,
        'column' => 4,
        '__prev_depth' => 0,
        'line' => 14
    }, 'XML::LibXML::Error' );
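The original report shows only the class name and address of the error object, while the dump above shows that the object carries a readable message plus a line and column. Here is a minimal sketch of reporting such an object by its fields. `My::ParseError` and `describe_error` are hypothetical names invented for illustration; the class is a stand-in that mirrors a few fields from the dump, not XML::LibXML::Error itself, and none of this is how CPAN.pm is written.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an exception class; its fields mirror
# part of the dump above. It is NOT the real XML::LibXML::Error.
package My::ParseError {
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub message { $_[0]{message} }
    sub line    { $_[0]{line} }
    sub column  { $_[0]{column} }
}

# Turn whatever was thrown into a readable one-line description.
sub describe_error {
    my ($err) = @_;

    # Plain string exceptions are returned unchanged.
    return "$err" unless blessed($err);

    # Objects that expose a message are reported by their fields.
    if ($err->can('message')) {
        my $text = $err->message;
        $text =~ s/\s+\z//;    # drop trailing whitespace or newline
        my $where = '';
        if ($err->can('line') && $err->can('column')) {
            $where = sprintf ' (line %d, column %d)', $err->line, $err->column;
        }
        return ref($err) . ": $text$where";
    }

    # Anything else falls back to default stringification.
    return "$err";
}

my $ok = eval {
    die My::ParseError->new(
        message => 'Specification mandates value for attribute crossorigin ',
        line    => 14,
        column  => 4,
    );
    1;
};
my $err = $@;

# prints: My::ParseError: Specification mandates value for attribute crossorigin (line 14, column 4)
print describe_error($err), "\n" unless $ok;
```

The `can` checks keep the helper usable with exception classes that offer only some of these accessors.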

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Fetching 'http://search.cpan.org/uploads.rdf' (from cpan) LibXML error is triggered
by ikegami (Patriarch) on Nov 06, 2025 at 00:04 UTC

    Is this expected?

    $ curl -LD - http://search.cpan.org/uploads.rdf
    HTTP/1.1 301 Moved Permanently
    Connection: keep-alive
    Content-Length: 5
    Content-Type: text/plain
    Location: https://metacpan.org/recent.rdf
    Cache-Control: max-age=31536000
    Strict-Transport-Security: max-age=31536000; includeSubDomains
    Via: 1.1 varnish, 1.1 varnish
    Accept-Ranges: bytes
    Age: 1292181
    Date: Thu, 06 Nov 2025 00:03:16 GMT
    X-Served-By: cache-fra-etou8220024-FRA, cache-yyz4534-YYZ
    X-Cache: HIT, HIT
    X-Cache-Hits: 606, 0
    X-Timer: S1762387396.373864,VS0,VE4

    HTTP/2 200
    set-cookie: _fs_ch_st_FSBmUei20MqUiJb9=Aephdq6cZPgwr6cwu2UjlkQwWIqAvL1yxfsxYlbHcotSsvW5voanZ2qZLXp3LmvoCKcxHBrVeOl1ELcETsxUN_oeDechM414wPNIKKcPEkER4X7yF-6hFr2aQX2828d9zf-DYMr75ceP9YVWuJO8mRR9yR1xj17sfnyCHUX2cvgPz-ZjgucO9v4rKVad96rVCuOlWSLmN4EDqfEEJR5FWBsbpa3J8jOpOMc8Q1Wqk2pMts0_RJunCCVvUe3MBQF-ZpyL9rv5guZElmjyJKL6PBmN_envwRs0f2uLbNiHoh_lMSAnQzPHJQDP3DLsi2T3YgOCYfyi6QLuYYmnxN8ylriMPI2_plQvh7KVzA==; Max-Age=10; HttpOnly; Path=/
    content-type: text/html; charset=utf-8
    cache-control: private, no-store
    accept-ranges: bytes
    via: 1.1 varnish, 1.1 varnish
    date: Thu, 06 Nov 2025 00:03:16 GMT
    x-served-by: cache-fra-etou8220098-FRA, cache-fra-etou8220192-FRA, cache-yyz4561-YYZ
    x-cache: MISS, MISS
    x-cache-hits: 0, 0
    x-timer: S1762387397.572748,VS0,VE109
    vary: Accept-Encoding
    strict-transport-security: max-age=31557600

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta http-equiv="Content-Security-Policy" content="default-src 'self'; img-src 'self' data:; media-src 'self' data:; object-src 'none'; style-src 'self' 'sha256-o4vzfmmUENEg4chMjjRP9EuW9ucGnGIGVdbl8d0SHQQ='; script-src 'self' 'sha256-KXex2o39zxtnzVWK4H5rW07g2+BlwSPtn+aguzsWkNg=';" />
        <link href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/inter-var.woff2" rel="preload" as="font" type="font/woff2" crossorigin />
        <link href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/styles.css" rel="stylesheet" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <title>Client Challenge</title>
        <style>
          #loading-error {
            font-size: 16px;
            font-family: 'Inter', sans-serif;
            margin-top: 10px;
            margin-left: 10px;
            display: none;
          }
        </style>
      </head>
      <body>
        <noscript>
          <div class="noscript-container">
            <div class="noscript-content">
              <img src="/_fs-ch-1T1wmsGaOgGaSxcX/assets/errorIcon.svg" alt="" role="presentation" class="error-icon" />
              <span class="noscript-span" >JavaScript is disabled in your browser.</span >
              <p>Please enable JavaScript to proceed.</p>
            </div>
          </div>
        </noscript>
        <div id="loading-error" role="alert" aria-live="polite">
          A required part of this site couldn’t load. This may be due to a browser extension, network issues, or browser settings. Please check your connection, disable any ad blockers, or try using a different browser.
        </div>
        <script>
          function loadScript(src) {
            return new Promise((resolve, reject) => {
              const script = document.createElement('script');
              script.onload = resolve;
              script.onerror = (event) => {
                console.error('Script load error event:', event);
                document.getElementById('loading-error').style.display = 'block';
                loadingError.setAttribute('aria-hidden', 'false');
                reject(
                  new Error(
                    `Failed to load script: ${src}, Please contact the service administrator.`
                  )
                );
              };
              script.src = src;
              document.body.appendChild(script);
            });
          }
          loadScript('/_fs-ch-1T1wmsGaOgGaSxcX/errors.js')
            .then(() => {
              const script = document.createElement('script');
              script.src = '/_fs-ch-1T1wmsGaOgGaSxcX/script.js?reload=true';
              script.onerror = (event) => {
                console.error('Script load error event:', event);
                const errorMsg = new Error(
                  `Failed to load script: ${script.src}. Please contact the service administrator.`
                );
                console.error(errorMsg);
                handleScriptError();
              };
              document.body.appendChild(script);
            })
            .catch((error) => {
              console.error(error);
            });
        </script>
      </body>
    </html>

      If you mean the redirect, then yes. That's been in place since search.cpan.org was shuttered back in 2018.

      If you mean the JavaScript, then perhaps. MetaCPAN put the Fastly anti-LLM-bot challenge in front of their content a few months back. It's mostly a minor annoyance until you want to do something like this. Almost all automated access to MetaCPAN is now supposed to happen via their API instead. However, I'm inclined to think that RSS/RDF endpoints on the main site should be excluded from the bot protection, because after all we expect bots to hit them, don't we?
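The curl transcript above shows the feed URL answering with `content-type: text/html` and a challenge page, which is what the XML parser then choked on. Below is a minimal sketch of a guard a client could apply before parsing. `looks_like_xml_feed` is a hypothetical helper, and the `application/rdf+xml` sample value is an assumption for illustration; the thread does not show what the real feed sends.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if a response looks like an XML feed,
# 0 if it looks like an HTML page (such as a bot-challenge page).
# $content_type is the Content-Type header value, $body the response body.
sub looks_like_xml_feed {
    my ($content_type, $body) = @_;
    $content_type = lc($content_type // '');
    $body //= '';

    # An HTML media type or an HTML doctype means this is not a feed.
    return 0 if $content_type =~ m{^\s*text/html\b};
    return 0 if $body =~ /\A\s*<!DOCTYPE\s+html/i;

    # Accept application/xml, text/xml and "+xml" types such as rdf+xml.
    return 1 if $content_type =~ m{[/+]xml\b};

    # No usable header: fall back to sniffing for an XML declaration.
    return $body =~ /\A\s*<\?xml\b/ ? 1 : 0;
}

# Sample inputs: the first pair is taken from the transcript above,
# the second is an assumed example of a well-formed feed response.
for my $case (
    [ 'text/html; charset=utf-8', '<!DOCTYPE html> <html lang="en">' ],
    [ 'application/rdf+xml',      '<?xml version="1.0"?><rdf:RDF/>' ],
) {
    my $verdict = looks_like_xml_feed(@$case) ? 'parse' : 'skip';
    printf "%-26s => %s\n", $case->[0], $verdict;
}
```

With a check like this, the client can say "got an HTML page instead of a feed" rather than surfacing a parser exception.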

      Update: There's also MetaCPAN::Client. MetaCPAN::Client->recent returns more detail than might be necessary, but in the meantime it at least makes the list of recent uploads available without the need for a JS-enabled, bot-detector-defeating browser.
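A minimal sketch of the MetaCPAN::Client route mentioned in the update, assuming the module is installed from CPAN and the machine has network access. `format_release` and `recent_releases` are hypothetical helper names; the `recent`, `next`, `date`, `author` and `name` calls come from MetaCPAN::Client and its release objects.

```perl
use strict;
use warnings;

# Format one release for display. Works with any object that offers
# date/author/name accessors, as MetaCPAN::Client::Release does.
sub format_release {
    my ($release) = @_;
    return sprintf '%s  %s  %s',
        $release->date, $release->author, $release->name;
}

# Fetch the latest $count releases through the MetaCPAN API and
# return one formatted line per release.
sub recent_releases {
    my ($count) = @_;

    # Loaded lazily: MetaCPAN::Client is a CPAN module, not core Perl.
    require MetaCPAN::Client;

    my $recent = MetaCPAN::Client->new->recent($count);
    my @lines;
    while (my $release = $recent->next) {
        push @lines, format_release($release);
    }
    return @lines;
}

# Uncomment to query the API (needs network access):
# print "$_\n" for recent_releases(10);
```

The network call is left commented out so the block loads without touching the API; `recent` also accepts the string `'today'` in place of a count.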


      🦛