Re: Fetching 'http://search.cpan.org/uploads.rdf' (from cpan) LibXML error is triggered

Is this expected?

$ curl -LD - http://search.cpan.org/uploads.rdf
HTTP/1.1 301 Moved Permanently
Connection: keep-alive
Content-Length: 5
Content-Type: text/plain
Location: https://metacpan.org/recent.rdf
Cache-Control: max-age=31536000
Strict-Transport-Security: max-age=31536000; includeSubDomains
Via: 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 1292181
Date: Thu, 06 Nov 2025 00:03:16 GMT
X-Served-By: cache-fra-etou8220024-FRA, cache-yyz4534-YYZ
X-Cache: HIT, HIT
X-Cache-Hits: 606, 0
X-Timer: S1762387396.373864,VS0,VE4

HTTP/2 200
set-cookie: _fs_ch_st_FSBmUei20MqUiJb9=Aephdq6cZPgwr6cwu2UjlkQwWIqAvL1yxfsxYlbHcotSsvW5voanZ2qZLXp3LmvoCKcxHBrVeOl1ELcETsxUN_oeDechM414wPNIKKcPEkER4X7yF-6hFr2aQX2828d9zf-DYMr75ceP9YVWuJO8mRR9yR1xj17sfnyCHUX2cvgPz-ZjgucO9v4rKVad96rVCuOlWSLmN4EDqfEEJR5FWBsbpa3J8jOpOMc8Q1Wqk2pMts0_RJunCCVvUe3MBQF-ZpyL9rv5guZElmjyJKL6PBmN_envwRs0f2uLbNiHoh_lMSAnQzPHJQDP3DLsi2T3YgOCYfyi6QLuYYmnxN8ylriMPI2_plQvh7KVzA==; Max-Age=10; HttpOnly; Path=/
content-type: text/html; charset=utf-8
cache-control: private, no-store
accept-ranges: bytes
via: 1.1 varnish, 1.1 varnish
date: Thu, 06 Nov 2025 00:03:16 GMT
x-served-by: cache-fra-etou8220098-FRA, cache-fra-etou8220192-FRA, cache-yyz4561-YYZ
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1762387397.572748,VS0,VE109
vary: Accept-Encoding
strict-transport-security: max-age=31557600

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta
      http-equiv="Content-Security-Policy"
      content="default-src 'self'; img-src 'self' data:; media-src 'self' data:; object-src 'none'; style-src 'self' 'sha256-o4vzfmmUENEg4chMjjRP9EuW9ucGnGIGVdbl8d0SHQQ='; script-src 'self' 'sha256-KXex2o39zxtnzVWK4H5rW07g2+BlwSPtn+aguzsWkNg=';"
    />
    <link
      href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/inter-var.woff2"
      rel="preload"
      as="font"
      type="font/woff2"
      crossorigin
    />
    <link href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/styles.css" rel="stylesheet" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Client Challenge</title>
    <style>
      #loading-error {
        font-size: 16px;
        font-family: 'Inter', sans-serif;
        margin-top: 10px;
        margin-left: 10px;
        display: none;
      }
    </style>
  </head>
  <body>
    <noscript>
      <div class="noscript-container">
        <div class="noscript-content">
          <img
            src="/_fs-ch-1T1wmsGaOgGaSxcX/assets/errorIcon.svg"
            alt=""
            role="presentation"
            class="error-icon"
          />
          <span class="noscript-span"
            >JavaScript is disabled in your browser.</span
          >
          <p>Please enable JavaScript to proceed.</p>
        </div>
      </div>
    </noscript>
    <div id="loading-error" role="alert" aria-live="polite">
      A required part of this site couldn’t load. This may be due to a browser
      extension, network issues, or browser settings. Please check your
      connection, disable any ad blockers, or try using a different browser.
    </div>
    <script>
      function loadScript(src) {
        return new Promise((resolve, reject) => {
          const script = document.createElement('script');
          script.onload = resolve;
          script.onerror = (event) => {
            console.error('Script load error event:', event);
            document.getElementById('loading-error').style.display = 'block';
            loadingError.setAttribute('aria-hidden', 'false');
            reject(
              new Error(
                `Failed to load script: ${src}, Please contact the service administrator.`
              )
            );
          };
          script.src = src;
          document.body.appendChild(script);
        });
      }

      loadScript('/_fs-ch-1T1wmsGaOgGaSxcX/errors.js')
        .then(() => {
          const script = document.createElement('script');
          script.src = '/_fs-ch-1T1wmsGaOgGaSxcX/script.js?reload=true';
          script.onerror = (event) => {
            console.error('Script load error event:', event);
            const errorMsg = new Error(
              `Failed to load script: ${script.src}. Please contact the service administrator.`
            );
            console.error(errorMsg);
            handleScriptError();
          };
          document.body.appendChild(script);
        })
        .catch((error) => {
          console.error(error);
        });
    </script>
  </body>
</html>


Replies are listed 'Best First'.

Re^2: Fetching 'http://search.cpan.org/uploads.rdf' (from cpan) LibXML error is triggered
by hippo (Archbishop) on Nov 06, 2025 at 10:27 UTC

If you mean the redirect, then yes. That's been in place since search.cpan.org was shuttered back in 2018.

If you mean the JavaScript, then perhaps. MetaCPAN put the Fastly anti-LLM-bot challenge in front of their content a few months back. It's mostly a minor annoyance until you want to do something like this. Almost all automated access to MetaCPAN is now supposed to happen via their API instead. However, I'm inclined to think that RSS/RDF endpoints on the main site should be excluded from the bot protection because, after all, we expect bots to hit them, don't we?
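
In the meantime, a fetcher can at least notice that it has been handed the challenge page rather than a feed, and bail out before the XML parser chokes on HTML. What follows is only a rough sketch under stated assumptions: the function names `classify_content_type` and `fetch_feed` are made up for illustration, and the list of "feed-like" content types is a guess, since the transcript above only shows the `text/html` challenge response, not what the real feed would be served as.

```shell
#!/bin/sh
# Hypothetical helper: print "feed" for XML-ish Content-Type values and
# "not-feed" for anything else (e.g. the text/html challenge page above).
# The set of accepted types is an assumption, not something MetaCPAN documents.
classify_content_type() {
  ctype=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ctype" in
    application/rdf+xml*|application/rss+xml*|application/atom+xml*|application/xml*|text/xml*)
      echo feed ;;
    *)
      echo not-feed ;;
  esac
}

# Hypothetical wrapper: follow redirects (-L), save the body to a temp file,
# and print it only if the final response claims to be a feed.
fetch_feed() {
  url=$1
  tmp=$(mktemp) || return 1
  final_type=$(curl -sSL -o "$tmp" -w '%{content_type}' "$url") || {
    rm -f "$tmp"
    return 1
  }
  if [ "$(classify_content_type "$final_type")" = feed ]; then
    cat "$tmp"
    rm -f "$tmp"
  else
    echo "not parsing $url: final Content-Type was '$final_type'" >&2
    rm -f "$tmp"
    return 1
  fi
}
```

Something like `fetch_feed http://search.cpan.org/uploads.rdf > uploads.rdf` would then fail with a readable message instead of passing the challenge HTML on to the parser. It does nothing to get past the challenge itself, of course.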

Update: There's also MetaCPAN::Client. `MetaCPAN::Client->recent` returns more detail than might be necessary, but it at least makes the list of recent uploads available without the need for a JS-enabled, bot-detector-defeating browser in the meantime.

🦛
