Have you checked the TOS for that site?

When I tried the script you showed, I got a 403 Forbidden error, even though the same URL downloaded fine in Chrome. Running curl -v https://www.sec.gov/Archives/edgar/full-index/2019/QTR1/master.idx was a bit more specific:

< HTTP/1.1 403 Forbidden
< Server: AkamaiGHost
< Mime-Version: 1.0
< Content-Length: 4793
< Cache-Control: no-cache, no-store, must-revalidate
< Pragma: no-cache
< Expires: 0
< Content-Type: text/html
< Date: Sat, 21 May 2022 20:40:48 GMT
< Connection: keep-alive
< Strict-Transport-Security: max-age=31536000 ; includeSubDomains ; preload
...
<title>SEC.gov | Request Rate Threshold Exceeded</title>
...
<h1>Your Request Originates from an Undeclared Automated Tool</h1>
<p>To allow for equitable access to all users, SEC reserves the right to limit requests originating from undeclared automated tools. Your request has been identified as part of a network of automated tools outside of the acceptable policy and will be managed until action is taken to declare your traffic.</p>
<p>Please declare your traffic by updating your user agent to include company specific information.</p>
...
<p>For best practices on efficiently downloading information from SEC.gov, including the latest EDGAR filings, visit <a href="https://www.sec.gov/developer" target="_blank">sec.gov/developer</a>. You can also <a href="https://public.govdelivery.com/accounts/USSEC/subscriber/new?topic_id=USSEC_260" target="_blank">sign up for email updates</a> on the SEC open data program, including best practices that make it more efficient to download data, and SEC.gov enhancements that may impact scripted downloading processes. For more information, contact <a href="mailto:opendata@sec.gov">opendata@sec.gov</a>.</p>
<p>For more information, please see the SEC’s <a href="#internet">Web Site Privacy and Security Policy</a>. Thank you for your interest in the U.S. Securities and Exchange Commission.
<p>Reference ID: 0.9db31bb8.1653165648.37b3e960</p>

Basically, you need to make sure you follow their TOS regarding load limits, and set a user-agent string that meets their rules. (Or, if you want to risk violating the SEC's rules, use a user-agent string that mimics a browser's without looking up what their rules are ↗.) Both LWP::UserAgent and WWW::Mechanize allow setting the user agent, and both document how to do so.
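For what it's worth, here is a minimal sketch of declaring a user agent and pausing between requests with LWP::UserAgent. The agent string and the half-second delay are placeholders I made up for illustration, not values taken from the SEC's rules, and make_ua and fetch_all are just names I picked. Look up the current requirements on sec.gov/developer before using anything like this.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(sleep);    # allows fractional-second sleeps

# ASSUMPTION: placeholder identity. Replace it with your own details,
# in whatever format the site's current guidelines ask for.
my $AGENT = 'Example Company admin@example.com';

# ASSUMPTION: illustrative pause between requests, not an official limit.
my $DELAY = 0.5;

# Build a user agent that announces itself with the given string.
sub make_ua {
    my ($agent) = @_;
    return LWP::UserAgent->new( agent => $agent, timeout => 30 );
}

# Fetch each URL in turn, dying on the first failure and
# sleeping after every successful request to keep the load down.
sub fetch_all {
    my ( $ua, @urls ) = @_;
    my %content_for;
    for my $url (@urls) {
        my $response = $ua->get($url);
        die "$url: " . $response->status_line . "\n"
            unless $response->is_success;
        $content_for{$url} = $response->decoded_content;
        sleep $DELAY;
    }
    return \%content_for;
}

my $ua = make_ua($AGENT);
print "User-Agent header will be: ", $ua->agent, "\n";
```

You would then call something like fetch_all($ua, 'https://www.sec.gov/Archives/edgar/full-index/2019/QTR1/master.idx'). Fetching https URLs needs LWP::Protocol::https installed. WWW::Mechanize is a subclass of LWP::UserAgent, so WWW::Mechanize->new(agent => $AGENT) should accept the same option.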


↗: Looks like LanX determined that wouldn't work in id://11144056, which wasn't there when I started writing my post.
edit 2: you could have seen the full error message yourself if you had printed the content as well as the status in the else branch, like else {die $response->status_line . ($response->content||'');}
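A slightly fuller sketch of that idea, written so it can be tried without hitting the network: the helper name failure_message is my own, and the fake 403 response is built by hand purely to show what the message would look like.

```perl
use strict;
use warnings;
use HTTP::Response;

# Combine the status line with whatever body the server sent,
# so that the server's explanation is not thrown away.
sub failure_message {
    my ($response) = @_;
    my $body = $response->decoded_content // $response->content // '';
    return $response->status_line . ( $body ne '' ? "\n$body" : '' );
}

# Hand-built stand-in for the real 403 reply, for demonstration only.
my $fake = HTTP::Response->new(
    403, 'Forbidden',
    [ 'Content-Type' => 'text/html' ],
    '<h1>Your Request Originates from an Undeclared Automated Tool</h1>',
);

print failure_message($fake), "\n";
```

In a real script the else branch would become something like else { die failure_message($response) . "\n" }, where $response is whatever $ua->get(...) returned.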

In reply to Re: LWP and Mechanize by pryrt
in thread LWP and Mechanize by perlmike
