I am not an expert on SEC filings or the Edgar database, but after a few minutes of poking around the README file and docs directory it looks like everything would still be simpler with ftp unless you are a screen-scraping wizard.

As I have just learned, each company that files with the SEC has a CIK number. In your example, the CIK number for IBM is 51143. All of IBM's filings live in that directory ftp://ftp.sec.gov/edgar/data/51143. From the explanation in the README it seems that prior to Edgar 7.0 (starting in the year 2000) all of each company's filings were stored in one directory, but because of problems with overwriting documents when ammendments were submitted, everything is now stored in sub directories based on the accession number of the documents. This might be why it appears that information is per-day. Fortunately, it looks like the SEC provides an index of filings for each quarter by company name or type of filing so you don't have to mess around with slogging through every sub-directory to find the information you need. The only benefit I see from the http interface in your example is that your search focused on filings related to change of ownership. However, I gather that these are related to a known subset of forms (4, K-8) and this information is available in the index, so you could subset the information yourself.

So, if it were me I would probably move forward in two stages using Net::ftp to 1) grab indices and subsetting the records I need based on company and filing-type to create a list of the files I want to get, and then 2) and grab those files with Net::ftp again.


In reply to Re^3: Making an array from a downloaded web page by moklevat
in thread Making an array from a downloaded web page by malomar66

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.