in reply to Perl Possibilities

Hey guys!

Thanks so much already for all the helpful comments you've given me. I feel that I'm already making progress on tackling my research!

So, as some of you asked, I'm posting a small sample of text that contains the information I need. To give some context: it comes from an SEC filing that companies must submit whenever shareholders make proposals. As I said before, I'm looking for the Board of Directors' recommendation on each of these proposals. An example of a filing that I will be using can be found here:

Filing Example

The filings differ from company to company in their proposals and other details. However, each filing contains the recommendation I'm looking for, and it is (almost) always stated in the same manner. I hope this gives some more clarity!

Thanks again so much for your help. I really appreciate it!

Re^2: Perl Possibilities
by Corion (Patriarch) on Mar 16, 2016 at 12:43 UTC

    As the data is already in a fairly tabular format, and in HTML, I would use HTML::TableExtract to get at the table data. With the data in hand, it should be easy to extract the vote recommendations by checking whether FOR or AGAINST appears in the relevant column.

      You're right about that indeed. However, the problem is that very few companies use such a table as in the example where they clearly state the proposals and their recommendations, as far as I know.

      Furthermore, I've already downloaded quite a few filings for testing purposes, and they end up being in .txt file however still formatted in html (so you see all the html code in the .txt surrounding the actual text). Would you say it's smarter to keep the .txt files or to convert them back to .html before I run the extraction scripts?

        It won't really matter, as the relevant data basically stays the same. You will then face the problem of actually associating the proposal title with the proposed vote.

        I would try to write a program that can handle most of the filings and that sets the rest aside for a human to decide.
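
        The idea of handling the easy filings automatically and handing the rest to a human can be sketched roughly as follows. This is only a minimal sketch: the pattern assumes the recommendation is phrased along the lines of "the Board of Directors ... recommends ... a vote FOR/AGAINST", and the names strip_markup, find_recommendations and triage are made up for illustration. Real filings will need more phrasings, and a proper HTML parser is safer than the crude tag stripping used here.

```perl
use strict;
use warnings;

# Hypothetical pattern: assumes wording roughly like
# "The Board of Directors unanimously recommends a vote AGAINST ...".
my $RECOMMENDATION = qr{
    board \s+ of \s+ directors   # who is recommending
    [^.]{0,200}?                 # some words, staying inside one sentence
    recommends?                  # "recommends" or "recommend"
    [^.]{0,80}?                  # e.g. " a " or " that you "
    \b vote \s+ ["']?            # optional opening quote before the verdict
    (for|against) \b             # the verdict itself
}xi;

# Crude tag stripper, good enough for a sketch only.
sub strip_markup {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;          # replace tags with a space
    $html =~ s/&nbsp;|&#160;/ /gi;   # common non-breaking space entities
    $html =~ s/\s+/ /g;              # collapse runs of whitespace
    return $html;
}

# Returns the upper-cased verdicts found in one filing, in document order.
sub find_recommendations {
    my ($html) = @_;
    my $text = strip_markup($html);
    my @found;
    while ($text =~ /$RECOMMENDATION/g) {
        push @found, uc $1;
    }
    return @found;
}

# Splits filings (name => content) into those the pattern could handle
# and those that should be passed to a human reader.
sub triage {
    my (%filings) = @_;
    my (%auto, @manual);
    for my $name (sort keys %filings) {
        my @recs = find_recommendations($filings{$name});
        if (@recs) { $auto{$name} = \@recs }
        else       { push @manual, $name }
    }
    return (\%auto, \@manual);
}
```

        Calling triage() with file names mapped to file contents gives back a hash reference of automatically extracted verdicts and an array reference of file names for manual review. Note that the sketch does not yet tie a verdict to its proposal title, which is the harder part mentioned above.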

        "...they end up being in .txt file however still formatted in html (so you see all the html code in the .txt surrounding the actual text...."

        This shouts "I haven't bothered to understand either html or the various meanings of 'text'." The last word in the quote above uses "text" in the sense of "textual content." The references to ".txt" refer to a file format; in this case, a document (something.html) that is composed of ASCII or UTF-8 characters.

        Since you say the html markup ("code") is still present (visible), you'll almost certainly have html formatted files by merely changing the file extension from .txt to .htm.

        But you've asked quite enough questions[1] that reflect an utter lack of personal effort. This is going to be your project, your thesis, and your future, not ours. So build a good foundation by taking the trouble to understand at least the basics of the relevant technology (and, as has already been suggested, understand how, when and why to seek help here).

        [1] To wit: Re: Perl Possibilities, Re^3: Perl Possibilities, and Re: Perl Possibilities, where the link leaves the tedium of finding the material to which you refer (ANNUAL MEETING PROPOSALS) to the Monk seeking to help. The observations here also apply to the node to which this is addressed (Re^3: Perl Possibilities), and my point is that those who seek the benefit of the Monks' effort should maximize their own beforehand.


        ++$anecdote ne $data