Monks who have spent more time than I roaming the caverns of CPAN with a waxen taper may be able to point you to a module that'll do this, but it does look a bit specialist. It is, however, perfectly possible to do it by hand, as it were, and that might be a fun exercise (though only on PerlMonks would this be thought fun).

The approach I'd take would be to read the HTML file into a $scalar_variable and then run it through carefully devised regular expressions to boil it down to the info you want. You might prefer to read it into an @array line by line, but then you'd have to be sure they break the lines in a regular way, so I'd stick with the scalar.
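For what it's worth, here is a minimal sketch of the "read it into a scalar" step. The sub name slurp_file is my own invention, and I'm assuming you have already saved the page to a local file.

```perl
use strict;
use warnings;

# Read a whole file into one scalar ("slurp" mode).
# slurp_file is a made-up helper name, not something from a module.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    local $/;            # no input record separator: one read gets everything
    my $data = <$fh>;
    close $fh;
    return $data;
}
```

You would call it as something like `my $data = slurp_file('stats.html');`, where stats.html stands in for whatever you called your saved copy.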

Helpfully, they seem to give you dandy little section identifiers like <!--RECEIVINGSTATS--> which you could use to break it into chunks for working on. Then you would probably want to do some progressive matching first to get each row of the table into an array element, and then to break each row down into cells. For example,
push @output, $1 while $data =~ s/htm>(.*?)<\/td><\/tr>//;
I'm intentionally not insulting your intelligence by spelling it out too much, but if you ran that on the right chunk of data you'd only have to hack out the </a> tags and then split each element of @output on </td><td>, and you'd have an array of arrays of... incomprehensible info (I speak as an Englishman who, insofar as he takes any interest in sport, follows cricket).
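In case it helps to see the whole chain in one place, here is a rough sketch run on made-up data. The sample HTML, the player names and every marker apart from <!--RECEIVINGSTATS--> are my assumptions about the shape of the page, so expect to adjust the patterns for the real thing. It uses the same kind of progressive match as the one-liner above, written with a non-greedy `.*?` so that two rows sharing a line don't get swallowed as one.

```perl
use strict;
use warnings;

# Invented sample in roughly the shape described above; the real page will differ.
my $data = <<'HTML';
<!--PASSINGSTATS-->
<tr><td><a href=p1.htm>A. Passer</a></td><td>20</td><td>250</td></tr>
<!--RECEIVINGSTATS-->
<tr><td><a href=p2.htm>B. Catcher</a></td><td>5</td><td>80</td></tr>
<tr><td><a href=p3.htm>C. Runner</a></td><td>3</td><td>41</td></tr>
<!--ENDSTATS-->
HTML

# 1. Cut out the chunk between the marker we want and the next comment marker.
my ($chunk) = $data =~ /<!--RECEIVINGSTATS-->(.*?)<!--/s
    or die "section marker not found";

# 2. Progressive matching: each pass removes one row and keeps its contents.
my @output;
push @output, $1 while $chunk =~ s/htm>(.*?)<\/td><\/tr>//;

# 3. Hack out the closing anchor tags, then split each row into its cells.
my @rows;
for my $row (@output) {
    $row =~ s/<\/a>//g;
    push @rows, [ split /<\/td><td>/, $row ];
}

# @rows is now an array of arrays, one inner array per table row.
print join(' | ', @$_), "\n" for @rows;
```

The substitution in step 2 eats the chunk as it goes, which is why it works on $chunk rather than on $data itself.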

I suggest you give it a whirl and if you get it to work let us know; or if you get some way and get stuck, post your code, with as many thoughts as you have about where you go wrong, and you'll find people glad to help you get the rest of the way.

Also, I suggest you check back here in a day, when you'll probably find that someone has told you my way of doing it is a waste of your time and the easy way is...

§ George Sherston

In reply to Re: Extract data from table by George_Sherston
in thread Extract data from table by Anonymous Monk
