Well, the first thing you need to do when parsing data records from HTML documents is to look at the markup structure of the document and determine the simplest "search rule" that matches all data records (without matching any false positives).
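
To make the idea of a "search rule" concrete, here is a minimal sketch in Python (not Perl). It treats each candidate rule as a predicate over a tag name and its attributes, and checks the rule against a handful of hand-labelled sample elements, counting missed records and false positives. The sample data and the names `rule_any_div`, `rule_div_message1` and `check_rule` are hypothetical. In practice you would build the samples by inspecting the real page.

```python
# Hypothetical sample elements as (tag, attributes, is_record) triples.
# In practice you would collect these by inspecting the real page.
SAMPLES = [
    ("div", {"class": "message1"}, True),
    ("div", {"class": "message1"}, True),
    ("div", {"class": "header"}, False),
    ("span", {"class": "message1"}, False),
]

def rule_any_div(tag, attrs):
    # Too loose: matches every div, including non-records.
    return tag == "div"

def rule_div_message1(tag, attrs):
    # Stricter: only divs whose class attribute is exactly "message1".
    return tag == "div" and attrs.get("class") == "message1"

def check_rule(rule, samples):
    """Return (missed_records, false_positives) for a candidate rule."""
    missed = sum(1 for tag, attrs, is_rec in samples
                 if is_rec and not rule(tag, attrs))
    false_pos = sum(1 for tag, attrs, is_rec in samples
                    if not is_rec and rule(tag, attrs))
    return missed, false_pos

for rule in (rule_any_div, rule_div_message1):
    print(rule.__name__, check_rule(rule, SAMPLES))
```

With these samples the "any div" rule finds every record but also picks up the header div (one false positive), while the stricter rule matches all records and nothing else.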

If I understand your post correctly, each record in your case will be of the form <div class="message1">...</div>. But that alone isn't enough information to determine the "search rule" for matching all of them. You also need to look at what exactly changes from record to record, and at the page structure the records are embedded in.
Here are some examples of what the "search rule" could be, depending on the exact document structure at hand (sorted from simpler to more complex):

Once you have determined the simplest rule that matches all records (and no false positives!) for your particular use case, you can start thinking about how to implement the parsing. Report back once you are at that stage and need more help with that.
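
Once the rule is settled, the parsing itself can be event-driven. The sketch below uses Python's standard-library `html.parser.HTMLParser` rather than Perl's HTML::Parser, but the shape is similar: you react to start tags, end tags and text. It assumes the rule is simply "a div whose class attribute is exactly message1" and that records may contain nested divs. The class name `RecordCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class RecordCollector(HTMLParser):
    """Collect the text of every <div class="message1"> element.

    Assumes the search rule is "a div whose class is exactly message1";
    adjust is_record() once you know the real rule for your page.
    """

    def __init__(self):
        super().__init__()
        self.records = []  # text content of each finished record
        self._depth = 0    # div nesting depth inside the current record
        self._buf = []     # text fragments of the current record

    def is_record(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        return tag == "div" and dict(attrs).get("class") == "message1"

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside a record: count nested divs so we know
            # which </div> closes the record itself.
            if tag == "div":
                self._depth += 1
        elif self.is_record(tag, attrs):
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                self.records.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

sample_html = """
<div class="header">Site header</div>
<div class="message1">First <b>post</b></div>
<div class="message1">Second <div>nested</div> post</div>
"""

parser = RecordCollector()
parser.feed(sample_html)
parser.close()
print(parser.records)
```

The depth counter keeps a nested </div> inside a record from ending that record early. Note that an exact class comparison will not match something like class="message1 highlight"; if your pages do that, the rule (and `is_record`) needs to be adjusted accordingly.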

PS: As for formatting questions on PerlMonks, it's best to put code (including HTML markup) in <code>...</code> tags. Among other benefits, you can then keep the angle brackets in the code, and they will show up verbatim.


In reply to Re: HTML::Parser guidance by smls
in thread HTML::Parser guidance by 4perl
