I have a bunch of div tags in a text file, as shown below. What I am trying to do is parse out the different pieces inside each one. My output should be a hash that looks like the one below. What's the best way to do this? Can I use a regex, or HTML parsing of some sort? Thanks in advance for your help.

OUTPUT

VAR1 = {
  'http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html' => [
    'Java Architect - Technisource - Richmond, VA',
    'Mon, 18 Jan 2010 02:17:16 GMT',
    's degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies...'
  ],
  etc
}

INPUT FILE

<div><a class="titlefield" title="Java Architect - Technisource - Richmond, VA" href="http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html">Java Architect - Technisource - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 02:17:16 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies... From Dice - 18 Jan 2010 02:17:16 GMT - save job, email, more... </div>

<div><a class="titlefield" title="Senior Java Developer - Technisource - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-Technisource-Richmond-VA_c21a277e3f459c5334eea3c70a364463.html">Senior Java Developer - Technisource - Richmond, VA</a> <br/><span class="datefield">Thu, 21 Jan 2010 02:06:39 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... architecture experience and Java enterprise knowledge (JEE) - Passion for web technologies and experience... From Technisource - 21 Jan 2010 02:06:39 GMT - save job, email, more... </div>

<div><a class="titlefield" title="Senior Java Developer - TEKsystems - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-TEKsystems-Richmond-VA_4655ae7f56afc20d39daa75b1592767a.html">Senior Java Developer - TEKsystems - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 06:13:02 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> Computer Science, Information Technology, or Business or related work experience/certification. 5+ years of relevant general Information Technology experience... From TEKSystems - 18 Jan 2010 06:13:02 GMT - save job, email, more... </div>
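For what it's worth, here is a rough sketch of the HTML-parsing route rather than the regex route. It is written in Python with the standard-library `html.parser` only because the post itself contains no code; the same event-driven idea should carry over to a Perl module such as HTML::Parser. The names `JobListingParser` and `parse_jobs` are made up for illustration. It also assumes that every record is a single un-nested div like the ones above, and that the trailing "From ... - save job, email, more..." text is not wanted in the description, since the desired output leaves it out.

```python
from html.parser import HTMLParser


class JobListingParser(HTMLParser):
    """Collect {href: [title, date, description]} from the job <div> blocks."""

    # Assumption: every listing ends with this footer text, as in the sample input.
    FOOTER_END = "save job, email, more..."

    def __init__(self):
        super().__init__()
        self.jobs = {}
        self.current = None   # record for the <div> being read, if any
        self.capture = None   # "date" inside datefield, "skip" inside a/labelfield

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div":
            self.current = {"href": None, "title": None, "date": "", "text": []}
        elif self.current is None:
            return
        elif tag == "a" and css_class == "titlefield":
            # The key and the title both come from the anchor's attributes.
            self.current["href"] = attrs.get("href")
            self.current["title"] = attrs.get("title")
            self.capture = "skip"
        elif tag == "span":
            self.capture = "date" if css_class == "datefield" else "skip"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self.capture = None
        elif tag == "div" and self.current is not None:
            self._finish_record()

    def handle_data(self, data):
        if self.current is None or self.capture == "skip":
            return
        if self.capture == "date":
            self.current["date"] += data
        else:
            self.current["text"].append(data)

    def _finish_record(self):
        rec, self.current = self.current, None
        # Collapse runs of whitespace in the free text that follows the spans.
        desc = " ".join("".join(rec["text"]).split())
        # Assumption: drop the trailing "From <site> - <date> - save job, ..." footer.
        cut = desc.rfind(" From ")
        if cut != -1 and desc.endswith(self.FOOTER_END):
            desc = desc[:cut]
        if rec["href"]:
            self.jobs[rec["href"]] = [rec["title"], rec["date"].strip(), desc]


def parse_jobs(html_text):
    """Return a dict keyed by link URL; each value is [title, date, description]."""
    parser = JobListingParser()
    parser.feed(html_text)
    parser.close()
    return parser.jobs
```

Calling `parse_jobs` on the contents of the input file should give one entry per div, keyed by the href. A regex could probably handle markup this regular too, but it would tend to break as soon as the attribute order or spacing changes, which is why the sketch leans on a parser.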


In reply to HTML Parsing /Regex Qstn by sri1230
