Here's my first stab.
#! perl
use strict;
use warnings;
use HTML::TokeParser;

my $p = HTML::TokeParser->new("sample.html")
    or die "Can't open sample.html: $!";

my %hash;
while (my $t = $p->get_token) {
    # only interested in <th class="DatTh"> start tags
    next unless $t->[0] eq 'S'
        and $t->[1] eq 'th'
        and defined $t->[2]{'class'}
        and $t->[2]{'class'} eq 'DatTh';

    my $key = $p->get_text('/th');
    chop $key;                       # drop the trailing &nbsp; (decoded to one char)

    $p->get_tag('td');               # the value is in the next <td>
    my $value = $p->get_text('/td');
    $value =~ s/^\s*//;
    $value =~ s/\s*\[Add\]\s*$//;    # remove the address-book link text

    $hash{$key} = $value;
}

open my $out, '>', 'out.txt' or die "Can't write out.txt: $!";
for my $key (keys %hash) {
    print $out "$key => $hash{$key}\n";
}
close $out;
produces...
Subject => PMR446
To => sigmund@fastmail.fm
Date => Mon, 9 Aug 2004 10:28 AM  ( 59 mins 50 secs ago )
From => "Delboy" <delboyenterprises@hotmail.com>
Simplified extract from the HTML...
<table width="100%">
<tr align="left">
<th class="DatTh" width="0%">Date&nbsp;</th>
<td class="DatTd" width="95%">Mon, 9 Aug 2004 10:28 AM &nbsp;<small>( 59 mins 50 secs ago )</small></td>
<td class="DatTd" rowspan="3" align="center" valign="top" width="5%"><font size="-2"><a href="xxxxxxx">Text&nbsp;view</a><br><b>HTML&nbsp;view</b><br><a href="xxxx">Print&nbsp;view</a></font></td>
</tr>
<tr align="left">
<th class="DatTh" width="0%">From&nbsp;</th>
<td class="DatTd" width="95%">&quot;Delboy&quot; &lt;delboyenterprises@hotmail.com&gt; <a title="Add addresses to your address book" href="xxxx">[Add]</a></td>
</tr>
<tr align="left">
<th class="DatTh" width="0%">To&nbsp;</th>
<td class="DatTd" width="95%">sigmund@fastmail.fm<a title="Add addresses to your address book" href="xxxxx">[Add]</a></td>
</tr>
<tr align="left">
<th class="DatTh" width="0%">Subject&nbsp;</th>
<td class="DatTd" width="95%">PMR446</td>
<td class="DatTd" align="center" width="5%"><font size="-2"><a href="xxxx">Show&nbsp;full&nbsp;header</a></font></td>
</tr>
</table>
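The chop on the key relies on get_text having decoded the trailing &nbsp; into a single character, chr(160), and the wide gap in the Date value is most likely that same character. A slightly more defensive clean-up could look like the sketch below. It is not part of the script above: the helper names clean_key and clean_value are hypothetical, and the sample strings are assumed to look like what get_text returns for the extract above.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original script. They assume the
# strings come from HTML::TokeParser's get_text, which has already decoded
# &nbsp; into the single character chr(160) ("\xA0").
my $WS = qr/[\s\xA0]/;    # ordinary whitespace or a decoded &nbsp;

sub clean_key {
    my ($key) = @_;
    $key =~ s/$WS+\z//;    # strip trailing whitespace/&nbsp; rather than a blind chop
    return $key;
}

sub clean_value {
    my ($value) = @_;
    $value =~ s/\[Add\]$WS*\z//;    # drop the "[Add]" address-book link text
    $value =~ s/$WS+/ /g;           # collapse runs of spaces and &nbsp; to one space
    $value =~ s/\A //;              # trim a leading space
    $value =~ s/ \z//;              # trim a trailing space
    return $value;
}

# Sample input: strings assumed to look like get_text output for the extract above.
my %raw = (
    "Date\xA0"    => "Mon, 9 Aug 2004 10:28 AM \xA0( 59 mins 50 secs ago )",
    "From\xA0"    => qq{"Delboy" <delboyenterprises\@hotmail.com> [Add]},
    "To\xA0"      => "sigmund\@fastmail.fm[Add]",
    "Subject\xA0" => "PMR446",
);

my %hash;
for my $k (keys %raw) {
    $hash{ clean_key($k) } = clean_value($raw{$k});
}

# Sorted, so the output order is stable (plain keys %hash is random).
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Run as-is it prints the four pairs in Date, From, Subject, To order, with the Date gap reduced to a single space. To use it in the script, the chop and the two substitutions on $value would be replaced by calls to these helpers.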

In reply to Re: Renaming html email dumps according to sender and date by wfsp
in thread Renaming html email dumps according to sender and date by Sigmund
