in reply to Renaming html email dumps according to sender and date
    #! perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $p = HTML::TokeParser->new("sample.html") || die "Can't open: $!";
    my %hash;

    while (my $t = $p->get_token){
        # Only act on start tags of the form <th class="DatTh">
        next unless $t->[0] eq 'S' and $t->[1] eq 'th'
            and defined $t->[2]{'class'} and $t->[2]{'class'} eq 'DatTh';

        my $key = $p->get_text('/th');   # header label, e.g. "Date "
        chop $key;                       # drop the trailing space after the label
        $p->get_tag('td');               # advance to the cell holding the value
        my $value = $p->get_text('/td');
        $value =~ s/^\s*//;              # trim leading whitespace
        $value =~ s/\s*\[Add\]\s*$//;    # drop the trailing "[Add]" link text
        $hash{$key} = $value;
    }

    open my $out, '>', 'out.txt' or die "Can't write out.txt: $!";
    for my $key ( keys %hash ){
        print $out "$key => $hash{$key}\n";
    }
    close $out;

produces...

    Subject => PMR446
    To => sigmund@fastmail.fm
    Date => Mon, 9 Aug 2004 10:28 AM ( 59 mins 50 secs ago )
    From => "Delboy" <delboyenterprises@hotmail.com>

simplified extract from html...

    <table width="100%">
      <tr align="left">
        <th class="DatTh" width="0%">Date </th>
        <td class="DatTd" width="95%">Mon, 9 Aug 2004 10:28 AM <small>( 59 mins 50 secs ago )</small></td>
        <td class="DatTd" rowspan="3" align="center" valign="top" width="5%"><font size="-2"><a href="xxxxxxx">Text view</a><br><b>HTML view</b><br><a href="xxxx">Print view</a></font></td>
      </tr>
      <tr align="left">
        <th class="DatTh" width="0%">From </th>
        <td class="DatTd" width="95%">"Delboy" <delboyenterprises@hotmail.com> <a title="Add addresses to your address book" href="xxxx">[Add]</a></td>
      </tr>
      <tr align="left">
        <th class="DatTh" width="0%">To </th>
        <td class="DatTd" width="95%">sigmund@fastmail.fm<a title="Add addresses to your address book" href="xxxxx">[Add]</a></td>
      </tr>
      <tr align="left">
        <th class="DatTh" width="0%">Subject </th>
        <td class="DatTd" width="95%">PMR446</td>
        <td class="DatTd" align="center" width="5%"><font size="-2"><a href="xxxx">Show full header</a></font></td>
      </tr>
    </table>
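Since the parent question is about renaming the dumps by sender and date, here is a rough sketch of how the extracted `From` and `Date` values might be turned into a file name. The helper `email_filename`, the `%MONTH` table, and the `YYYY-MM-DD_sender.html` naming scheme are all my own assumptions for illustration, not something from the original question, and the date pattern only covers the format shown in the sample.

```perl
use strict;
use warnings;

# Hypothetical month lookup for dates like "Mon, 9 Aug 2004 10:28 AM".
my %MONTH = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: build "YYYY-MM-DD_sender.html" from the two fields.
# Returns nothing if the date does not look like the sample format.
sub email_filename {
    my ($from, $date) = @_;

    # Prefer the address between angle brackets, else use the whole field.
    my ($sender) = $from =~ /<([^<>]+)>/;
    $sender = $from unless defined $sender;
    $sender =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    $sender =~ s/[^-\w.\@]+/_/g;     # replace characters awkward in file names

    # Pick out day, month name and year; anything after them is ignored.
    my ($day, $mon, $year) = $date =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})/
        or return;
    return unless exists $MONTH{$mon};

    return sprintf '%04d-%02d-%02d_%s.html', $year, $MONTH{$mon}, $day, $sender;
}

# Sample values copied from the output above.
my %hash = (
    From => '"Delboy" <delboyenterprises@hotmail.com>',
    Date => 'Mon, 9 Aug 2004 10:28 AM ( 59 mins 50 secs ago )',
);

my $name = email_filename($hash{From}, $hash{Date});
print defined $name ? "$name\n" : "could not build a name\n";
```

The resulting name could then be passed to Perl's built-in `rename`, after checking that the target does not already exist. Putting the date first in year-month-day order means a plain directory listing sorts chronologically.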
Re^2: Renaming html email dumps according to sender and date
by Mr_Jon (Monk) on Aug 09, 2004 at 18:01 UTC
by Sigmund (Pilgrim) on Aug 21, 2004 at 16:08 UTC
by Mr_Jon (Monk) on Aug 23, 2004 at 17:22 UTC