Re^2: Extracting the few useful lines from HTML garble

Re^3: Extracting the few useful lines from HTML garble by wfsp (Abbot) on Aug 12, 2008 at 05:22 UTC
Here's my go with HTML::TableExtract.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Column headers that identify the table we want
my @headers = qw{From Subject Received Size};

my $te = HTML::TableExtract->new(headers => \@headers);
$te->parse_file(q{html/monk.html}) or die qq{parse failed\n};

# Bail out cleanly if no table with those headers was found
my $ts = $te->first_table_found() or die qq{no matching table\n};

foreach my $row ($ts->rows) {
    for my $i (0..$#{$row}) {
        print qq{$headers[$i]: $row->[$i]\n};
    }
}
```

Output:

```
From: usernme
Subject: Personal Statement - 08/09/08
Received: Sat 09/08/2008 04:25 PM
Size: 124 KB
```

There's a non-breaking space after the size.
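That non-breaking space is presumably an `&nbsp;` in the page, which the parser decodes to character U+00A0 rather than an ordinary space. A minimal sketch of one way to clean it up, assuming the cell text arrives as a Perl string; `clean_cell` is a hypothetical helper, and the sample row just reuses the values printed above instead of calling HTML::TableExtract, so the snippet runs without that module:

```perl
use strict;
use warnings;

# Hypothetical helper: turn non-breaking spaces (U+00A0) into ordinary
# spaces, then trim leading/trailing whitespace from a table cell.
sub clean_cell {
    my ($text) = @_;
    return '' unless defined $text;   # empty cells may come back undef
    $text =~ s/\x{a0}/ /g;            # nbsp -> plain space
    $text =~ s/^\s+|\s+$//g;          # trim both ends
    return $text;
}

# Sample row standing in for one $row from $ts->rows (values from the output above).
my @headers = qw{From Subject Received Size};
my @row = ('usernme', 'Personal Statement - 08/09/08',
           'Sat 09/08/2008 04:25 PM', "124\x{a0}KB");

for my $i (0 .. $#row) {
    print "$headers[$i]: ", clean_cell($row[$i]), "\n";
}
```

In the real script you would call `clean_cell($row->[$i])` inside the inner loop instead of printing `$row->[$i]` directly.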
Re^3: Extracting the few useful lines from HTML garble by jethro (Monsignor) on Aug 11, 2008 at 23:25 UTC
Ok. If you want the whole line, just change my code above to the following (provided that "/Inbox/email.EML" appears on every line you want; since I can see only one line, I have no idea which parts are static and which are variable):

```perl
while (<F>) {
    # Remember the line if it contains the literal path (the last match wins)
    $theline = $_ if ( m{/Inbox/email\.EML}xms );
}
```

If this is not the answer you want, you'll have to be more specific.

By the way, I just noticed that you wrote EMl in your first post, which looks exactly like EMI in my web browser. So to get my first code snippet above to work, you have to change EMI to EML there.
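The loop above keeps only the last matching line, because each match overwrites `$theline`. A self-contained sketch that collects every matching line instead, assuming the input can be read line by line; the sample HTML and the `matching_lines` name are made up for illustration, and an in-memory filehandle stands in for `F`:

```perl
use strict;
use warnings;

# Hypothetical sample input; in practice this would be the saved page.
my $html = <<'END';
<tr><td>noise</td></tr>
<a href="/Inbox/email.EML">Personal Statement</a>
<tr><td>more noise</td></tr>
END

# Return every line (without its newline) that contains the literal
# path /Inbox/email.EML.
sub matching_lines {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ m{/Inbox/email\.EML};
    }
    return @hits;
}

open my $fh, '<', \$html or die "cannot open in-memory file: $!";
my @found = matching_lines($fh);
close $fh;

print "$_\n" for @found;
```

If the extension's case may vary (the EMl/EML mix-up mentioned above suggests it might), adding the `/i` modifier to the match makes it case-insensitive.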