in reply to Re: Extracting the few useful lines from HTML garble
in thread Extracting the few useful lines from HTML garble

Here is the HTML in question. Everything I want is in one big line, at line 75: http://pastebin.com/mf0291b0

Replies are listed 'Best First'.
Re^3: Extracting the few useful lines from HTML garble
by wfsp (Abbot) on Aug 12, 2008 at 05:22 UTC
    Here's my go with HTML::TableExtract.
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;

    my @headers = qw{From Subject Received Size};
    my $te = HTML::TableExtract->new(headers => \@headers);
    $te->parse_file(q{html/monk.html}) or die qq{parse failed\n};
    my $ts = $te->first_table_found();
    foreach my $row ($ts->rows) {
        for my $i (0..$#{$row}){
            print qq{$headers[$i]: $row->[$i]\n};
        }
    }
    From: usernme
    Subject: Personal Statement - 08/09/08
    Received: Sat 09/08/2008 04:25 PM
    Size: 124Â KB
    There's a non-breaking space after the size figure (hence the stray "Â" before "KB").
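    If the non-breaking space gets in the way, it can be normalised before printing. The following is only a sketch: clean_cell is a hypothetical helper (it is not part of HTML::TableExtract), and it assumes the cell text is already a decoded character string in which the non-breaking space is the single character U+00A0.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::TableExtract.
# Assumes $text is a decoded character string, so a non-breaking
# space appears as the single character U+00A0.
sub clean_cell {
    my ($text) = @_;
    return '' unless defined $text;   # empty table cells may come back undefined
    $text =~ s/\x{A0}/ /g;            # NO-BREAK SPACE -> ordinary space
    $text =~ s/^\s+|\s+$//g;          # trim leading and trailing whitespace
    return $text;
}

print clean_cell("124\x{A0}KB"), "\n";
```

    In the loop above, that would mean printing clean_cell($row->[$i]) instead of $row->[$i]. If the text is still raw UTF-8 bytes, the substitution would have to target the byte pair \xC2\xA0 instead.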
Re^3: Extracting the few useful lines from HTML garble
by jethro (Monsignor) on Aug 11, 2008 at 23:25 UTC

    OK. If you want the whole line, just change my code above to the following. This assumes that "/Inbox/email.EML" appears on every line you are after; since I can see only one line, I have no idea which parts are static and which vary.

    while (<F>) { $theline=$_ if ( m{/Inbox/email\.EML}xms ); }

    If this is not the answer you want, you will have to be more specific.
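    Note that the loop above keeps only the last matching line, because $theline is overwritten on every match. If several lines can match, a variant that collects all of them might look like the sketch below. The sample input, the matching_lines name and the in-memory filehandle are assumptions made so the example runs on its own; in the real script you would pass it the filehandle for your HTML file.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real HTML file.
my $html = join "\n",
    '<tr><td>header</td></tr>',
    '<a href="/Inbox/email.EML">Personal Statement</a>',
    '<tr><td>footer</td></tr>',
    '';

# Return every line read from $fh that contains the marker string.
sub matching_lines {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        push @lines, $line if $line =~ m{/Inbox/email\.EML}xms;
    }
    return @lines;
}

# Open the string as an in-memory file so no file on disk is needed.
open my $fh, '<', \$html or die "cannot open in-memory file: $!";
my @found = matching_lines($fh);
close $fh;

print @found;
```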

    By the way, I just noticed that you wrote EMl (with a lowercase "l") in your first post, which looks exactly like EMI (with a capital "I") in my web browser. So to get my first code snippet above to work, you have to change EMI to EML there.