handheld-penguin has asked for the wisdom of the Perl Monks concerning the following question:

I am logging into my Outlook Web Access account using WWW::Mechanize, which works fine. I then get the source of the page that is shown: the inbox with all my emails. What would be the easiest way to display the emails currently in my inbox? The inbox is saved automatically as a txt file on my hard drive, and the bits that I want have this general format: /Inbox/Name%20of%20email.EMl. Would a simple snippet that searches for all such fragments do the job?
  • Comment on Extracting the few useful lines from HTML garble

Re: Extracting the few useful lines from HTML garble
by jethro (Monsignor) on Aug 10, 2008 at 21:27 UTC
    I guess "Name of email" is the variable part? Then this might do
    open F, '<', $yourfile or die "Couldn't open file: $!";
    while (<F>) {
        print $1 if ( m{/Inbox/(.+)\.EMI}xms );
    }
    UPDATE: Forgot closing bracket
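A minimal sketch of the same idea, extended to pick up several links on one line and to turn %20 back into spaces. It assumes the names sit inside attribute values (so they contain no quotes, angle brackets, or whitespace) and that the extension may appear in any letter case; the sub name `inbox_names` and the sample HTML are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the decoded name of every /Inbox/....EML fragment found in $html.
sub inbox_names {
    my ($html) = @_;
    my @names;
    # Non-greedy and quote-free, so two links on one line stay separate.
    # /i accepts .EML, .eml, and so on.
    while ( $html =~ m{/Inbox/([^"'<>\s]+?)\.EML}gi ) {
        my $name = $1;
        # Decode URL escapes such as %20 (space).
        $name =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @names, $name;
    }
    return @names;
}

# Hypothetical sample standing in for the saved inbox page.
my $sample = <<'HTML';
<a href="/exchange/me/Inbox/Name%20of%20email.EML">Name of email</a>
<a href="/exchange/me/Inbox/Second.EML">Second</a> <a href="/exchange/me/Inbox/Third%20one.EML">Third one</a>
HTML

print "$_\n" for inbox_names($sample);
```

Reading the whole file into one string first (instead of going line by line) lets the same sub work on the saved txt file.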
Re: Extracting the few useful lines from HTML garble
by wfsp (Abbot) on Aug 11, 2008 at 06:44 UTC
    It would be useful if you could show a cut-down, simplified example of the HTML and what "the few useful lines" contain that you need.

    If you're after, say, subject, date, from, etc., these may not all be on the same line, and you may need to consider a parser.

    Something like HTML::TokeParser::Simple or HTML::TreeBuilder may be more suitable than a regex especially if it's garbled. :-)
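As a rough sketch of the parser route, here is how HTML::TokeParser::Simple (from CPAN) might pull out the inbox links even when a tag is split across lines. It assumes each email appears as an `<a>` tag whose href ends in /Inbox/....EML; the real OWA markup has not been shown, so the sample HTML and the sub name `inbox_links` are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Return a list of { href => ..., text => ... } for links into /Inbox/.
sub inbox_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );
    my @links;
    while ( my $token = $parser->get_token ) {
        next unless $token->is_start_tag('a');
        my $href = $token->get_attr('href');
        next unless defined $href && $href =~ m{/Inbox/.+\.EML\z}i;
        # Text up to the closing </a>, with runs of whitespace collapsed.
        my $text = $parser->get_trimmed_text('/a');
        push @links, { href => $href, text => $text };
    }
    return @links;
}

# Hypothetical sample: the first link is deliberately broken over lines.
my $sample = <<'HTML';
<table><tr><td><a
   href="/exchange/me/Inbox/Name%20of%20email.EML">Name of
   email</a></td></tr>
<tr><td><a href="/exchange/me/help.htm">Help</a></td></tr></table>
HTML

print "$_->{text} => $_->{href}\n" for inbox_links($sample);
```

A line-by-line regex would miss the first link here because the tag and its text span several lines, which is the case where a parser pays off.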

        Here's my go with HTML::TableExtract.
        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use HTML::TableExtract;

        my @headers = qw{From Subject Received Size};
        my $te = HTML::TableExtract->new(headers => \@headers);
        $te->parse_file(q{html/monk.html}) or die qq{parse failed\n};
        my $ts = $te->first_table_found();

        foreach my $row ($ts->rows) {
            for my $i (0..$#{$row}){
                print qq{$headers[$i]: $row->[$i]\n};
            }
        }
        From: usernme
        Subject: Personal Statement - 08/09/08
        Received: Sat 09/08/2008 04:25 PM
        Size: 124 KB
        There's a non-breaking space in the size value.

        OK. If you want the whole line, just change my code above to the following. This assumes that "/Inbox/email.EML" is on every line you want; since I can see only one line, I have no idea which parts are static and which are variable:

        while (<F>) { $theline=$_ if ( m{/Inbox/email\.EML}xms ); }
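Note that the loop above keeps only the last matching line in $theline. If every matching line is wanted, a sketch along these lines might do; it assumes the variable part is whatever sits between "/Inbox/" and ".EML", and the sub name `inbox_lines` and the command-line file argument are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every line of the file that mentions an /Inbox/....EML link.
sub inbox_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open $path: $!";
    # grep keeps each line for which the pattern matches.
    my @lines = grep { m{/Inbox/.+\.EML}i } <$fh>;
    close $fh;
    return @lines;
}

# Usage: perl script.pl inbox.txt   (the file name is a placeholder)
print inbox_lines( $ARGV[0] ) if @ARGV;
```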

        If this is not the answer you want, you will have to be more specific.

        By the way, I just noticed that you wrote EMl in your first post, which looks exactly like EMI in my web browser. So to get my first code snippet above to work, you have to change EMI to EML there.