This doesn't seem to be getting very far very fast. The following puts all the pieces together, albeit using data from Re^2: parsing CSV rather than the real data. The parsing and cleanup will no doubt need to be different for the real data. Note that this just pulls the first two pre elements out of a single page, rather than fetching two pages and doing whatever is needed to extract the interesting content.

use strict;
use warnings;
use MIME::Lite;
use LWP::Simple;
use Text::CSV;
use HTML::TreeBuilder;

# Fetch the "pages"
my $content = get("http://perlmonks.org/?node_id=1173447");
die "Couldn't get it!" unless defined $content;

# Parse pages and clean up content
my $root = HTML::TreeBuilder->new_from_content($content);
my ($page1, $page2) = map {$_->as_text()} $root->find_by_tag_name('pre');

s/\[download\]//g for $page1, $page2;
s/\n\+//g for $page1, $page2;

# Process page 1
my $csv = Text::CSV->new();
my %idData;

open my $pg1In, '<', \$page1;

while (my $row = $csv->getline($pg1In)) {
    s/^\s+|\s+$//g for @$row;
    $idData{$row->[1]}{size} = $row->[0];
    $idData{$row->[1]}{name} = '-- missing --';
}

close $pg1In;

# Process page 2
$page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
open my $pg2In, '<', \$page2;

while (my $row = $csv->getline($pg2In)) {
    next if !$row->[0];    # Skip blank lines

    s/^\s+|\s+$//g for @$row;
    $idData{$row->[0]}{name} = $row->[1];
    $idData{$row->[0]}{size} //= '-- missing --';
}

close $pg2In;

# Generate output string
my $output;

for my $id (sort keys %idData) {
    $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}

# Build the email
my $msg = MIME::Lite->new(
    From    => 'me@myhost.com',
    To      => 'you@yourhost.com',
    Cc      => 'some@other.com, some@more.com',
    Subject => "Here's the data you wanted",
    Data    => $output
);

# and "send" it (just '$msg->send()' in the next line to really send it)
print $msg->as_string();

Prints:

Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
MIME-Version: 1.0
X-Mailer: MIME::Lite 3.030 (F2.85; T2.13; A2.16; B3.15; Q3.13)
Date: Sun, 9 Oct 2016 22:55:39 +1300
From: me@myhost.com
To: you@yourhost.com
Cc: some@other.com, some@more.com
Subject: Here's the data you wanted

c100: Joe Shmo size 512.45
c200: Jack Black size 6734
c300: Cinderella size 5653.2
c400: Barack Obama size -- missing --
c500: Cruella Deville size -- missing --

I suggest you leave the print line in until the body of the email looks right, and only then change it to the send line.
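If it helps to check the merge logic on its own before involving the network or the mailer, something like the following may be a useful starting point. It is only a sketch using core Perl: the sub names (fields, merge_pages, report) and the sample rows are made up for illustration, and a plain split on commas stands in for Text::CSV, so it assumes there are no quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Split one simple "a, b" line into trimmed fields.
# Assumption: no quoted fields or embedded commas (use Text::CSV for real data).
sub fields {
    my ($line) = @_;
    my @f = split /,/, $line;
    s/^\s+|\s+$//g for @f;
    return @f;
}

# Merge "size, id" rows (page 1) with "id, name" rows (page 2), keyed by id.
sub merge_pages {
    my ($page1, $page2) = @_;
    my %idData;

    for my $line (grep { /\S/ } split /\n/, $page1) {
        my ($size, $id) = fields($line);
        $idData{$id}{size} = $size;
        $idData{$id}{name} = '-- missing --';    # until page 2 supplies a name
    }

    for my $line (grep { /\S/ } split /\n/, $page2) {
        my ($id, $name) = fields($line);
        $idData{$id}{name} = $name;
        $idData{$id}{size} //= '-- missing --';  # id never seen on page 1
    }

    return \%idData;
}

# Build the email body: one line per id, sorted by id.
sub report {
    my ($idData) = @_;
    return join '',
        map { "$_: $idData->{$_}{name} size $idData->{$_}{size}\n" }
        sort keys %$idData;
}

# Made-up sample rows standing in for the cleaned-up page text
my $page1 = "512.45, c100\n6734, c200\n";
my $page2 = "c100, Joe Shmo\nc400, Barack Obama\n";

print report(merge_pages($page1, $page2));
```

Ids that turn up on only one of the two pages keep the '-- missing --' placeholder for whichever field was not supplied, which is the same behaviour as the full script.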

Premature optimization is the root of all job security

In reply to Re: parsing CSV by GrandFather
in thread parsing CSV by younggrasshopper13
