younggrasshopper13 has asked for the wisdom of the Perl Monks concerning the following question:

I posted this in the chatterbox, but I'm trying here as well. I have two web pages that both display comma-separated values. One page has customer ID and storage size; the other has customer ID and name. I need a way to match the names from one page to the corresponding customer IDs on the other.
webpage1ex: c500, 345.5
webpage2ex: c500, younggrasshopper13
I figured I could curl the url into a variable like so:
START_DATE=$(date '+%Y-%m-%d' -d "-1 month")
END_DATE=$(date '+%Y-%m-%d')
my $data = curl `http://www.webpagedata.com?end_date=$END_DATE&start_date=$START_DATE&type=csv 2>&1`
my $name = curl `http://www.webpagename.com?end_date=$END_DATE&start_date=$START_DATE&type=csv 2>&1`

That is just an example of what I was trying to do, without giving you the real web pages. I don't know how much I can share since it is work related. I figured that once curl had put the page contents into variables, I could match the customer ID from the first page to the customer ID on the second page and merge in the corresponding name. Then I need to take that output and send it in an email. The email should include customer ID, storage, and name.
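For reference, the date arithmetic and URL building in the snippet above can be done in Perl itself rather than in the shell. This is only a minimal sketch: it assumes the placeholder URLs and parameter names from the question, `report_url` is a hypothetical helper name, and it builds the URLs without fetching them. Time::Piece is a core module.

```perl
use strict;
use warnings;
use Time::Piece;

# Build a CSV report URL covering the month that ends at $end.
# $end is a Time::Piece object; the parameter names (end_date, start_date,
# type) are copied from the placeholder URLs in the question.
sub report_url {
    my ($base, $end) = @_;
    my $start = $end->add_months(-1);    # same idea as date -d "-1 month"
    return sprintf '%s?end_date=%s&start_date=%s&type=csv',
        $base, $end->ymd, $start->ymd;
}

my $today = localtime;    # Time::Piece makes this return an object
print report_url('http://www.webpagedata.com', $today), "\n";
print report_url('http://www.webpagename.com', $today), "\n";
```

The resulting URL could then be fetched with something like `get` from LWP::Simple, which is what the last reply in this thread uses.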

Re: parsing CSV
by GrandFather (Saint) on Oct 07, 2016 at 03:01 UTC

    You don't show what the page data may look like so I assume that you know how to wrangle it into raw CSV. Given that, you can match the data up by stuffing it into a hash:

    use strict;
    use warnings;
    use Text::CSV;

    my $page1 = <<PG1CSV;
    1,23
    2,10
    3,23
    PG1CSV

    my $page2 = <<PG2CSV;
    1,younggrasshopper13
    2,GrandFather
    4,Mr. Unknown
    PG2CSV

    my $csv = Text::CSV->new();
    my %idData;

    open my $pg1In, '<', \$page1;
    while (my $row = $csv->getline($pg1In)) {
        $idData{$row->[0]}{size} = $row->[1];
        $idData{$row->[0]}{name} = '-- missing --';
    }
    close $pg1In;

    open my $pg2In, '<', \$page2;
    while (my $row = $csv->getline($pg2In)) {
        $idData{$row->[0]}{name} = $row->[1];
        $idData{$row->[0]}{size} //= '-- missing --';
    }
    close $pg2In;

    for my $id (sort keys %idData) {
        print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
    }

    Prints:

    1: younggrasshopper13 size 23
    2: GrandFather size 10
    3: -- missing -- size 23
    4: Mr. Unknown size -- missing --
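The matching step on its own, separated from the CSV parsing, is a full outer join keyed on the id. The following is a rough sketch of just that step, assuming the rows have already been parsed into [id, value] pairs; `merge_by_id` is a hypothetical helper name, not something from Text::CSV.

```perl
use strict;
use warnings;

# Full outer join of two lists of [id, value] pairs, keyed on id.
# An id that appears on only one side gets the placeholder for the other field.
sub merge_by_id {
    my ($sizes, $names, $placeholder) = @_;
    $placeholder //= '-- missing --';
    my %merged;
    $merged{ $_->[0] }{size} = $_->[1] for @$sizes;
    $merged{ $_->[0] }{name} = $_->[1] for @$names;
    for my $rec (values %merged) {
        $rec->{$_} //= $placeholder for qw(size name);
    }
    return \%merged;
}

# Illustrative data only
my $merged = merge_by_id(
    [ [ c100 => 512.45 ],     [ c200 => 6734 ] ],
    [ [ c100 => 'Joe Shmo' ], [ c400 => 'Barack Obama' ] ],
);
for my $id (sort keys %$merged) {
    print "$id: $merged->{$id}{name} size $merged->{$id}{size}\n";
}
```

Because every id from either list becomes a hash key, nothing is dropped: unmatched ids simply show the placeholder for the field they lack.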
    Premature optimization is the root of all job security
      The first web page has two columns, storage size and customer ID. It looks like this:
      512.45,c100
      6734, c200
      5653.2, c300
      The second web page has no column names, is a little messy, and looks like this:
      c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville

      The second page is line after line of customer data and names: no columns, just line after line.

        The code is pretty much the same, except that the second page's data gets newlines inserted in front of the id codes, and we do a little cleanup to remove whitespace at the ends of fields:

        use strict;
        use warnings;
        use Text::CSV;

        my $page1 = <<PG1CSV;
        512.45,c100
        6734, c200
        5653.2, c300
        PG1CSV

        my $csv = Text::CSV->new();
        my %idData;

        open my $pg1In, '<', \$page1;
        while (my $row = $csv->getline($pg1In)) {
            s/^\s+|\s+$//g for @$row;
            $idData{$row->[1]}{size} = $row->[0];
            $idData{$row->[1]}{name} = '-- missing --';
        }
        close $pg1In;

        my $page2 = <<PG2CSV;
        c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
        PG2CSV

        $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes

        open my $pg2In, '<', \$page2;
        while (my $row = $csv->getline($pg2In)) {
            next if !$row->[0];    # Skip blank lines
            s/^\s+|\s+$//g for @$row;
            $idData{$row->[0]}{name} = $row->[1];
            $idData{$row->[0]}{size} //= '-- missing --';
        }
        close $pg2In;

        for my $id (sort keys %idData) {
            print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
        }

        Prints:

        c100: Joe Shmo size 512.45
        c200: Jack Black size 6734
        c300: Cinderella size 5653.2
        c400: Barack Obama size -- missing --
        c500: Cruella Deville size -- missing --
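The newline-insertion substitution is the part most likely to need adjusting for real data, so here it is in isolation. This is a sketch under two assumptions that may not hold for the real page: every id is a single word immediately followed by a comma, and names never contain commas. `split_records` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split a run-together "id, name id, name ..." string into [id, name] pairs.
sub split_records {
    my ($text) = @_;
    $text =~ s/\b(?=\w+,)/\n/g;    # newline in front of every "word," token
    my @records;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip the empty leading line
        my ($id, $name) = split /,/, $line, 2;
        next unless defined $name;    # ignore stray text that has no comma
        s/^\s+|\s+$//g for $id, $name;
        push @records, [ $id, $name ];
    }
    return @records;
}

print join(' | ', @$_), "\n"
    for split_records('c100, Joe Shmo c200, Jack Black c300, Cinderella');
```

A name written as "Shmo, Joe" would be treated as a new record starting at "Shmo", so real data in that shape would need a stricter pattern for the id, for example one that requires a letter followed by digits.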
Re: parsing CSV
by GrandFather (Saint) on Oct 09, 2016 at 10:04 UTC

    This doesn't seem to be getting very far very fast. The following puts all the pieces together, albeit using data from Re^2: parsing CSV rather than the real data. The parsing and clean up will no doubt need to be different for the real data. This just pulls out the first two pre tags from one page rather than fetching two pages and doing whatever is needed to pull out the interesting content.

    use strict;
    use warnings;
    use MIME::Lite;
    use LWP::Simple;
    use Text::CSV;
    use HTML::TreeBuilder;

    # Fetch the "pages"
    my $content = get("http://perlmonks.org/?node_id=1173447");
    die "Couldn't get it!" unless defined $content;

    # Parse pages and clean up content
    my $root = HTML::TreeBuilder->new_from_content($content);
    my ($page1, $page2) = map {$_->as_text()} $root->find_by_tag_name('pre');

    s/\[download\]//g for $page1, $page2;
    s/\n\+//g for $page1, $page2;

    # Process page 1
    my $csv = Text::CSV->new();
    my %idData;

    open my $pg1In, '<', \$page1;
    while (my $row = $csv->getline($pg1In)) {
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[1]}{size} = $row->[0];
        $idData{$row->[1]}{name} = '-- missing --';
    }
    close $pg1In;

    # Process page 2
    $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes

    open my $pg2In, '<', \$page2;
    while (my $row = $csv->getline($pg2In)) {
        next if !$row->[0];    # Skip blank lines
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[0]}{name} = $row->[1];
        $idData{$row->[0]}{size} //= '-- missing --';
    }
    close $pg2In;

    # Generate output string
    my $output;
    for my $id (sort keys %idData) {
        $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
    }

    # Build the email
    my $msg = MIME::Lite->new(
        From    => 'me@myhost.com',
        To      => 'you@yourhost.com',
        Cc      => 'some@other.com, some@more.com',
        Subject => "Here's the data you wanted",
        Data    => $output
    );

    # and "send" it (just '$msg->send()' in the next line to really send it)
    print $msg->as_string();

    Prints:

    Content-Disposition: inline
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain
    MIME-Version: 1.0
    X-Mailer: MIME::Lite 3.030 (F2.85; T2.13; A2.16; B3.15; Q3.13)
    Date: Sun, 9 Oct 2016 22:55:39 +1300
    From: me@myhost.com
    To: you@yourhost.com
    Cc: some@other.com, some@more.com
    Subject: Here's the data you wanted

    c100: Joe Shmo size 512.45
    c200: Jack Black size 6734
    c300: Cinderella size 5653.2
    c400: Barack Obama size -- missing --
    c500: Cruella Deville size -- missing --

    I suggest you leave the print line in until the body of the email looks right before you change it to the send line.
