in reply to parsing CSV

You don't show what the page data may look like so I assume that you know how to wrangle it into raw CSV. Given that, you can match the data up by stuffing it into a hash:

use strict;
use warnings;
use Text::CSV;

my $page1 = <<PG1CSV;
1,23
2,10
3,23
PG1CSV

my $page2 = <<PG2CSV;
1,younggrasshopper13
2,GrandFather
4,Mr. Unknown
PG2CSV

my $csv = Text::CSV->new();
my %idData;

open my $pg1In, '<', \$page1;
while (my $row = $csv->getline($pg1In)) {
    $idData{$row->[0]}{size} = $row->[1];
    $idData{$row->[0]}{name} = '-- missing --';
}
close $pg1In;

open my $pg2In, '<', \$page2;
while (my $row = $csv->getline($pg2In)) {
    $idData{$row->[0]}{name} = $row->[1];
    $idData{$row->[0]}{size} //= '-- missing --';
}
close $pg2In;

for my $id (sort keys %idData) {
    print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}

Prints:

1: younggrasshopper13 size 23
2: GrandFather size 10
3: -- missing -- size 23
4: Mr. Unknown size -- missing --
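
The hash-keyed join can also be factored into a small helper. This is only a sketch: `merge_by_id` is a hypothetical name, and it assumes the rows have already been parsed into `[id, value]` pairs (for example by `Text::CSV`), so it needs nothing outside core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: merge two lists of [id, value] rows into one hash
# keyed by id. A field that never arrives for an id gets a placeholder.
sub merge_by_id {
    my ($sizes, $names, $missing) = @_;
    $missing //= '-- missing --';
    my %by_id;

    $by_id{$_->[0]}{size} = $_->[1] for @$sizes;
    $by_id{$_->[0]}{name} = $_->[1] for @$names;

    # Fill in whichever field a record never received.
    for my $rec (values %by_id) {
        $rec->{size} //= $missing;
        $rec->{name} //= $missing;
    }
    return \%by_id;
}

my $merged = merge_by_id(
    [[1, 23], [2, 10], [3, 23]],
    [[1, 'younggrasshopper13'], [2, 'GrandFather'], [4, 'Mr. Unknown']],
);

for my $id (sort keys %$merged) {
    print "$id: $merged->{$id}{name} size $merged->{$id}{size}\n";
}
```

Because the placeholders are filled in a final pass, the order in which the two pages are read no longer matters.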
Premature optimization is the root of all job security

Re^2: parsing CSV
by younggrasshopper13 (Novice) on Oct 07, 2016 at 03:15 UTC
    The first webpage has two columns, storage size and customer id. It looks like this:
    512.45,c100
    6734, c200
    5653.2, c300
    The second web page has no column names, is a little messy, and looks like this:
    c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville

    The second page is just line after line of customer data and names, with no columns.

      The code is pretty much the same, except that the second page's data gets newlines inserted in front of the id codes and we do a little cleanup to remove whitespace at the ends of fields:

      use strict;
      use warnings;
      use Text::CSV;

      my $page1 = <<PG1CSV;
      512.45,c100
      6734, c200
      5653.2, c300
      PG1CSV

      my $csv = Text::CSV->new();
      my %idData;

      open my $pg1In, '<', \$page1;
      while (my $row = $csv->getline($pg1In)) {
          s/^\s+|\s+$//g for @$row;
          $idData{$row->[1]}{size} = $row->[0];
          $idData{$row->[1]}{name} = '-- missing --';
      }
      close $pg1In;

      my $page2 = <<PG2CSV;
      c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
      PG2CSV

      $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes

      open my $pg2In, '<', \$page2;
      while (my $row = $csv->getline($pg2In)) {
          next if !$row->[0];    # Skip blank lines
          s/^\s+|\s+$//g for @$row;
          $idData{$row->[0]}{name} = $row->[1];
          $idData{$row->[0]}{size} //= '-- missing --';
      }
      close $pg2In;

      for my $id (sort keys %idData) {
          print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
      }

      Prints:

      c100: Joe Shmo size 512.45
      c200: Jack Black size 6734
      c300: Cinderella size 5653.2
      c400: Barack Obama size -- missing --
      c500: Cruella Deville size -- missing --
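
To see what the newline-inserting substitution does on its own, here is a small stand-alone sketch run against a shortened copy of the second page. The sample string is made up from the data above.

```perl
use strict;
use warnings;

my $page2 = "c100, Joe Shmo c200, Jack Black c300, Cinderella\n";

# \b        a word boundary, here the start of a word
# (?=\w+,)  lookahead: only match if that whole word is directly followed
#           by a comma. "Joe" and "Shmo" are followed by a space, so only
#           the id codes qualify.
(my $split = $page2) =~ s/\b(?=\w+,)/\n/g;

print $split;
```

Each id code now starts its own line. The first line is empty and the lines keep a trailing space, which is why the loop skips blank rows and trims every field. Note the pattern assumes names never contain a word followed directly by a comma; a name such as "Black, Jack" would be split in the wrong place.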
        Wow. This is amazing. Thank you. Now, what would be the best way to curl this into an email that sends this data? Would I need to curl a variable of some kind?
        Hey there, thanks a lot for the code. I've been playing around with it, trying to get it to work. I added the CSV web pages and dates, and changed the print at the bottom to build an output string. But no luck on this at all. I keep getting "Can't find string terminator "http" anywhere before EOF" and stuff like that.
        use strict;
        use warnings;
        use Text::CSV;

        START_DATE=$(date '+%Y-%m-%d' -d "-1 month");
        END_DATE=$(date '+%Y-%m-%d');

        my $page1 = <<http://url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
        512.45,c100
        6734, c200
        5653.2, c300
        PG1CSV

        my $csv = Text::CSV->new();
        my %idData;

        open my $pg1In, '<', \$page1;
        while (my $row = $csv->getline($pg1In)) {
            s/^\s+|\s+$//g for @$row;
            $idData{$row->[1]}{size} = $row->[0];
            $idData{$row->[1]}{name} = '-- missing --';
        }
        close $pg1In;

        my $page2 = <<https:/url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
        c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
        PG2CSV

        $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes

        open my $pg2In, '<', \$page2;
        while (my $row = $csv->getline($pg2In)) {
            next if !$row->[0];    # Skip blank lines
            s/^\s+|\s+$//g for @$row;
            $idData{$row->[0]}{name} = $row->[1];
            $idData{$row->[0]}{size} //= '-- missing --';
        }
        close $pg2In;

        for my $id (sort keys %idData) {
            $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
        }

        curl -s -G "$output" | mail -s "send the thing for $END_DATE" name@name.com
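
The "Can't find string terminator" error comes from `<<http://...`: Perl reads `<<http` as the start of a heredoc and then looks for a line containing only `http`. A heredoc only embeds literal text in the script; it never fetches a URL. The `START_DATE=$(date ...)` and `curl ... | mail` lines are shell, not Perl. One possible shape for doing all of it in Perl is sketched below. It assumes the core modules `HTTP::Tiny` and `Time::Piece`, a `mail` command on the system, and the placeholder URL and address from the post; `report_url`, `fetch` and `send_mail` are hypothetical helper names.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::Piece;

# Build the report URL for a date range. The base URL passed in below is
# the placeholder from the post, not a real endpoint.
sub report_url {
    my ($base, $start, $end) = @_;
    return "$base?end_date=$end&start_date=$start&type=csv";
}

# Fetch a URL and return its body, dying on any HTTP failure.
sub fetch {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Pipe a message body to the system's mail command (assumed installed).
sub send_mail {
    my ($to, $subject, $body) = @_;
    open my $mail, '|-', 'mail', '-s', $subject, $to
        or die "Cannot run mail: $!\n";
    print {$mail} $body;
    close $mail or die "mail exited with an error\n";
}

my $now   = localtime;                                   # Time::Piece object
my $end   = $now->strftime('%Y-%m-%d');
my $start = $now->add_months(-1)->strftime('%Y-%m-%d');  # one month back
my $url   = report_url('http://url/website.com/thing', $start, $end);

# Only touch the network and mail when asked to: perl report.pl --send
if (@ARGV && $ARGV[0] eq '--send') {
    my $page1 = fetch($url);
    # Parse $page1 (and the second page) into %idData as in the earlier
    # code, then build the report text from it. The raw page stands in here.
    my $output = $page1;
    send_mail('name@name.com', "send the thing for $end", $output);
}
else {
    print "$url\n";
}
```

Run without arguments, it only prints the URL it would request, which is a cheap way to check the dates. Fetching an `https` URL with `HTTP::Tiny` additionally needs `IO::Socket::SSL` installed.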