Re: parsing CSV

You don't show what the page data looks like, so I assume you know how to wrangle it into raw CSV. Given that, you can match the data up by stuffing it into a hash keyed by id:

use strict;
use warnings;
use Text::CSV;

my $page1 = <<PG1CSV;
1,23
2,10
3,23
PG1CSV

my $page2 = <<PG2CSV;
1,younggrasshopper13
2,GrandFather
4,Mr. Unknown
PG2CSV

my $csv = Text::CSV->new();
my %idData;

open my $pg1In, '<', \$page1;

while (my $row = $csv->getline($pg1In)) {
    $idData{$row->[0]}{size} = $row->[1];
    $idData{$row->[0]}{name} = '-- missing --';
}

close $pg1In;

open my $pg2In, '<', \$page2;

while (my $row = $csv->getline($pg2In)) {
    $idData{$row->[0]}{name} = $row->[1];
    $idData{$row->[0]}{size} //= '-- missing --';
}

close $pg2In;

for my $id (sort keys %idData) {
    print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}

Prints:

1: younggrasshopper13 size 23
2: GrandFather size 10
3: -- missing -- size 23
4: Mr. Unknown size -- missing --

Premature optimization is the root of all job security


Re^2: parsing CSV by younggrasshopper13 (Novice) on Oct 07, 2016 at 03:15 UTC
The first web page has two columns: storage size and customer id. It looks like this: `512.45,c100 6734, c200 5653.2, c300`

The second web page has no column names, is a little messy, and looks like this: `c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville`

The second page is line after line of customer data and names. No columns, just line after line.
Re^3: parsing CSV by GrandFather (Saint) on Oct 07, 2016 at 03:34 UTC
The code is pretty much the same except that the second page data gets new lines inserted in front of the id codes and we do a little clean up to remove white space at the ends of lines:

use strict;
use warnings;
use Text::CSV;

my $page1 = <<PG1CSV;
512.45,c100
6734, c200
5653.2, c300
PG1CSV

my $csv = Text::CSV->new();
my %idData;

open my $pg1In, '<', \$page1;

while (my $row = $csv->getline($pg1In)) {
    s/^\s+|\s+$//g for @$row;
    $idData{$row->[1]}{size} = $row->[0];
    $idData{$row->[1]}{name} = '-- missing --';
}

close $pg1In;

my $page2 = <<PG2CSV;
c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
PG2CSV

$page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
open my $pg2In, '<', \$page2;

while (my $row = $csv->getline($pg2In)) {
    next if !$row->[0];    # Skip blank lines
    s/^\s+|\s+$//g for @$row;
    $idData{$row->[0]}{name} = $row->[1];
    $idData{$row->[0]}{size} //= '-- missing --';
}

close $pg2In;

for my $id (sort keys %idData) {
    print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}

Prints:

c100: Joe Shmo size 512.45
c200: Jack Black size 6734
c300: Cinderella size 5653.2
c400: Barack Obama size -- missing --
c500: Cruella Deville size -- missing --
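To see what the newline-inserting substitution does on its own, here is a minimal sketch that applies it to a shortened sample line and returns the separated records. The helper name `split_records` is made up for illustration. It assumes, as the sample data suggests, that an id is the only word ever followed directly by a comma.

```perl
use strict;
use warnings;

# Hypothetical helper: breaks a run-together line into "id, name" records by
# inserting a newline before every word that is directly followed by a comma.
sub split_records {
    my ($text) = @_;
    $text =~ s/\b(?=\w+,)/\n/g;

    my @records;
    for my $piece (split /\n/, $text) {
        $piece =~ s/^\s+|\s+$//g;    # Trim white space at both ends
        push @records, $piece if length $piece;    # Drop the empty leading piece
    }
    return @records;
}

print "$_\n" for split_records('c100, Joe Shmo c200, Jack Black c300, Cinderella');
```

A name that itself contains a word followed by a comma (something like "Black, Jack") would be split in the wrong place, so this only suits data shaped like the sample.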
Re^4: parsing CSV by younggrasshopper13 (Novice) on Oct 07, 2016 at 05:01 UTC
Wow. This is amazing. Thank you. Now, what would be the best way to curl this into an email that sends this data? Would I need to curl a variable of some kind?
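On the mailing question: curl fetches URLs, so it is probably not the tool for sending the report. One option is to collect the report in a string and pipe it to a mail program from Perl. The following is only a sketch: `pipe_to` is a hypothetical helper, it assumes a Unix-like system, and the commented-out call assumes a `mail` command is installed. The subject and address are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: writes $body to the standard input of an external
# command given as a list of program name and arguments.
sub pipe_to {
    my ($body, @cmd) = @_;
    open my $out, '|-', @cmd or die "Cannot start $cmd[0]: $!\n";
    print {$out} $body;
    close $out or die "$cmd[0] failed: " . ($! || "exit status $?") . "\n";
    return 1;
}

my $output = "c100: Joe Shmo size 512.45\n";    # Stand-in for the report text

# Demonstration with cat, which simply echoes what it receives.
pipe_to($output, 'cat');

# Assumes mail(1) is available; placeholder subject and address.
# pipe_to($output, 'mail', '-s', 'Storage report', 'name@name.com');
```

Building `$output` with `.=` inside the final loop, instead of printing each line, gives the string to pass in.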
Re^5: parsing CSV by GrandFather (Saint) on Oct 07, 2016 at 06:23 UTC
Re^6: parsing CSV by younggrasshopper13 (Novice) on Oct 07, 2016 at 07:03 UTC
Re^4: parsing CSV by younggrasshopper13 (Novice) on Oct 08, 2016 at 02:15 UTC
Hey there, thanks a lot for the code. I've been playing around with it, trying to get it to work. I added the CSV web pages and dates, and changed the print at the bottom to build an output variable. But no luck at all. I keep getting "Can't find string terminator "http" anywhere before EOF" and errors like that.

use strict;
use warnings;
use Text::CSV;

START_DATE=$(date '+%Y-%m-%d' -d "-1 month");
END_DATE=$(date '+%Y-%m-%d');

my $page1 = <<http://url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
512.45,c100
6734, c200
5653.2, c300
PG1CSV

my $csv = Text::CSV->new();
my %idData;

open my $pg1In, '<', \$page1;

while (my $row = $csv->getline($pg1In)) {
    s/^\s+|\s+$//g for @$row;
    $idData{$row->[1]}{size} = $row->[0];
    $idData{$row->[1]}{name} = '-- missing --';
}

close $pg1In;

my $page2 = <<https:/url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
PG2CSV

$page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
open my $pg2In, '<', \$page2;

while (my $row = $csv->getline($pg2In)) {
    next if !$row->[0];    # Skip blank lines
    s/^\s+|\s+$//g for @$row;
    $idData{$row->[0]}{name} = $row->[1];
    $idData{$row->[0]}{size} //= '-- missing --';
}

close $pg2In;

for my $id (sort keys %idData) {
    $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
}

curl -s -G "$output" | mail -s "send the thing for $END_DATE" name@name.com
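A note on the error message: in Perl, `<<http` begins a here-document whose terminator is the bare word `http`, and no line consisting of just `http` ever follows, hence "Can't find string terminator". The `START_DATE=$(date ...)` lines and the final `curl ... | mail` line are shell syntax, which Perl cannot parse either. A here-document only holds literal text and does not download anything. One possible way to do the date arithmetic and the download in Perl is sketched below using the core modules Time::Piece and HTTP::Tiny. The helper names `date_range`, `build_url` and `fetch_csv` are hypothetical, the URL is the placeholder from the post, and the real service may expect different parameters.

```perl
use strict;
use warnings;
use Time::Piece;    # Core module; makes localtime return an object in scalar context
use HTTP::Tiny;     # Core module since Perl 5.14

# Hypothetical helper: returns (start, end) as YYYY-MM-DD strings, where
# start is one month before the given Time::Piece object.
sub date_range {
    my ($now) = @_;
    return ($now->add_months(-1)->ymd, $now->ymd);
}

# Hypothetical helper: builds the report URL from a base address and two dates.
sub build_url {
    my ($base, $start, $end) = @_;
    return sprintf '%s?end_date=%s&start_date=%s&type=csv', $base, $end, $start;
}

# Hypothetical helper: downloads a page and returns its body as a string.
sub fetch_csv {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

my $now = localtime;
my ($start, $end) = date_range($now);
my $url = build_url('http://url/website.com/thing', $start, $end);    # Placeholder address
print "$url\n";

# With a real address, the download replaces the here-document:
# my $page1 = fetch_csv($url);
# open my $pg1In, '<', \$page1 or die "Cannot read page data: $!\n";
```

The string returned by `fetch_csv` can be opened as an in-memory file exactly as the here-document text was, so the parsing loops stay unchanged. For `https` addresses HTTP::Tiny additionally needs IO::Socket::SSL installed.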
Re^5: parsing CSV by AnomalousMonk (Archbishop) on Oct 08, 2016 at 03:13 UTC
Re^6: parsing CSV by younggrasshopper13 (Novice) on Oct 08, 2016 at 17:42 UTC