in reply to Re: parsing CSV
in thread parsing CSV

The first web page has two columns, storage size and customer ID. It looks like this:
512.45,c100
6734, c200
5653.2, c300
The second web page has no column names, is a little messy, and looks like this:
c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville

The second page is just line after line of customer IDs and names, with no columns.

Re^3: parsing CSV
by GrandFather (Saint) on Oct 07, 2016 at 03:34 UTC

    The code is pretty much the same, except that the second page's data gets newlines inserted in front of the id codes, and we do a little cleanup to remove white space at the ends of fields:

    use strict;
    use warnings;
    use Text::CSV;

    my $page1 = <<PG1CSV;
    512.45,c100
    6734, c200
    5653.2, c300
    PG1CSV

    my $csv = Text::CSV->new();
    my %idData;

    open my $pg1In, '<', \$page1;

    while (my $row = $csv->getline($pg1In)) {
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[1]}{size} = $row->[0];
        $idData{$row->[1]}{name} = '-- missing --';
    }

    close $pg1In;

    my $page2 = <<PG2CSV;
    c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
    PG2CSV

    $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
    open my $pg2In, '<', \$page2;

    while (my $row = $csv->getline($pg2In)) {
        next if !$row->[0];    # Skip blank lines
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[0]}{name} = $row->[1];
        $idData{$row->[0]}{size} //= '-- missing --';
    }

    close $pg2In;

    for my $id (sort keys %idData) {
        print "$id: $idData{$id}{name} size $idData{$id}{size}\n";
    }

    Prints:

    c100: Joe Shmo size 512.45
    c200: Jack Black size 6734
    c300: Cinderella size 5653.2
    c400: Barack Obama size -- missing --
    c500: Cruella Deville size -- missing --
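
To see the newline-inserting substitution on its own, here is a minimal sketch. The helper name split_records is made up for illustration, and the sketch assumes that no customer name contains a word immediately followed by a comma (a name such as "Smith, John" would be split in the wrong place). The /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split a run-together line into one record per entry
# by inserting a newline at every word boundary that is followed by
# "word characters, then a comma" (that is, in front of each id code).
sub split_records {
    my ($text) = @_;
    $text =~ s/\b(?=\w+,)/\n/g;

    # Trim each piece and drop the empty piece created before the first id.
    return grep {length} map {s/^\s+|\s+$//gr} split /\n/, $text;
}

my @records = split_records('c100, Joe Shmo c200, Jack Black c300, Cinderella');
print "$_\n" for @records;
```

Each element of @records is then an "id, name" pair that can be handed to a CSV parser or split on the first comma.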
    Premature optimization is the root of all job security
      Wow. This is amazing. Thank you. Now, what would be the best way to send this output in an email? Would I need to curl a variable of some kind?

        I'd use MIME::Lite (despite the "Wait!" warning paragraph in its documentation), especially if all you want is a text-only email without attachments.
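
As a rough sketch of that suggestion: collect the report into a string instead of printing it, then hand the string to MIME::Lite. The sample data, the sub names build_report and send_report, the addresses, and the subject line are all placeholders invented for illustration. The send call stays commented out so the sketch runs without a mail setup.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %idData in the code above.
my %idData = (
    c100 => {name => 'Joe Shmo',     size => '512.45'},
    c400 => {name => 'Barack Obama', size => '-- missing --'},
);

# Build the report text rather than printing it line by line.
sub build_report {
    my ($data) = @_;
    return join '',
        map {"$_: $data->{$_}{name} size $data->{$_}{size}\n"}
        sort keys %$data;
}

# Send the report as a plain-text email. Addresses are placeholders.
sub send_report {
    my ($body, $date) = @_;
    require MIME::Lite;    # loaded lazily so the rest runs without the module

    my $msg = MIME::Lite->new(
        From    => 'reports@example.com',
        To      => 'name@example.com',
        Subject => "Storage report for $date",
        Type    => 'TEXT',
        Data    => $body,
    );
    $msg->send;    # by default this hands the message to the local sendmail
}

my $report = build_report(\%idData);
print $report;
# send_report($report, '2016-10-07');    # uncomment once the addresses are real
```

No curl or shell pipe is needed for this step; the whole job stays inside the Perl script.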

        Premature optimization is the root of all job security
      Hey there, thanks a lot for the code. I've been playing around with it, trying to get it to work. I added the CSV web pages and dates, and changed the print at the bottom to build an output string. But no luck at all. I keep getting "Can't find string terminator "http" anywhere before EOF" and stuff like that.

      use strict;
      use warnings;
      use Text::CSV;

      START_DATE=$(date '+%Y-%m-%d' -d "-1 month");
      END_DATE=$(date '+%Y-%m-%d');

      my $page1 = <<http://url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
      512.45,c100
      6734, c200
      5653.2, c300
      PG1CSV

      my $csv = Text::CSV->new();
      my %idData;

      open my $pg1In, '<', \$page1;

      while (my $row = $csv->getline($pg1In)) {
          s/^\s+|\s+$//g for @$row;
          $idData{$row->[1]}{size} = $row->[0];
          $idData{$row->[1]}{name} = '-- missing --';
      }

      close $pg1In;

      my $page2 = <<https:/url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
      c100, Joe Shmo c200, Jack Black c300, Cinderella c400, Barack Obama c500, Cruella Deville
      PG2CSV

      $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
      open my $pg2In, '<', \$page2;

      while (my $row = $csv->getline($pg2In)) {
          next if !$row->[0];    # Skip blank lines
          s/^\s+|\s+$//g for @$row;
          $idData{$row->[0]}{name} = $row->[1];
          $idData{$row->[0]}{size} //= '-- missing --';
      }

      close $pg2In;

      for my $id (sort keys %idData) {
          $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
      }

      curl -s -G "$output" | mail -s "send the thing for $END_DATE" name@name.com

        my $page1 = <<http://url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
        512.45,c100
        6734, c200
        5653.2, c300
        PG1CSV

        What you are (incorrectly) attempting is a here document. The proper form (I'm making some assumptions about just exactly what you want) is:

        my $page1 = <<PG1CSV;
        http://url/website.com/thing?end_date=$END_DATE&start_date=$START_DATE&type=csv;
        512.45,c100
        6734, c200
        5653.2, c300
        PG1CSV

        See also the discussion of here-docs in Quote and Quote-like Operators and Quote-Like Operators, both in perlop.

        Update: Do you want the semicolon at the end of the
            http://url/website.com/thing?end_date...=csv;
        string | sub-string?

        Update 2: The here-doc with the label PG2CSV is also incorrect, and in the same way.
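
A here-doc only quotes literal text; it does not download anything, and the shell lines (START_DATE=$(date ...), curl ... | mail ...) are not Perl. If the intent is to fetch the CSV from the URL, one possible sketch using only core modules (HTTP::Tiny and Time::Piece, both core since Perl 5.14) follows. The sub names report_url and fetch_page are invented for illustration, the base URL is the placeholder from the question, and the actual fetch is left commented out because that URL is not real.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::Piece;

# Hypothetical helper: build the query URL from a base and two dates.
sub report_url {
    my ($base, $start, $end) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode(
        {start_date => $start, end_date => $end, type => 'csv'});
    return "$base?$query";
}

# Hypothetical helper: fetch a page and return its body, or die with the status.
sub fetch_page {
    my ($url) = @_;
    my $response = HTTP::Tiny->new->get($url);
    die "GET $url failed: $response->{status} $response->{reason}\n"
        if !$response->{success};
    return $response->{content};
}

my $now       = localtime;                    # a Time::Piece object
my $endDate   = $now->ymd;                    # today, formatted as YYYY-MM-DD
my $startDate = $now->add_months(-1)->ymd;    # one month earlier

my $url = report_url('http://url/website.com/thing', $startDate, $endDate);
print "$url\n";
# my $page1 = fetch_page($url);    # then: open my $pg1In, '<', \$page1;
```

The fetched string can be opened as an in-memory file exactly as the here-doc text was, so the parsing loops need no changes. For an https URL, HTTP::Tiny additionally needs IO::Socket::SSL installed.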


        Give a man a fish:  <%-{-{-{-<