JobC has asked for the wisdom of the Perl Monks concerning the following question:

I have some CSV data like this:

textbox84,textbox95,textbox94,textbox85,textbox86,textbox92,Textbox17, +Textbox18 1,1,ea,"456.321.07","TRAY, BOTTOM, 66#47L-20503",2/9/2016,Production I +tem: 1234,WO # : W00007382 2,1,ea,"456.321.12","TRAY, TOP, 66#51R-20507",2/9/2016,Production Item +: 1234,WO # : W00007382 3,1,ea,"456.321.15","TUBE, TALL, 66#15CR-21862",2/9/2016,Production It +em: 1234,WO # : W00007382 4,2,ea,"460.339.01","WRAP, LGE GRP, xx#03CR-28081",2/9/2016,Production + Item: qwe2b,WO # : W00007432 5,4,ea,"460.339.02","TRAY LGE GRP, xx#49R-27709",2/9/2016,Production I +tem: qwe2b,WO # : W00007432 6,2,ea,"460.339.01","WRAP, LGE GRP, xx#03CR-28081",2/9/2016,Production + Item: qze9b,WO # : W00007442 7,4,ea,"460.339.02","TRAY LGE GRP, xx#49R-27709",2/9/2016,Production I +tem: qze9b,WO # : W00007442 8,2,ea,"460.339.03","WRAP, SM GRP, xx#02CR-28079",2/9/2016,Production +Item: tyu6B,WO # : W00007460 9,4,ea,"460.339.04","TRAY, SM GRP, xx#50R-27711",2/9/2016,Production I +tem: tyu6B,WO # : W00007460 10,2,ea,"460.339.01","WRAP, LGE GRP, xx#03CR-28081",2/9/2016,Productio +n Item: ert9b,WO # : W00007469 11,4,ea,"460.339.02","TRAY LGE GRP, xx#49R-27709",2/9/2016,Production +Item: ert9b,WO # : W00007469 12,2,ea,"460.339.01","WRAP, LGE GRP, xx#03CR-28081",2/9/2016,Productio +n Item: YUTBN,WO # : W00007483 13,4,ea,"460.339.02","TRAY LGE GRP, xx#49R-27709",2/9/2016,Production +Item: YUTBN,WO # : W00007483 14,2,ea,"456.325.30","CNT LONG BOX BTM#08CR-23399",2/9/2016,Production + Item: 3323,WO # : W00007430 15,2,ea,"456.325.31","CNT LONG BOX TOP#27L-23400",2/9/2016,Production +Item: 3323,WO # : W00007430 16,2,ea,"456.123.00"," KIT ASSY BOX#F29487",2/9/2016,Production Item: +WDKSC,WO # : W00007497 17,2,ea,"456.147.00"," KIT ASSY BOX 2, SC#F29494",2/9/2016,Production +Item: WDKSC,WO # : W00007497 18,2,ea,"456.325.11","BOX, 34/43"", WMT UPR#11CR-20517",2/9/2016,Produ +ction Item: 9876,WO # : W00007506 19,2,ea,"456.325.15","BOX, 34""/42"" WMT UPR, BTM#13CR-21227",2/9/2016 +,Production Item: 9876,WO # : W00007506 20,2,ea,"456.321.07","TRAY, BOTTOM, 66""#47L-20503",2/9/2016,Productio +n Item: 9976,WO # : W00007519 21,2,ea,"456.321.08","TUBE, MEDIUM/SHORT, 66#14CR-20539",2/9/2016,Prod +uction Item: 9976,WO # : W00007519 22,2,ea,"456.321.09","TOP SPRT, 66#12CR-20504",2/9/2016,Production Ite +m: 9976,WO # : W00007519 23,2,ea,"456.321.12","TRAY, TOP, 66#51R-20507",2/9/2016,Production Ite +m: 9976,WO # : W00007519 24,4,ea,"456.321.11","BRACE, MEDIUM END#22L-20505",2/9/2016,Production + Item: 9976,WO # : W00007519 25,1,ea,"456.321.07","TRAY, BOTTOM, 66#47L-20503",2/9/2016,Production +Item: 9976,WO # : W00007546 26,1,ea,"456.321.08","TUBE, MEDIUM/SHORT, 66#14CR-20539",2/9/2016,Prod +uction Item: 9976,WO # : W00007546 27,1,ea,"456.321.09","TOP SPRT, 66#12CR-20504",2/9/2016,Production Ite +m: 9976,WO # : W00007546 28,1,ea,"456.321.12","TRAY, TOP, 66#51R-20507",2/9/2016,Production Ite +m: 9976,WO # : W00007546 29,2,ea,"456.321.11","BRACE, MEDIUM END#22L-20505",2/9/2016,Production + Item: 9976,WO # : W00007546 30,1,ea,"456.321.07","TRAY, BOTTOM, 66#47L-20503",2/9/2016,Production +Item: 9976,WO # : W00007566 31,1,ea,"456.321.08","TUBE, MEDIUM/SHORT, 66#14CR-20539",2/9/2016,Prod +uction Item: 9976,WO # : W00007566 32,1,ea,"456.321.09","TOP SPRT, 66#12CR-20504",2/9/2016,Production Ite +m: 9976,WO # : W00007566 33,1,ea,"456.321.12","TRAY, TOP, 66#51R-20507",2/9/2016,Production Ite +m: 9976,WO # : W00007566 34,2,ea,"456.321.11","BRACE, MEDIUM END#22L-20505",2/9/2016,Production + Item: 9976,WO # : W00007566 35,1,ea,"456.325.10","BOX, 22/29"", WMT UPR#18CR-20518",2/9/2016,Produ +ction Item: 9876,WO # : W00007587 36,2,ea,"456.321.11","BRACE, MEDIUM END#22L-20505",2/9/2016,Production + Item: 9776,WO # : W00007604 37,1,ea,"456.321.29","TRAY, BTM, 94#45L-20549",2/9/2016,Production Ite +m: 9776,WO # : W00007604 38,1,ea,"456.321.30","TUBE,MED/SHORT, 94#05CR-20550",2/9/2016,Producti +on Item: 9776,WO # : W00007604 39,1,ea,"456.321.31","SPRT, TOP, 94#04CR-20501",2/9/2016,Production It +em: 9776,WO # : W00007604 40,1,ea,"456.321.32","TRAY, TOP, 94#43R-20548",2/9/2016,Production Ite +m: 9776,WO # : W00007604 41,1,ea,"456.321.07","TRAY, BOTTOM, 66#47L-20503",2/9/2016,Production +Item: 9776,WO # : W00007641 42,1,ea,"456.321.08","TUBE, MEDIUM/SHORT, 66#14CR-20539",2/9/2016,Prod +uction Item: 9776,WO # : W00007641 43,1,ea,"456.321.09","TOP SPRT, 66#12CR-20504",2/9/2016,Production Ite +m: 9776,WO # : W00007641 44,2,ea,"456.321.11","BRACE, MEDIUM END#22L-20505",2/9/2016,Production + Item: 9776,WO # : W00007641 45,1,ea,"456.321.12","TRAY, TOP, 66#51R-20507",2/9/2016,Production Ite +m: 9776,WO # : W00007641

I want to accumulate the values of textbox95 for each value of textbox85.
This would give a value of 6 to textbox95 when textbox85 eq "456.321.07" for record 1
It should also eliminate all other records that contain "456.321.07" in textbox85

Currently, I am reading this in line by line.
Modifying the data fields as needed.
I write to a new file the format that can be used at the next step.
This strips the last two data fields, which help make the lines unique.
However there is a sequence number in the first field that will be "skipping" numbers after records are removed.

Ok, I modified my code to remove all of my processing. I am pretty new at perl, so I don't get the _DATA_ thing that GrandFather was mentioning below. (no I haven't taken the time to read the link either). When I run the code against the data above it produces errors. This was developed from the cryptic code NetWallah posted. It is possible and likely the errors are produced by my hack and slash of the original code to make it fit better in this forum.
**Please Note** This is my second perl script if I am doing something wrong please let me know. Most of my script writing up until this point has been bash and MS-DOS Batch files, with a bit of Pascal and Java just to make it interesting.
Thanks! jobc

Errors

C:\A>perl test.pl test.csv Odd number of elements in anonymous hash at test.pl line 14. Reference found where even-sized list expected at test.pl line 14.

My perl code

#!/usr/bin/perl # test.pl use strict; use warnings; use Text::CSV_XS; use File::BOM; use Switch; my @rows; my @strings; my $CT = 0; my $i = 0; my %i = {$i}; my $input = "000"; # read in argument and use the first one as the filename to be read if (! defined $ARGV[0]){ print "Usage: ./convertACsv.pl \"Please specify the filename like +\"123456.csv\"\"\n"; exit; } my $FileName = $ARGV[0]; # In a hex editor our source file shows \n or 0x0A or \010 characters +in the data fields. # Our next process wants to have all fields within quotes and separate +d by commas. my $aCSV = Text::CSV_XS->new ({ eol => undef, # \r, \n, or \r\n sep_char => ',', sep => undef, quote_char => '"', quote => '"', escape_char => '"', binary => 1, decode_utf8 => 1, auto_diag => 1, diag_verbose => 1, blank_is_undef => 0, empty_is_undef => 0, allow_whitespace => 1, allow_loose_quotes => 1, allow_loose_escapes => 0, allow_unquoted_escape => 0, always_quote => 1, quote_empty => 0, quote_space => 0, escape_null => 1, quote_binary => 1, keep_meta_info => 0, verbatim => 0, types => undef, callbacks => undef, }); open my $ORIG_CSV, '<:via(File::BOM)', "C:\\A\\$FileName" or die "Can +'t open C:\\A\\$FileName: $!"; open my $MOD_CSV, '>:raw:encoding(iso-8859-1)', "C:\\A\\ModCSV.csv" o +r die "Can't open C:\\A\\ModCSV.csv: $!"; open my $KBan_CSV, '>:raw:encoding(iso-8859-1)', "C:\\A\\kban.csv" or + die "Can't open C:\\A\\kban.csv: $!"; open my $CT_CSV, '>:raw:encoding(iso-8859-1)', "C:\\A\\ct.csv" or die + "Can't open C:\\A\\ct.csv: $!"; while (my $row = $aCSV->getline ($ORIG_CSV)) { if ($row->[0] =~ /\S/ ) { #many regex data modifiers removed from here if ($row->[1] =~ /^\d+/) { $i{$row->[3]}+=$row->[1]; print "$i{$_} $_+\n for sort keys %i"; } # post processing removed push @rows, $row; } close $ORIG_CSV; $aCSV->say ($MOD_CSV, $_) for @rows; close $MOD_CSV or die "Can't close C:\\A\\ModCSV.csv: $!"; close $KBan_CSV or die "Can't close C:\\A\\KBan.csv: $!"; close $CT_CSV or die "Can't close C:\\A\\CT.csv: $!"; }

Replies are listed 'Best First'.
Re: Accumulate values for each data field
by NetWallah (Canon) on Feb 09, 2016 at 23:28 UTC
    Something like this ?

    (The code is as clear as your specifications)

    >perl -anF, -e "$x{$F[3]}+=$F[1] if $F[1]=~/^\d+/}{print qq|$x{$_}\t$_ +\n| for sort keys %x" data1.txt 2 "456.123.00" 2 "456.147.00" 6 "456.321.07" 5 "456.321.08" 5 "456.321.09" 12 "456.321.11" 6 "456.321.12" 1 "456.321.15" 1 "456.321.29" 1 "456.321.30" 1 "456.321.31" 1 "456.321.32" 1 "456.325.10" 2 "456.325.11" 2 "456.325.15" 2 "456.325.30" 2 "456.325.31" 8 "460.339.01" 16 "460.339.02" 2 "460.339.03" 4 "460.339.04"

            "Think of how stupid the average person is, and realize half of them are stupider than that." - George Carlin

      Very Nice! Actually, I need to output the entire data line, but that would be a start. I will try to make a better example. There are 400 lines of code that need to be paired down, and I thought I could shortcut the process. Sorry!

        Is this output what you need?
        1,6,ea,456.321.07,"TRAY, BOTTOM, 66#47L-20503",2/9/2016 2,6,ea,456.321.12,"TRAY, TOP, 66#51R-20507",2/9/2016 3,1,ea,456.321.15,"TUBE, TALL, 66#15CR-21862",2/9/2016 4,8,ea,460.339.01,"WRAP, LGE GRP, xx#03CR-28081",2/9/2016 5,16,ea,460.339.02,"TRAY LGE GRP, xx#49R-27709",2/9/2016 8,2,ea,460.339.03,"WRAP, SM GRP, xx#02CR-28079",2/9/2016 9,4,ea,460.339.04,"TRAY, SM GRP, xx#50R-27711",2/9/2016 14,2,ea,456.325.30,"CNT LONG BOX BTM#08CR-23399",2/9/2016 15,2,ea,456.325.31,"CNT LONG BOX TOP#27L-23400",2/9/2016 16,2,ea,456.123.00," KIT ASSY BOX#F29487",2/9/2016 17,2,ea,456.147.00," KIT ASSY BOX 2, SC#F29494",2/9/2016 18,2,ea,456.325.11,"BOX, 34/43"", WMT UPR#11CR-20517",2/9/2016 19,2,ea,456.325.15,"BOX, 34""/42"" WMT UPR, BTM#13CR-21227",2/9/2016 21,5,ea,456.321.08,"TUBE, MEDIUM/SHORT, 66#14CR-20539",2/9/2016 22,5,ea,456.321.09,"TOP SPRT, 66#12CR-20504",2/9/2016 24,12,ea,456.321.11,"BRACE, MEDIUM END#22L-20505",2/9/2016 35,1,ea,456.325.10,"BOX, 22/29"", WMT UPR#18CR-20518",2/9/2016 37,1,ea,456.321.29,"TRAY, BTM, 94#45L-20549",2/9/2016 38,1,ea,456.321.30,"TUBE,MED/SHORT, 94#05CR-20550",2/9/2016 39,1,ea,456.321.31,"SPRT, TOP, 94#04CR-20501",2/9/2016 40,1,ea,456.321.32,"TRAY, TOP, 94#43R-20548",2/9/2016
        This cuts off the last 2 columns. Also for all the other columns, only the cols from the first occurrence are used.

        You said this data was going to be used in another program or processed further in a larger program. Maybe it would be helpful to describe what is needed to be done.

Re: Accumulate values for each data field
by GrandFather (Saint) on Feb 09, 2016 at 22:37 UTC

    So take 1/10th of your sample data, append it to a test script in the __DATA__ section and show us what you are currently doing. Show us (not tell us) what output you get and what output you want.

    Premature optimization is the root of all job security

      Hmmm, I did it this way because much of what I am doing to the data it not pertinent to this problem. I can see what you mean about showing and not telling. I'll see what I can come up with, thanks.

        Read I know what I mean. Why don't you?. We don't want the whole thing, just like we don't want all the data. We want to see just the code you are having trouble with wrapped up in to tiny test framework with just enough data to show the issue.

        Premature optimization is the root of all job security