Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

Hi friends. Below is the sample of my csv file

Amount ChoiceId Allowed PID AmountId MultiplyingFactor + PF new ServiceTax ExtraTalkTime MRP Recharge Type FRC +/SRC 13 na 285 100 1.1236 81.75 0.02 0 92 Freebi +e(BA)+PID Normal 59 na 285 101 1.1236 8.31 0.07 0 10 Freebie +(BA)+PID Normal 13 na 289 100 1.1236 81.75 0.05 0 92 Freebi +e(BA)+PID Normal 2001 na 285 103 1.103 27.13 2.06 0 52 MA+Da +taPack+PID Normal 1572 na 285 104 na 0 na na na Blank MRP, re +st OK Normal 1574 na 285 105 na 0 na na na Blank MRP, re +st OK Normal 333300 na 285 106 1 0 0 0 3333 Freebie(BA)+ +MA+PID Normal 93 na 285 107 1.1236 100.53 0.11 0 114 Free +bie(BA)+PID+SA Normal 78 na 285 108 1.1236 8.12 0.1 0 10 Freebie( +BA)+PID Normal 79 na 285 109 1.1236 85.54 0.1 0 97 BA20 on +ly Normal 13 na 290 100 1.1236 81.75 0.11 0 92 Freebi +e(BA)+PID Normal

now as per the requirement, if values of amount, amountId and MRP are same for particular rows then in that case i need to replace it by only one row in which value of AllowdPID and Service tax will take values from all the input the rows, separated by semi colon..(rest of the values we will not manipulate as they will be equal automatically)... for example as the amount, amountId and MRP values for 1st, 3rd and last rows are equal. our output will contain only one line in place of these three line like below.

13 na 285;289;290 100 1.1236 81.75 0.02;0.05;0.11 + 0 92 Freebie(BA)+PID Normal

how can I achieve it using perl. I tried someting using Text::CSV but can't share as it becomes a total mess :(...please help.

below is my script

#! /usr/bin/perl use strict; use warnings; use Text::CSV; my $basepath = "/home/eankuls/perl/vodafone/punjab"; my $IPfile = "Consolidated_output.csv"; my $OPfile = "Final_report.csv"; open (my $FH, "${basepath}/${IPfile}") or die "Can't open $IPfile file +: $!"; open (my $WFH, ">>", "${basepath}/${OPfile}") or die "Can't open $OPfi +le file: $!"; my $csv = Text::CSV->new({ sep_char => ",", binary => 1, auto_diag => 1 }); my $row = $csv->getline($FH); my $flag = 0; my $flag1 = 0; my (@row1, @row2); while (my $row = $csv->getline($FH) ) { my (@AllowedPID, @STax); if ($flag == 0) { @row1 = @{$row}; $flag++; push (@AllowedPID, $row1[2]); push (@STax, $row1[6]); next; } else { @row2 = @{$row}; } if ($row1[0] == $row2[0] and $row1[3] == $row2[3] and $row1[8] +== $row2[8]) { if ($flag1 == 0 ) { push (@AllowedPID, $row1[2]); push (@STax, $row1[6]); $flag1++; } push (@AllowedPID, $row2[2]); push (@STax, $row2[6]); next; } else { $row2[2] = join(";", @AllowedPID); $row2[6] = join(";", @STax); my $row_final = join (",", @row2); print $WFH $row_final; @row1 = @row2; next; } #print WFH @row2; $flag = 0; $flag1 = 0; }

Replies are listed 'Best First'.
Re: How to compare and manipulate csv file?
by hippo (Archbishop) on Oct 01, 2014 at 13:03 UTC
Re: perl: How to comapre and manipulate csv file in perl?
by Corion (Patriarch) on Oct 01, 2014 at 12:00 UTC

    How do you expect us to help you with your program if you don't show it?

    Text::CSV_XS has many highly convenient functions for reading and writing CSV files.

Re: perl: How to comapre and manipulate csv file in perl?
by Eily (Monsignor) on Oct 01, 2014 at 13:14 UTC

    Your script is hard to read indeed. Not just because you got confused on how to do the work, but also because of bad names. First, if your variables are numbered (like @row1, @row2), there's a high chance that it would be a good idea to put them in an array, or change their name. In your case, @previousRow and @currentRow would have been far more explicit. I have no idea what $flag and $flag1 are used for, but I'm sure they could have better names. And BTW, you should avoid $flag++ if the only values $flag can take are 0 or 1. Write $flag = 1; instead.

    Now, for your issue, there is an association that is often useful in perl: unique means hash. If you want only one record for each unique amount, amountId and MRP combinaison, put them in a hash as keys, and the rest of the values in the associated value.

    # I didn't copy the top of your file, and you should add use Data::Dum +per; if you want this code to work my $header = $csv->getline($FH); my @keys = @$header; my %result; while (my $row = $csv->getline($FH)) { my %currentRow; @currentRow{@keys} = @$row; print "Current row: ", Dumper \%currentRow; my $uniqueKey = "$currentRow{Amount} & $currentRow{AmountId} & $curr +entRow{MRP}"; # Bad idea if '&' is allowed in either of those values if(!exists $result{$uniqueKey}) # If this combinaison is unknown { $result{$uniqueKey} = [\%currentRow]; # save the current row to it } else { push @{ $result{$uniqueKey} }, \%currentRow; # add the current row + to the ones with the same ID } } print "Result: ", Dumper \%result;
    perldata and perldsc could be a good read.

    Edit: or the short version following AnomalousMonk's post:

    while (my $row = $csv->getline($FH)) { my %currentRow; @currentRow{@keys} = @$row; print "Current row: ", Dumper \%currentRow; my $uniqueKey = "$currentRow{Amount} & $currentRow{AmountId} & $curr +entRow{MRP}"; # Bad idea if '&' is allowed in either of those values push @{ $result{$uniqueKey} }, \%currentRow; # add the current row t +o the existing ones } print "Result: ", Dumper \%result;

      if(!exists $result{$uniqueKey}) # If this combinaison is unknown { $result{$uniqueKey} = [\%currentRow]; # save the current row to i +t } else { push @{ $result{$uniqueKey} }, \%currentRow; # add the current ro +w to the existing ones }

      The exists test is not needed because autovivification (see perlglossary) will create an empty anonymous array reference to push a new value into in the case of a key that does not already exist:

      c:\@Work\Perl\monks>perl -wMstrict -le "use Data::Dump; ;; my @ra = qw(foo a bar w bar x foo b bar y); dd \@ra; ;; my %hash; while (@ra >= 2) { my $k = shift @ra; my $v = shift @ra; ;; push @{ $hash{$k} }, $v } dd \%hash; ;; @ra = qw(foo A bar W bar X foo B bar Y); dd \@ra; ;; my $hashref; while (@ra >= 2) { my ($k, $v) = splice @ra, 0, 2; push @{ $hashref->{$k} }, $v; } dd $hashref; " ["foo", "a", "bar", "w", "bar", "x", "foo", "b", "bar", "y"] { bar => ["w", "x", "y"], foo => ["a", "b"] } ["foo", "A", "bar", "W", "bar", "X", "foo", "B", "bar", "Y"] { bar => ["W", "X", "Y"], foo => ["A", "B"] }

        Oh, right! I keep forgetting that autovivification isn't just when you do something like $name{KEY}[0][1].

Re: perl: How to comapre and manipulate csv file in perl?
by misterperl (Friar) on Oct 01, 2014 at 14:53 UTC
    I have many comments on your code , but Im not really sure what you're trying to accomplish so I'l keep it short.

    It seems like you're trying to operate on 2-row blocks? Like if the first row meets some match criteria, then meld it into the next row?

    ...then maybe break up the file into 2-row blocks and use regexes on each one.. Like split it this way:

    my @tworows = split /[^\n]+\n[^[n]+\n/, $text;

    then you can just use something like

    s/((\w+).+\1.+\1.+)\n(.+)/something/

    then repeat that starting on the first even row (because the regex always checks the FIRST row for matches). Much simpler than storing all those temporary arrays, whiles, ifs, etc.

    But here are few recomendations:

    1. next; } is the same as } 2. Don't make a copy of your row array to use it, just dereference it: my @row1 = @{$row}; # not needed.. Just use $row->[$index] instead of $row1[$index] 3. drop all the extra parens join ';', @arr; push @arr, 'something'; 4. it looks like you're using a numeric == to compare ascii fields lik +e $row1[0] == $row2[0] when col 0 contains things like 'PF'. Use eq f +or these compares. You can't use == there.
Re: perl: How to comapre and manipulate csv file in perl?
by kevbot (Vicar) on Oct 02, 2014 at 06:45 UTC
    The melt and cast functions of the Data::Table module are useful for these kinds of data aggregation tasks. Assuming that your data.csv file is in a valid CSV format such as,
    Amount,ChoiceId,Allowed PID,AmountId,MultiplyingFactor,PF new,ServiceT +ax,ExtraTalkTime,MRP,Recharge Type,FRC/SRC 13,na,285,100,1.1236,81.75,0.02,0,92,Freebie(BA)+PID,Normal 59,na,285,101,1.1236,8.31,0.07,0,10,Freebie(BA)+PID,Normal 13,na,289,100,1.1236,81.75,0.05,0,92,Freebie(BA)+PID,Normal 2001,na,285,103,1.103,27.13,2.06,0,52,MA+DataPack+PID,Normal 1572,na,285,104,na,0,na,na,na,"Blank MRP,rest OK",Normal 1574,na,285,105,na,0,na,na,na,"Blank MRP,rest OK",Normal 333300,na,285,106,1,0,0,0,3333,Freebie(BA)+MA+PID ,Normal 93,na,285,107,1.1236,100.53,0.11,0,114,Freebie(BA)+PID+SA,Normal 78,na,285,108,1.1236,8.12,0.1,0,10,Freebie(BA)+PID,Normal 79,na,285,109,1.1236,85.54,0.1,0,97,BA20 only,Normal 13,na,290,100,1.1236,81.75,0.11,0,92,Freebie(BA)+PID,Normal
    You can get the output you desire like this,
    #!/usr/bin/env perl use strict; use warnings; use Data::Table; my $dt = Data::Table::fromCSV("data.csv"); # print "DATA:\n"; # print $dt->csv; # All of the columns except the ones you want to aggregate my $cols = [ 'Amount', 'AmountId', 'MRP', 'ChoiceId', 'MultiplyingFactor', 'PF new', 'ExtraTalkTime', 'Recharge Type', 'FRC/SRC' ]; my $mdt = $dt->melt( $cols ); # print "MELT:\n"; # print $mdt->csv; my $cdt = $mdt->cast( $cols, 'variable', Data::Table::STRING, 'value', \&aggregate ); #print "CAST:\n"; #print $cdt->csv; exit; sub aggregate { my @values = @_; my $av = join(';', @values); return($av); }
    One can uncomment the print statements to see the intermediate tables. The final result in $cdt is:
    Amount,AmountId,MRP,ChoiceId,MultiplyingFactor,PF new,ExtraTalkTime,Re +charge Type,FRC/SRC,Allowed PID,ServiceTax 13,100,92,na,1.1236,81.75,0,Freebie(BA)+PID,Normal,285;289;290,0.02;0. +05;0.11 59,101,10,na,1.1236,8.31,0,Freebie(BA)+PID,Normal,285,0.07 2001,103,52,na,1.103,27.13,0,MA+DataPack+PID,Normal,285,2.06 1572,104,na,na,na,0,na,"Blank MRP,rest OK",Normal,285,na 1574,105,na,na,na,0,na,"Blank MRP,rest OK",Normal,285,na 333300,106,3333,na,1,0,0,Freebie(BA)+MA+PID ,Normal,285,0 93,107,114,na,1.1236,100.53,0,Freebie(BA)+PID+SA,Normal,285,0.11 78,108,10,na,1.1236,8.12,0,Freebie(BA)+PID,Normal,285,0.1 79,109,97,na,1.1236,85.54,0,BA20 only,Normal,285,0.1