perl: How to comapre and manipulate csv file in perl?

Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

Hi friends. Below is the sample of my csv file

 Amount    ChoiceId    Allowed PID    AmountId    MultiplyingFactor   
+ PF new    ServiceTax    ExtraTalkTime    MRP    Recharge Type    FRC
+/SRC
13    na    285    100    1.1236    81.75    0.02    0    92    Freebi
+e(BA)+PID    Normal
59    na    285    101    1.1236    8.31    0.07    0    10    Freebie
+(BA)+PID    Normal
13    na    289    100    1.1236    81.75    0.05    0    92    Freebi
+e(BA)+PID    Normal
2001    na    285    103    1.103    27.13    2.06    0    52    MA+Da
+taPack+PID    Normal
1572    na    285    104    na    0    na    na    na    Blank MRP, re
+st OK    Normal
1574    na    285    105    na    0    na    na    na    Blank MRP, re
+st OK    Normal
333300    na    285    106    1    0    0    0    3333    Freebie(BA)+
+MA+PID     Normal
93    na    285    107    1.1236    100.53    0.11    0    114    Free
+bie(BA)+PID+SA    Normal
78    na    285    108    1.1236    8.12    0.1    0    10    Freebie(
+BA)+PID    Normal
79    na    285    109    1.1236    85.54    0.1    0    97    BA20 on
+ly    Normal
13    na    290    100    1.1236    81.75    0.11    0    92    Freebi
+e(BA)+PID    Normal
[download]

now as per the requirement, if values of amount, amountId and MRP are same for particular rows then in that case i need to replace it by only one row in which value of AllowdPID and Service tax will take values from all the input the rows, separated by semi colon..(rest of the values we will not manipulate as they will be equal automatically)... for example as the amount, amountId and MRP values for 1st, 3rd and last rows are equal. our output will contain only one line in place of these three line like below.

13    na    285;289;290    100    1.1236    81.75    0.02;0.05;0.11   
+ 0    92    Freebie(BA)+PID    Normal
[download]

how can I achieve it using perl. I tried someting using Text::CSV but can't share as it becomes a total mess :(...please help.

below is my script

#! /usr/bin/perl

use strict;
use warnings;
use Text::CSV;

my $basepath = "/home/eankuls/perl/vodafone/punjab";
my $IPfile = "Consolidated_output.csv";
my $OPfile = "Final_report.csv";

open (my $FH, "${basepath}/${IPfile}") or die "Can't open $IPfile file
+: $!";
open (my $WFH, ">>", "${basepath}/${OPfile}") or die "Can't open $OPfi
+le file: $!";

my $csv = Text::CSV->new({
                sep_char => ",",
                binary => 1,
                auto_diag => 1 });

my $row = $csv->getline($FH);

my $flag = 0;
my $flag1 = 0;
my (@row1, @row2);

while (my $row = $csv->getline($FH) ) {
       my (@AllowedPID, @STax);
       if ($flag == 0) {
           @row1 = @{$row};
           $flag++;
           push (@AllowedPID, $row1[2]); 
           push (@STax, $row1[6]);
           next;
           }
       else {
            @row2 = @{$row};     
            }

       if ($row1[0] == $row2[0] and $row1[3] == $row2[3] and $row1[8] 
+== $row2[8])
              {
                if ($flag1 == 0 ) {
              push (@AllowedPID, $row1[2]);
              push (@STax, $row1[6]); 
              $flag1++;
        }

              push (@AllowedPID, $row2[2]);
          push (@STax, $row2[6]);
              next; }


       else {
             $row2[2] = join(";", @AllowedPID);
             $row2[6] = join(";", @STax);
             my $row_final = join (",", @row2);
             print $WFH $row_final;
             @row1 = @row2;
             next;
             }

             #print WFH @row2;
             $flag = 0;
             $flag1 = 0;
        }
[download]

Comment on perl: How to comapre and manipulate csv file in perl? Select or Download Code

Replies are listed 'Best First'.
Re: How to compare and manipulate csv file? by hippo (Archbishop) on Oct 01, 2014 at 13:03 UTC
Cross-posted at StackOverflow You wrote: `my $csv = Text::CSV->new({ sep_char => ",", binary => 1, auto_diag => 1 });` [download] but your data has no commas in it. How do you expect that to work?	[reply] [d/l]
Re: perl: How to comapre and manipulate csv file in perl? by Corion (Patriarch) on Oct 01, 2014 at 12:00 UTC
How do you expect us to help you with your program if you don't show it? Text::CSV_XS has many highly convenient functions for reading and writing CSV files.	[reply]
Re: perl: How to comapre and manipulate csv file in perl? by Eily (Monsignor) on Oct 01, 2014 at 13:14 UTC
Your script is hard to read indeed. Not just because you got confused on how to do the work, but also because of bad names. First, if your variables are numbered (like @row1, @row2), there's a high chance that it would be a good idea to put them in an array, or change their name. In your case, @previousRow and @currentRow would have been far more explicit. I have no idea what $flag and $flag1 are used for, but I'm sure they could have better names. And BTW, you should avoid `$flag++` if the only values $flag can take are 0 or 1. Write `$flag = 1;` instead. Now, for your issue, there is an association that is often useful in perl: unique means hash. If you want only one record for each unique amount, amountId and MRP combinaison, put them in a hash as keys, and the rest of the values in the associated value. # I didn't copy the top of your file, and you should add use Data::Dum +per; if you want this code to work my $header = $csv->getline($FH); my @keys = @$header; my %result; while (my $row = $csv->getline($FH)) { my %currentRow; @currentRow{@keys} = @$row; print "Current row: ", Dumper \%currentRow; my $uniqueKey = "$currentRow{Amount} & $currentRow{AmountId} & $curr +entRow{MRP}"; # Bad idea if '&' is allowed in either of those values if(!exists $result{$uniqueKey}) # If this combinaison is unknown { $result{$uniqueKey} = [\%currentRow]; # save the current row to it } else { push @{ $result{$uniqueKey} }, \%currentRow; # add the current row + to the ones with the same ID } } print "Result: ", Dumper \%result; [download] perldata and perldsc could be a good read. Edit: or the short version following AnomalousMonk's post: `while (my $row = $csv->getline($FH)) { my %currentRow; @currentRow{@keys} = @$row; print "Current row: ", Dumper \%currentRow; my $uniqueKey = "$currentRow{Amount} & $currentRow{AmountId} & $curr +entRow{MRP}"; # Bad idea if '&' is allowed in either of those values push @{ $result{$uniqueKey} }, \%currentRow; # add the current row t +o the existing ones } print "Result: ", Dumper \%result;` [download]	[reply] [d/l] [select]
Re^2: perl: How to comapre and manipulate csv file in perl? by AnomalousMonk (Archbishop) on Oct 01, 2014 at 16:34 UTC
`if(!exists $result{$uniqueKey}) # If this combinaison is unknown { $result{$uniqueKey} = [\%currentRow]; # save the current row to i +t } else { push @{ $result{$uniqueKey} }, \%currentRow; # add the current ro +w to the existing ones }` [download] The exists test is not needed because autovivification (see perlglossary) will create an empty anonymous array reference to push a new value into in the case of a key that does not already exist: c:\@Work\Perl\monks>perl -wMstrict -le "use Data::Dump; ;; my @ra = qw(foo a bar w bar x foo b bar y); dd \@ra; ;; my %hash; while (@ra >= 2) { my $k = shift @ra; my $v = shift @ra; ;; push @{ $hash{$k} }, $v } dd \%hash; ;; @ra = qw(foo A bar W bar X foo B bar Y); dd \@ra; ;; my $hashref; while (@ra >= 2) { my ($k, $v) = splice @ra, 0, 2; push @{ $hashref->{$k} }, $v; } dd $hashref; " ["foo", "a", "bar", "w", "bar", "x", "foo", "b", "bar", "y"] { bar => ["w", "x", "y"], foo => ["a", "b"] } ["foo", "A", "bar", "W", "bar", "X", "foo", "B", "bar", "Y"] { bar => ["W", "X", "Y"], foo => ["A", "B"] } [download]	[reply] [d/l] [select]
Re^3: perl: How to comapre and manipulate csv file in perl? by Eily (Monsignor) on Oct 01, 2014 at 16:45 UTC
Oh, right! I keep forgetting that autovivification isn't just when you do something like `$name{KEY}[0][1]`.	[reply] [d/l]
Re: perl: How to comapre and manipulate csv file in perl? by misterperl (Friar) on Oct 01, 2014 at 14:53 UTC
I have many comments on your code , but Im not really sure what you're trying to accomplish so I'l keep it short. It seems like you're trying to operate on 2-row blocks? Like if the first row meets some match criteria, then meld it into the next row? ...then maybe break up the file into 2-row blocks and use regexes on each one.. Like split it this way: `my @tworows = split /[^\n]+\n[^[n]+\n/, $text;` [download] then you can just use something like `s/((\w+).+\1.+\1.+)\n(.+)/something/` [download] then repeat that starting on the first even row (because the regex always checks the FIRST row for matches). Much simpler than storing all those temporary arrays, whiles, ifs, etc. But here are few recomendations: `1. next; } is the same as } 2. Don't make a copy of your row array to use it, just dereference it: my @row1 = @{$row}; # not needed.. Just use $row->[$index] instead of $row1[$index] 3. drop all the extra parens join ';', @arr; push @arr, 'something'; 4. it looks like you're using a numeric == to compare ascii fields lik +e $row1[0] == $row2[0] when col 0 contains things like 'PF'. Use eq f +or these compares. You can't use == there.` [download]	[reply] [d/l] [select]
Re: perl: How to comapre and manipulate csv file in perl? by kevbot (Vicar) on Oct 02, 2014 at 06:45 UTC
The melt and cast functions of the Data::Table module are useful for these kinds of data aggregation tasks. Assuming that your data.csv file is in a valid CSV format such as, Amount,ChoiceId,Allowed PID,AmountId,MultiplyingFactor,PF new,ServiceT +ax,ExtraTalkTime,MRP,Recharge Type,FRC/SRC 13,na,285,100,1.1236,81.75,0.02,0,92,Freebie(BA)+PID,Normal 59,na,285,101,1.1236,8.31,0.07,0,10,Freebie(BA)+PID,Normal 13,na,289,100,1.1236,81.75,0.05,0,92,Freebie(BA)+PID,Normal 2001,na,285,103,1.103,27.13,2.06,0,52,MA+DataPack+PID,Normal 1572,na,285,104,na,0,na,na,na,"Blank MRP,rest OK",Normal 1574,na,285,105,na,0,na,na,na,"Blank MRP,rest OK",Normal 333300,na,285,106,1,0,0,0,3333,Freebie(BA)+MA+PID ,Normal 93,na,285,107,1.1236,100.53,0.11,0,114,Freebie(BA)+PID+SA,Normal 78,na,285,108,1.1236,8.12,0.1,0,10,Freebie(BA)+PID,Normal 79,na,285,109,1.1236,85.54,0.1,0,97,BA20 only,Normal 13,na,290,100,1.1236,81.75,0.11,0,92,Freebie(BA)+PID,Normal [download] You can get the output you desire like this, #!/usr/bin/env perl use strict; use warnings; use Data::Table; my $dt = Data::Table::fromCSV("data.csv"); # print "DATA:\n"; # print $dt->csv; # All of the columns except the ones you want to aggregate my $cols = [ 'Amount', 'AmountId', 'MRP', 'ChoiceId', 'MultiplyingFactor', 'PF new', 'ExtraTalkTime', 'Recharge Type', 'FRC/SRC' ]; my $mdt = $dt->melt( $cols ); # print "MELT:\n"; # print $mdt->csv; my $cdt = $mdt->cast( $cols, 'variable', Data::Table::STRING, 'value', \&aggregate ); #print "CAST:\n"; #print $cdt->csv; exit; sub aggregate { my @values = @_; my $av = join(';', @values); return($av); } [download] One can uncomment the print statements to see the intermediate tables. The final result in $cdt is: Amount,AmountId,MRP,ChoiceId,MultiplyingFactor,PF new,ExtraTalkTime,Re +charge Type,FRC/SRC,Allowed PID,ServiceTax 13,100,92,na,1.1236,81.75,0,Freebie(BA)+PID,Normal,285;289;290,0.02;0. +05;0.11 59,101,10,na,1.1236,8.31,0,Freebie(BA)+PID,Normal,285,0.07 2001,103,52,na,1.103,27.13,0,MA+DataPack+PID,Normal,285,2.06 1572,104,na,na,na,0,na,"Blank MRP,rest OK",Normal,285,na 1574,105,na,na,na,0,na,"Blank MRP,rest OK",Normal,285,na 333300,106,3333,na,1,0,0,Freebie(BA)+MA+PID ,Normal,285,0 93,107,114,na,1.1236,100.53,0,Freebie(BA)+PID+SA,Normal,285,0.11 78,108,10,na,1.1236,8.12,0,Freebie(BA)+PID,Normal,285,0.1 79,109,97,na,1.1236,85.54,0,BA20 only,Normal,285,0.1 [download]	[reply] [d/l] [select]