john.tm has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a large CSV file with 5 columns. I have managed to remove duplicates based on 2 columns, keeping the row with the most recent timestamp, but how do I print the updated list with all 5 columns?

I am getting these errors: Global symbol "$col3" requires explicit package name, and Global symbol "$col4" requires explicit package name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX 'strftime';

my @now = localtime();
my $todaysday = strftime("%d", localtime());
my $mth = strftime("%m" , localtime());
my $secs = strftime("%S" , localtime());
my $mins = strftime("%M" , localtime());
my $hr = strftime("%H" , localtime());
my $year = strftime("%Y" , localtime());
my $dtime = "$year-$mth-$todaysday $hr:$mins";

my %most_recent;
my $header = <DATA>;
while ( my $line = <DATA> ) {
    chomp $line;
    my ($col1,$date_and_time,$col2,$col3,$col4) = split( /,/, $line );
    $date_and_time =~ s/^\s+$//g;
    my $dtime = $date_and_time;
    if ( not defined $most_recent{$col1}{$col2}
         or $most_recent{$col1}{$col2} lt $dtime ) {
        $most_recent{$col1}{$col2} = $dtime;
    }
}

print "Most recent:\n";
foreach my $col1 ( keys %most_recent ) {
    foreach my $col2 ( keys %{$most_recent{$col1}} ) {
        print "$col1, $col2, $most_recent{$col1}{$col2}, \n";
        #print "$col2,$col1,$col3,$col4,\n";
    }
}
__DATA__
LONDO,2015-01-02 11:35,GE04_TDP,ted,fu
LONDO,2015-01-02 13:15,GE03_TDP,ted,fu
LONDO,2015-01-02 15:42,GE03_TDP,ted,fu
LONDO,2015-01-02 15:22,GE04_TDP,ome,ful
LONDO,2015-01-02 17:15,GE03_TDP,omp,ful
LONDO,2015-01-02 17:32,GE04_TDP,omp,ful
LONDO,2015-01-02 20:44,CW02,et,ful
LONDO,2015-01-02 19:26,CW03,et,ful
LONDO,2015-01-02 20:25,CW01,let,pped
LONDO,2015-01-02 19:57,CW04,let,pped
LONDO,2015-01-02 19:24,EXCHP,let,ucc
LONDO,2015-01-02 19:25,EXCHP,let,ucc
LONDO,2015-01-02 19:43,GE03,let,ucc
LONDO,2015-01-02 20:41,GE04,Co,ucc
LONDO,2015-01-02 21:33,GE03_TDP,Co,ucc
LONDO,2015-01-02 21:17,EXCHP,Co,ucc
LONDO,2015-01-02 23:24,EXCHDP,Co,ucc
LONDO,2015-01-02 23:27,EXCHDP,Co,ucc
LONDO,2015-01-03 01:20,EXCHDP,il,02
LONDO,2015-01-03 01:11,EXCHDP,ro
```

Re: perl to Remove duplicates lines from a csv file based on timestamp most recent
by soonix (Chancellor) on Jan 05, 2015 at 10:13 UTC
    Just to add to pme's solution: you are declaring my ($col1,$date_and_time,$col2,$col3,$col4) inside the while loop, so these variables go out of scope at the end of each iteration and are not visible in the foreach loops that do the printing.
    However, even if you moved the declaration outside the loop, each variable would still hold only a single value, so you would end up printing the values from the very last input line.
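One way around the problem described above, without declaring the variables outside the loop, is to store the complete input line for each (col1, col2) pair next to its timestamp. This is a minimal sketch, not code from either reply: the sub name `latest_lines`, the in-memory `@input` sample (four lines taken from the question's data) and the sorted output order are my own assumptions. It also assumes every timestamp uses the `YYYY-MM-DD HH:MM` format, so a plain string comparison with `lt` orders them correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For each (col1, col2) pair, keep the complete input line whose
# timestamp is the latest. Storing the whole line means every column
# is still available when printing.
sub latest_lines {
    my @lines = @_;
    my %most_recent;    # $most_recent{$col1}{$col2} = [ $dtime, $line ]
    for my $line (@lines) {
        chomp $line;    # harmless here; needed for lines read from a file
        my ( $col1, $dtime, $col2 ) = split /,/, $line;
        my $seen = $most_recent{$col1}{$col2};
        # "YYYY-MM-DD HH:MM" strings sort chronologically with lt
        if ( !defined $seen or $seen->[0] lt $dtime ) {
            $most_recent{$col1}{$col2} = [ $dtime, $line ];
        }
    }
    # Flatten into a list of the surviving lines, in sorted key order
    my @result;
    for my $col1 ( sort keys %most_recent ) {
        for my $col2 ( sort keys %{ $most_recent{$col1} } ) {
            push @result, $most_recent{$col1}{$col2}[1];
        }
    }
    return @result;
}

my @input = (
    "LONDO,2015-01-02 11:35,GE04_TDP,ted,fu",
    "LONDO,2015-01-02 13:15,GE03_TDP,ted,fu",
    "LONDO,2015-01-02 15:42,GE03_TDP,ted,fu",
    "LONDO,2015-01-02 15:22,GE04_TDP,ome,ful",
);
print "$_\n" for latest_lines(@input);
```

With these four lines it should print the 15:42 GE03_TDP line and the 15:22 GE04_TDP line, unchanged. Because the raw line is kept, the output carries all of the file's columns, however many there are.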
Re: perl to Remove duplicates lines from a csv file based on timestamp most recent
by pme (Monsignor) on Jan 05, 2015 at 07:47 UTC
    Hi john.tm

    You can store a hash reference in %most_recent instead of a scalar ($dtime).

```perl
...
    if ( not defined $most_recent{$col1}{$col2}
         or $most_recent{$col1}{$col2}->{dtime} lt $dtime ) {
        $most_recent{$col1}{$col2}->{dtime} = $dtime;
        $most_recent{$col1}{$col2}->{col3} = $col3;
        $most_recent{$col1}{$col2}->{col4} = $col4;
    }
}
print "Most recent:\n";
foreach my $col1 ( keys %most_recent ) {
    foreach my $col2 ( keys %{$most_recent{$col1}} ) {
        print "$col1, $col2, $most_recent{$col1}{$col2}->{dtime}, $most_recent{$col1}{$col2}->{col3}, $most_recent{$col1}{$col2}->{col4}\n";
    }
}
```
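Filled out into a complete, self-contained script, the hashref approach above might look like the following sketch. The sub name `build_most_recent`, the in-memory `@input` sample (four lines from the question's data, assumed already chomped) and the use of `sort` for a stable output order are my own additions; plain `keys` returns the pairs in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build $most_recent{$col1}{$col2} = { dtime => ..., col3 => ..., col4 => ... },
# keeping only the record with the latest timestamp for each pair.
# Lines are assumed to be chomped already, so col4 has no trailing newline.
sub build_most_recent {
    my %most_recent;
    for my $line (@_) {
        my ( $col1, $dtime, $col2, $col3, $col4 ) = split /,/, $line;
        if ( not defined $most_recent{$col1}{$col2}
             or $most_recent{$col1}{$col2}->{dtime} lt $dtime ) {
            $most_recent{$col1}{$col2} = {
                dtime => $dtime,
                col3  => $col3,
                col4  => $col4,
            };
        }
    }
    return \%most_recent;
}

my @input = (
    "LONDO,2015-01-02 19:24,EXCHP,let,ucc",
    "LONDO,2015-01-02 19:25,EXCHP,let,ucc",
    "LONDO,2015-01-02 21:17,EXCHP,Co,ucc",
    "LONDO,2015-01-02 19:43,GE03,let,ucc",
);

my $most_recent = build_most_recent(@input);
print "Most recent:\n";
for my $col1 ( sort keys %$most_recent ) {
    for my $col2 ( sort keys %{ $most_recent->{$col1} } ) {
        my $rec = $most_recent->{$col1}{$col2};
        print "$col1, $col2, $rec->{dtime}, $rec->{col3}, $rec->{col4}\n";
    }
}
```

Run as-is, this should print "Most recent:" followed by "LONDO, EXCHP, 2015-01-02 21:17, Co, ucc" and "LONDO, GE03, 2015-01-02 19:43, let, ucc".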