in reply to Re: for each unique value in a column find the max value..need perl script
in thread for each unique value in a column find the max value..need perl script

Thanks, all, for your replies.

I reformatted the data as below:

order mtime no size id day date

14098703993 154538.354300 200 1 101510

14098703993 154539.420000 200 1 101511

14098703994 154538.398200 487 1 100888

14098703994 154610.720000 487 1 91588

14098703995 154538.401200 200 1 101502

14098703995 154539.420000 200 1 101500

use List::Util qw(max min);
my %id_hash;
open (DATA, ".txt");
while (<DATA>) {
    chomp;
    my ($order, $mtime, $size, $id, $date) = split /\t/;
    push @{ $id_hash{$order}{$id}{mtime} }, $mtime;
    push @{ $id_hash{$order}{$id}{size } }, $size;
    push @{ $id_hash{$order}{$id}{date } }, $date;
}
open (OUT, ">output.txt");
for my $order (keys %id_hash) {
    for my $id (keys %{ $id_hash{$order} }) {
        my $Low = min( @{ $id_hash{$order}{$id}{mtime} } );
        print OUT "$order $Low\n";
    }
}

Now the problem is that this does not give duplicate order entries! I want the duplicate order values kept exactly as they are, but I am unable to replace the date field with the oldest mtime.
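What you describe (keep every line, duplicates included, and overwrite the date field with the oldest mtime for that order) can be done in two loops over the stored lines. This is only a sketch, assuming tab-separated columns order, mtime, size, id, date; the subroutine name and the input.txt filename are made up for illustration:

```perl
use strict;
use warnings;

# Keep every input line, but overwrite the fifth field (date) with the
# smallest mtime (second field) seen for that line's order (first field).
sub replace_date_with_oldest_mtime {
    my @lines = @_;

    # First loop: find the minimum mtime per order.
    my %min_mtime;
    for my $line (@lines) {
        my ($order, $mtime) = (split /\t/, $line)[0, 1];
        $min_mtime{$order} = $mtime
            if !exists $min_mtime{$order} || $mtime < $min_mtime{$order};
    }

    # Second loop: emit every line (duplicates included) with the
    # date field replaced.
    my @out;
    for my $line (@lines) {
        my @f = split /\t/, $line;
        $f[4] = $min_mtime{ $f[0] };
        push @out, join "\t", @f;
    }
    return @out;
}

# Hypothetical file-to-file usage:
if (-e 'input.txt') {
    open my $in, '<', 'input.txt' or die "input.txt: $!";
    chomp(my @lines = <$in>);
    close $in;
    open my $out, '>', 'output.txt' or die "output.txt: $!";
    print $out "$_\n" for replace_date_with_oldest_mtime(@lines);
    close $out;
}
```

Because the lines are kept in an array and only rewritten afterwards, every duplicate order line survives unchanged except for the date column.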

So now I am using the below clumsy code, which will be the slowest-running code of my life...:

open (OUT, ">output.txt");
open (IN, "input1.txt");    # original file1 with all data
while ($line = <IN>) {
    chomp($line);
    ($Date, $MTime, $inserdate, $inserttime, $Id, $Phase, $Size, $day, $order) = split(/ /, $line);
    open (INL, "file2.txt");    # contains the sorted order values ($x) from file1
    while ($linel = <INL>) {
        chomp($linel);
        ($x, $y, $z, $q, $t) = split(/ /, $linel);
        if ($x == $order) {
            #print OUT "$a $x $b $z $q $t \n";
            print OUT $Date, " ", $MTime, " ", $inserdate, " ", $y, " ", $Id, " ", $Phase, " ", $Size, " ", $day, " ", $order, "\n";
        }
    }
    close(INL);
}
close(IN);
close(OUT);
print "DONE";
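Instead of re-reading file2.txt for every line of file1, it is usually much faster to load file2 into a hash keyed on the order value first, and then make a single pass over file1. A sketch under the same space-separated column layout as the code above; the subroutine name is invented:

```perl
use strict;
use warnings;

# Build a lookup table from the file2 lines (order value $x => $y),
# then make one pass over the file1 lines instead of nested loops.
sub join_on_order {
    my ($file1_lines, $file2_lines) = @_;

    my %y_for_order;
    for my $linel (@$file2_lines) {
        my ($x, $y) = split / /, $linel;
        $y_for_order{$x} = $y;
    }

    my @out;
    for my $line (@$file1_lines) {
        my ($Date, $MTime, $inserdate, $inserttime, $Id,
            $Phase, $Size, $day, $order) = split / /, $line;
        next unless exists $y_for_order{$order};    # no match in file2
        push @out, join ' ', $Date, $MTime, $inserdate,
            $y_for_order{$order}, $Id, $Phase, $Size, $day, $order;
    }
    return @out;
}
```

With N lines in file1 and M lines in file2, this does one pass over each file instead of N full passes over file2, which is where the nested-loop version spends all its time.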
Hope it makes sense to you. I am not an expert in Perl, just trial and error, but now I really need your help, please!

Replies are listed 'Best First'.
Re^3: for each unique value in a column find the max value..need perl script
by martell (Hermit) on May 31, 2011 at 21:14 UTC

    A solution: mind you, I wouldn't use this on a very big file. Not exactly the method davido was describing, but it will do.

    my %id_hash;
    my @lines = ();
    open (DATA, "test.txt");
    while ($line = <DATA>) {
        chomp $line;
        my @line = split /\t/, $line;
        push @lines, \@line;    # push the original line in an array as an anonymous array
        if ($line[1] > $id_hash{$line[0]}) { $id_hash{$line[0]} = $line[1]; }    # calculate the biggest mtime for a given order id
    }
    open (OUT, ">output.txt");
    foreach my $item (@lines) {
        my @line = @{$item};                  # get the original line
        $line[4] = $id_hash{$line[0]};        # replace the fifth element with the calculated maximum
        print OUT join "\t", @line, "\n";     # print the adapted line
    }

    Greetings

    Martell

      Thanks Martell. You are right, it does look like it will take time to run on big data files, which is my case; I will give it a go anyway.
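      Since holding all the lines in memory is the concern for big files, one option is to read the file from disk twice instead: one pass to compute the per-order maximum, and a second pass that rewrites each line as it streams through. A sketch with a made-up subroutine name; the filenames follow the code above:

```perl
use strict;
use warnings;

# Two passes over the file on disk, so the whole file is never held
# in memory at once.
sub rewrite_with_max_mtime {
    my ($infile, $outfile) = @_;

    # Pass 1: maximum mtime (column 2) per order (column 1).
    my %max;
    open my $in, '<', $infile or die "$infile: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($order, $mtime) = (split /\t/, $line)[0, 1];
        $max{$order} = $mtime
            if !defined $max{$order} || $mtime > $max{$order};
    }
    close $in;

    # Pass 2: re-read the file and rewrite the fifth field of every line.
    open $in, '<', $infile or die "$infile: $!";
    open my $out, '>', $outfile or die "$outfile: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /\t/, $line;
        $f[4] = $max{ $f[0] };
        print $out join("\t", @f), "\n";
    }
    close $in;
    close $out;
}

rewrite_with_max_mtime('test.txt', 'output.txt') if -e 'test.txt';
```

      Memory use is then proportional to the number of distinct order values rather than the number of lines, at the cost of reading the file twice.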
Re^3: for each unique value in a column find the max value..need perl script
by ww (Archbishop) on Jun 01, 2011 at 01:23 UTC

    That's a bit hard to follow. I count seven captions (headers),
    order mtime no size id day date,
    but even hypothesizing that the dot in the second field,
    14098703993 154538.354300 200 1 101510,
    denotes a field break, I can only find six fields.

    7==6 does not compute.

      Ah, now I get it. I just removed one field because I didn't need it in the output file. Sorry for the confusion.

      Hi ww

      If I understand your question correctly, I guess you have missed the last field in my input file?

      20100607 154538.354300 200 1 101510 14098703993

      Thanks