in reply to Re: for each unique value in a column find the max value..need perl script
in thread for each unique value in a column find the max value..need perl script

Thanks, all, for your replies.

I reformatted the data as below:

order mtime no size id day date

14098703993 154538.354300 200 1 101510

14098703993 154539.420000 200 1 101511

14098703994 154538.398200 487 1 100888

14098703994 154610.720000 487 1 91588

14098703995 154538.401200 200 1 101502

14098703995 154539.420000 200 1 101500

use List::Util qw(max min);
my %id_hash;
open (DATA, ".txt");
while (<DATA>) {
    chomp;
    my ($order, $mtime, $size, $id, $date) = split /\t/;
    push @{ $id_hash{$order}{$id}{mtime} }, $mtime;
    push @{ $id_hash{$order}{$id}{size } }, $size;
    push @{ $id_hash{$order}{$id}{date } }, $date;
}
open (OUT, ">output.txt");
for my $order (keys %id_hash) {
    for my $id (keys %{ $id_hash{$order} }) {
        my $Low = min( @{ $id_hash{$order}{$id}{mtime} } );
        print OUT "$order $Low\n";
    }
}

Now the problem is that this does not give duplicate order entries! I want the duplicate order values kept exactly as they are, but I am unable to replace the date field with the oldest mtime.
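What you describe (keep every line, duplicates included, and overwrite the date field with the oldest mtime for that order) can be done in two loops over the stored lines. This is only a sketch, assuming tab-separated columns order, mtime, size, id, date; the subroutine name and the input.txt filename are made up for illustration:

```perl
use strict;
use warnings;

# Keep every input line, but overwrite the fifth field (date) with the
# smallest mtime (second field) seen for that line's order (first field).
sub replace_date_with_oldest_mtime {
    my @lines = @_;

    # First loop: find the minimum mtime per order.
    my %min_mtime;
    for my $line (@lines) {
        my ($order, $mtime) = (split /\t/, $line)[0, 1];
        $min_mtime{$order} = $mtime
            if !exists $min_mtime{$order} || $mtime < $min_mtime{$order};
    }

    # Second loop: emit every line (duplicates included) with the
    # date field replaced.
    my @out;
    for my $line (@lines) {
        my @f = split /\t/, $line;
        $f[4] = $min_mtime{ $f[0] };
        push @out, join "\t", @f;
    }
    return @out;
}

# Hypothetical file-to-file usage:
if (-e 'input.txt') {
    open my $in, '<', 'input.txt' or die "input.txt: $!";
    chomp(my @lines = <$in>);
    close $in;
    open my $out, '>', 'output.txt' or die "output.txt: $!";
    print $out "$_\n" for replace_date_with_oldest_mtime(@lines);
    close $out;
}
```

Because the lines are kept in an array and only rewritten afterwards, every duplicate order line survives unchanged except for the date column.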

So now I am using the below clumsy code, which will be the slowest-running code of my life...:

open (OUT, ">output.txt");
open (IN, "input1.txt");    # original file1 with all data
while ($line = <IN>) {
    chomp($line);
    ($Date, $MTime, $inserdate, $inserttime, $Id, $Phase, $Size, $day, $order) = split(/ /, $line);
    open (INL, "file2.txt");    # contains the sorted order values ($x) from file1
    while ($linel = <INL>) {
        chomp($linel);
        ($x, $y, $z, $q, $t) = split(/ /, $linel);
        if ($x == $order) {
            #print OUT "$a $x $b $z $q $t \n";
            print OUT $Date, " ", $MTime, " ", $inserdate, " ", $y, " ", $Id, " ", $Phase, " ", $Size, " ", $day, " ", $order, "\n";
        }
    }
    close(INL);
}
close(IN);
close(OUT);
print "DONE";
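Instead of re-reading file2.txt for every line of file1, it is usually much faster to load file2 into a hash keyed on the order value first, and then make a single pass over file1. A sketch under the same space-separated column layout as the code above; the subroutine name is invented:

```perl
use strict;
use warnings;

# Build a lookup table from the file2 lines (order value $x => $y),
# then make one pass over the file1 lines instead of nested loops.
sub join_on_order {
    my ($file1_lines, $file2_lines) = @_;

    my %y_for_order;
    for my $linel (@$file2_lines) {
        my ($x, $y) = split / /, $linel;
        $y_for_order{$x} = $y;
    }

    my @out;
    for my $line (@$file1_lines) {
        my ($Date, $MTime, $inserdate, $inserttime, $Id,
            $Phase, $Size, $day, $order) = split / /, $line;
        next unless exists $y_for_order{$order};    # no match in file2
        push @out, join ' ', $Date, $MTime, $inserdate,
            $y_for_order{$order}, $Id, $Phase, $Size, $day, $order;
    }
    return @out;
}
```

With N lines in file1 and M lines in file2, this does one pass over each file instead of N full passes over file2, which is where the nested-loop version spends all its time.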
Hope it makes sense to you. I am not an expert in Perl, just trial and error, but now I really need your help, please!

Replies are listed 'Best First'.
Re^3: for each unique value in a column find the max value..need perl script
by martell (Hermit) on May 31, 2011 at 21:14 UTC

    A solution: mind you, I wouldn't use this on a very big file. Not exactly the method davido was describing, but it will do.

    my %id_hash;
    my @lines = ();
    open (DATA, "test.txt");
    while ($line = <DATA>) {
        chomp $line;
        my @line = split /\t/, $line;
        push @lines, \@line;    # push the original line in an array as an anonymous array
        if ($line[1] > $id_hash{$line[0]}) { $id_hash{$line[0]} = $line[1]; }    # calculate the biggest mtime for a given order id
    }
    open (OUT, ">output.txt");
    foreach my $item (@lines) {
        my @line = @{$item};                  # get the original line
        $line[4] = $id_hash{$line[0]};        # replace the fifth element with the calculated maximum
        print OUT join "\t", @line, "\n";     # print the adapted line
    }

    Greetings

    Martell

      Thanks Martell. You are right, it does look like it will take time to run on big data files, which is my case; I will give it a go anyway.
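      Since holding all the lines in memory is the concern for big files, one option is to read the file from disk twice instead: one pass to compute the per-order maximum, and a second pass that rewrites each line as it streams through. A sketch with a made-up subroutine name; the filenames follow the code above:

```perl
use strict;
use warnings;

# Two passes over the file on disk, so the whole file is never held
# in memory at once.
sub rewrite_with_max_mtime {
    my ($infile, $outfile) = @_;

    # Pass 1: maximum mtime (column 2) per order (column 1).
    my %max;
    open my $in, '<', $infile or die "$infile: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($order, $mtime) = (split /\t/, $line)[0, 1];
        $max{$order} = $mtime
            if !defined $max{$order} || $mtime > $max{$order};
    }
    close $in;

    # Pass 2: re-read the file and rewrite the fifth field of every line.
    open $in, '<', $infile or die "$infile: $!";
    open my $out, '>', $outfile or die "$outfile: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /\t/, $line;
        $f[4] = $max{ $f[0] };
        print $out join("\t", @f), "\n";
    }
    close $in;
    close $out;
}

rewrite_with_max_mtime('test.txt', 'output.txt') if -e 'test.txt';
```

      Memory use is then proportional to the number of distinct order values rather than the number of lines, at the cost of reading the file twice.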
Re^3: for each unique value in a column find the max value..need perl script
by ww (Archbishop) on Jun 01, 2011 at 01:23 UTC

    That's a bit hard to follow. I count seven captions (headers),
    order mtime no size id day date,
    but even hypothesizing that the dot in the second field,
    14098703993 154538.354300 200 1 101510,
    denotes a field break, I can only find six fields.

    7==6 does not compute.

      Ah, now I get it. I just removed one field because I didn't need it in the output file. Sorry for the confusion.

      Hi ww

      If I understand your question correctly, I guess you have missed the last field in my input file?

      20100607 154538.354300 200 1 101510 14098703993

      Thanks