in reply to for each unique value in a column find the max value..need perl script

Use List::Util. Row by row, split each line into fields on whitespace. Use the 'order' column as a hash key, and put the rest of the columns into an anonymous array that you push onto your hash for that key (use a HoAoA so that you may have multiple entries per key).

Next, iterate over the keys. For each hash entry, pull the list of mtimes out of the AoA portion of the data structure and get a max() of those values. Then replace the mtime column in each row of the AoA with the value that max() returned.

Now move your original file (rename) to filename.bak (for example). Then open a new file for output with the original file's name, and write your structure back out again in the intended format.
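A minimal sketch of that approach, assuming whitespace-separated fields, mtime as the second column, and a hypothetical filename (none of these details come from the original question):

use strict;
use warnings;
use List::Util qw(max);
use File::Copy qw(move);

my $file = 'data.txt';    # hypothetical filename
my %rows;                 # HoAoA: order value => [ [rest-of-columns], ... ]

open my $in, '<', $file or die "Can't read $file: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($order, @rest) = split ' ', $line;
    push @{ $rows{$order} }, \@rest;    # several rows may share one key
}
close $in;

# For each key, take the max of the mtimes and write it back into every row.
for my $order (keys %rows) {
    my $max = max map { $_->[0] } @{ $rows{$order} };    # mtime is $rest[0]
    $_->[0] = $max for @{ $rows{$order} };
}

move($file, "$file.bak") or die "Can't rename $file: $!";
open my $out, '>', $file or die "Can't write $file: $!";
for my $order (sort keys %rows) {
    print {$out} join(' ', $order, @$_), "\n" for @{ $rows{$order} };
}
close $out;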

This solution does hold the entire file in memory, so it wouldn't scale well to huge files. But if you were dealing with truly huge data sets you would already have a database, and updates would be as simple as an SQL statement.
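For instance, if the data lived in a SQLite table (the table and column names here are hypothetical, not from the question), the whole job reduces to one UPDATE:

use DBI;

# Hypothetical table 'records' with columns order_id and mtime.
my $dbh = DBI->connect('dbi:SQLite:dbname=records.db', '', '',
    { RaiseError => 1 });
$dbh->do(q{
    UPDATE records
    SET mtime = (SELECT MAX(mtime) FROM records AS r2
                 WHERE r2.order_id = records.order_id)
});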

If you have a question on part of the implementation, be specific as to which part eludes you, and we'll try to help.


Dave


Re^2: for each unique value in a column find the max value..need perl script
by qmenon (Initiate) on May 31, 2011 at 17:20 UTC
    Thanks all for your replies.

    I reformatted the data as below:

    order mtime no size id day date

    14098703993 154538.354300 200 1 101510
    14098703993 154539.420000 200 1 101511
    14098703994 154538.398200 487 1 100888
    14098703994 154610.720000 487 1 91588
    14098703995 154538.401200 200 1 101502
    14098703995 154539.420000 200 1 101500

    use List::Util qw(max min);

    my %id_hash;
    open (DATA, ".txt");
    while (<DATA>) {
        chomp;
        my ($order, $mtime, $size, $id, $date) = split /\t/;
        push @{ $id_hash{$order}{$id}{mtime} }, $mtime;
        push @{ $id_hash{$order}{$id}{size } }, $size;
        push @{ $id_hash{$order}{$id}{date } }, $date;
    }

    open (OUT, ">output.txt");
    for my $order (keys %id_hash) {
        for my $id (keys %{ $id_hash{$order} }) {
            my $Low = min( @{ $id_hash{$order}{$id}{mtime} } );
            print OUT "$order $Low \n";
        }
    }

    Now the problem is this does not give duplicate order entries! I think I am unable to do the following: I want the duplicate order values kept as they are, and I am unable to replace the oldest mtime into the date field.

    So now I am writing the stupid code below, which will be the loooongest code of my life...:

    open (OUT, ">output.txt"); open (IN, "input1.txt");->original file1 with all data while($line=<IN>){ chomp($line); ($Date,$MTime,$inserdate,$inserttime,$Id,$Phase,$Size,$day,$order) += split(/ /,$line); open (INL, "file2.txt");-->contains the sorted order values of order v +alues($x) from file 1 while($linel=<INL>){ chomp($linel); ($x,$y,$z,$q,$t)= split(/ /,$linel); if($x == $order) { #print OUT "$a $x $b $z $q $t \n"; print OUT $Date," ",$M_Time," ",$inserdate," ",$y," ",$Id," ",$Pha +se," ",$Size," ",$day," ",$order,"\n"; } } } close(INL); close(IN); close(OUT); print "DONE";
    Hope it makes sense to you... I am not an expert in Perl, just trial and error, guys, but now I really need your help, please!
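
    A sketch of the same two-file merge done with a hash lookup, so that file2.txt is read once instead of being rescanned for every line of input1.txt (the filenames and column layout are taken from the code above; it assumes each order value appears once in file2.txt):

    use strict;
    use warnings;

    # Read file2.txt once into a hash: order value => replacement value.
    my %repl_for;
    open my $inl, '<', 'file2.txt' or die "file2.txt: $!";
    while (my $linel = <$inl>) {
        chomp $linel;
        my ($x, $y) = split / /, $linel;
        $repl_for{$x} = $y;
    }
    close $inl;

    open my $in,  '<', 'input1.txt' or die "input1.txt: $!";
    open my $out, '>', 'output.txt' or die "output.txt: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($Date, $MTime, $inserdate, $inserttime, $Id,
            $Phase, $Size, $day, $order) = split / /, $line;
        next unless exists $repl_for{$order};   # same filter as $x == $order
        # $inserttime is replaced by the value from file2.txt, as above.
        print {$out} join(' ', $Date, $MTime, $inserdate, $repl_for{$order},
            $Id, $Phase, $Size, $day, $order), "\n";
    }
    close $in;
    close $out;
    print "DONE\n";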

      A solution: mind you, I wouldn't use this on a very big file. Not exactly the method davido was describing, but it will do.

      my %id_hash;
      my @lines = ();
      open (DATA, "test.txt");
      while ($line = <DATA>) {
          chomp $line;
          my @line = split /\t/, $line;
          push @lines, \@line;    # push the original line onto an array as an anonymous array
          if ($line[1] > $id_hash{$line[0]}) {
              $id_hash{$line[0]} = $line[1];   # track the biggest mtime for a given order id
          }
      }

      open (OUT, ">output.txt");
      foreach my $item (@lines) {
          my @line = @{$item};                 # get the original line
          $line[4] = $id_hash{$line[0]};       # replace the fifth element with the calculated maximum
          print OUT join("\t", @line), "\n";   # print the adapted line
      }

      Greetings

      Martell

        Thanks, Martell. You are right, it does look like it will take time to run on big data files, which is my case; I will give it a go anyway.

      That's a bit hard to follow. I count seven captions (headers),
      order mtime no size id day date,
      but even hypothesizing that the dot in the second field,
      14098703993 154538.354300 200 1 101510,
      denotes a field break, I can only find six fields.

      7==6 does not compute.

        Ah, now I get it. I just removed one field because I didn't need it in the output file. Sorry for the confusion.

        Hi ww

        If I understand your question correctly, I guess you have missed the last field in my input file?

        20100607 154538.354300 200 1 101510 14098703993

        Thanks