natol44 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Well, after two days of searching, I give up and call for help :(

I have a list of files whose names always follow the same syntax:

string_amount_end

string is a chain of 32 letters/digits (an MD5 hash)
amount is always formatted with digits and 2 decimals (1.00, 45.21, etc.)
_end can be empty, or _min or _max
So a file name looks like 111aaa222ccc324567fed54333221235_1.02_max
or 111aaa222ccc324567fed54333221235_1.02
etc

I need to find all the files with the same "string" AND that have "_max" at the end; then, among those files, I need to find the one where "amount" is the lowest.

So, for example, in the following list:

111aaa222ccc324567fed54333221235_1.04
111aaa222ccc324567fed54333221235_1.05_max
111aaa222ccc324567fed54333221235_0.98_min
111aaa222ccc324567fed54333221235_1.02_max
111aaa222ccc324567fed54333221235_0.21
777aaa222ccc324567fed54333221235_1.04
777aaa222ccc324567fed54333221235_1.07_min
777aaa222ccc324567fed54333221235_1.04_max

if I am interested in the string 111aaa222ccc324567fed54333221235, the result should be 1.02 (from 111aaa222ccc324567fed54333221235_1.02_max)

Additionally I also need to know if there is no _max file at all in the selected files (that have the same "string").

Well, I am a newbie; even 20 lines of code didn't give me the right result :(

Thank you for your help! (I should add that this is not for business or an exam, etc.; I am simply writing a game for my website :)

Re: File sort and values in file name
by ikegami (Patriarch) on Sep 18, 2009 at 19:39 UTC

    I need to find all the files with the same "string"

    Group by "string" ⇒ use a hash.

    use List::Util qw( min );

    my %grouped_maxes;
    while (<DATA>) {
        chomp;
        my ($hash, $amount, $end) = split /_/;
        push @{ $grouped_maxes{$hash} }, $amount
            if defined($end) && $end eq 'max';
    }

    for my $hash (keys %grouped_maxes) {
        print("$hash: ", min( @{ $grouped_maxes{$hash} } ), "\n");
    }

    __DATA__
    111aaa222ccc324567fed54333221235_1.04
    111aaa222ccc324567fed54333221235_1.05_max
    111aaa222ccc324567fed54333221235_0.98_min
    111aaa222ccc324567fed54333221235_1.02_max
    111aaa222ccc324567fed54333221235_0.21
    777aaa222ccc324567fed54333221235_1.04
    777aaa222ccc324567fed54333221235_1.07_min
    777aaa222ccc324567fed54333221235_1.04_max
    888aaa222ccc324567fed54333221235_0.21

    Additionally I also need to know if there is no _max file at all in the selected files

    Keep track of all the unique hash strings you see, keep track of the unique hash strings you see with a max, then remove those with a max from the whole set.

    Remove duplicates ⇒ use a hash.

    use List::Util qw( min );

    my %hashes;
    my %grouped_maxes;
    while (<DATA>) {
        chomp;
        my ($hash, $amount, $end) = split /_/;
        ++$hashes{$hash};
        push @{ $grouped_maxes{$hash} }, $amount
            if defined($end) && $end eq 'max';
    }

    for my $hash (keys %hashes) {
        if (exists($grouped_maxes{$hash})) {
            my $min_max = min( @{ $grouped_maxes{$hash} } );
            print("$hash: $min_max\n");
        } else {
            print("$hash: No max\n");
        }
    }

    __DATA__
    111aaa222ccc324567fed54333221235_1.04
    111aaa222ccc324567fed54333221235_1.05_max
    111aaa222ccc324567fed54333221235_0.98_min
    111aaa222ccc324567fed54333221235_1.02_max
    111aaa222ccc324567fed54333221235_0.21
    777aaa222ccc324567fed54333221235_1.04
    777aaa222ccc324567fed54333221235_1.07_min
    777aaa222ccc324567fed54333221235_1.04_max
    888aaa222ccc324567fed54333221235_0.21
    777aaa222ccc324567fed54333221235: 1.04
    888aaa222ccc324567fed54333221235: No max
    111aaa222ccc324567fed54333221235: 1.02
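The snippets above read the names from __DATA__, while the question is about real files. As a rough sketch only, the same two-hash approach could be pointed at a directory listing. The helper name lowest_max_in_dir, the default directory '.', and the strictness of the name pattern are all assumptions made for illustration; they are not part of the original reply.

```perl
use strict;
use warnings;
use List::Util qw( min );

# Hypothetical helper: scans $dir and returns a hash that maps each hash
# string to its lowest _max amount, or to undef when it has no _max file.
sub lowest_max_in_dir {
    my ($dir) = @_;
    my (%seen, %grouped_maxes);

    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    while (defined(my $name = readdir($dh))) {
        # Assumed name layout: 32 letters/digits, "_", an amount with two
        # decimals, then an optional "_min" or "_max".
        next unless $name =~ /^([0-9A-Za-z]{32})_(\d+\.\d\d)(?:_(min|max))?$/;
        my ($hash, $amount, $end) = ($1, $2, $3);
        ++$seen{$hash};
        push @{ $grouped_maxes{$hash} }, $amount
            if defined($end) && $end eq 'max';
    }
    closedir($dh);

    my %result;
    for my $hash (keys %seen) {
        $result{$hash} = exists $grouped_maxes{$hash}
            ? min( @{ $grouped_maxes{$hash} } )
            : undef;
    }
    return %result;
}

# Usage: the directory defaults to the current one (an assumption).
my $dir = @ARGV ? $ARGV[0] : '.';
my %lowest = lowest_max_in_dir($dir);
for my $hash (sort keys %lowest) {
    my $amount = $lowest{$hash};
    print "$hash: ", (defined $amount ? $amount : 'No max'), "\n";
}
```

Names that do not match the assumed pattern (including "." and "..") are skipped silently; whether that is the right behaviour depends on what else lives in the directory.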
Re: File sort and values in file name
by Bloodnok (Vicar) on Sep 18, 2009 at 20:08 UTC
    Please note that the norm is to include, if not the full code, then a compilable, working subset of it that, when run, demonstrates the problem(s) you encountered. Having said that...
    use warnings;
    use strict;

    use Data::Dumper;

    my @files = <DATA>;
    chomp @files;

    warn Dumper \@files;

    sub get_least {
        my $md5 = shift;
        # Capture the amount from each "<md5>_<amount>_max" name, lowest first.
        my $num = (
            sort { $a <=> $b }    # numeric sort, so 10.00 sorts after 2.00
            map { /^\Q$md5\E_(.+)_max$/ ? $1 : () } @files
        )[0];
        warn("No max file"), return undef unless defined $num;
        return $num;
    }

    warn get_least(q/111aaa222ccc324567fed54333221235/);
    warn get_least(q/777aaa222ccc324567fed54333221235/);
    warn get_least(q/99999999999999999999999999999999/);
    __DATA__
    111aaa222ccc324567fed54333221235_1.04
    111aaa222ccc324567fed54333221235_1.05_max
    111aaa222ccc324567fed54333221235_0.98_min
    111aaa222ccc324567fed54333221235_1.02_max
    111aaa222ccc324567fed54333221235_0.21
    777aaa222ccc324567fed54333221235_1.04
    777aaa222ccc324567fed54333221235_1.07_min
    777aaa222ccc324567fed54333221235_1.04_max
    99999999999999999999999999999999_1.04
    99999999999999999999999999999999_1.07_min
    99999999999999999999999999999999_1.04_min
    when run, gives
    $ perl tst.pl
    $VAR1 = [
              '111aaa222ccc324567fed54333221235_1.04',
              '111aaa222ccc324567fed54333221235_1.05_max',
              '111aaa222ccc324567fed54333221235_0.98_min',
              '111aaa222ccc324567fed54333221235_1.02_max',
              '111aaa222ccc324567fed54333221235_0.21',
              '777aaa222ccc324567fed54333221235_1.04',
              '777aaa222ccc324567fed54333221235_1.07_min',
              '777aaa222ccc324567fed54333221235_1.04_max',
              '99999999999999999999999999999999_1.04',
              '99999999999999999999999999999999_1.07_min',
              '99999999999999999999999999999999_1.04_min'
            ];
    1.02 at tst.pl line 22, <DATA> line 11.
    1.04 at tst.pl line 23, <DATA> line 11.
    No max file at tst.pl line 18, <DATA> line 11.
    Use of uninitialized value in warn at tst.pl line 24, <DATA> line 11.
    Warning: something's wrong at tst.pl line 24, <DATA> line 11.
    which, I believe, is as required.

    A user level that continues to overstate my experience :-))
Re: File sort and values in file name
by bichonfrise74 (Vicar) on Sep 18, 2009 at 23:01 UTC
    Try this...
    #!/usr/bin/perl
    use strict;

    my %records;
    while (<DATA>) {
        chomp;
        my @values = split( "_" );
        next unless defined $values[2] && $values[2] eq 'max';
        # Keep the lowest amount seen so far for this hash string.
        $records{ $values[0] } = $values[1]
            if ( !exists $records{ $values[0] }
                || $records{ $values[0] } > $values[1] );
    }

    print map { "$_ => $records{$_} \n" } keys %records;

    __DATA__
    111aaa222ccc324567fed54333221235_1.04
    111aaa222ccc324567fed54333221235_1.05_max
    111aaa222ccc324567fed54333221235_0.98_min
    111aaa222ccc324567fed54333221235_1.02_max
    111aaa222ccc324567fed54333221235_0.21
    777aaa222ccc324567fed54333221235_1.04
    777aaa222ccc324567fed54333221235_1.07_min
    777aaa222ccc324567fed54333221235_1.04_max
      I liked your code, but I am averse to using [$index] in Perl code unless it is needed. I would give names to the list on the left as the result of a split. A chomp() would make the split regex a bit simpler, but that's no big deal. I am having a "brain cramp" right now about how to code the hash assignment statement better. My main point is to avoid this numeric $index stuff.
      #!/usr/bin/perl -w
      use strict;

      my %records;
      while (<DATA>) {
          my ($md5, $version, $end) = split(/[_\n]+/, $_);
          # $end is undefined for names without a suffix, so test that first.
          next unless (defined $end && $end eq 'max');
          $records{$md5} = $version
              unless (exists $records{$md5} && $records{$md5} < $version);
      }

      foreach my $md5 (sort keys %records) {
          print $md5, "_", $records{$md5}, "_max\n";
      }

      #Prints:
      #111aaa222ccc324567fed54333221235_1.02_max
      #777aaa222ccc324567fed54333221235_1.04_max

      __DATA__
      111aaa222ccc324567fed54333221235_1.04
      111aaa222ccc324567fed54333221235_1.05_max
      111aaa222ccc324567fed54333221235_0.98_min
      111aaa222ccc324567fed54333221235_1.02_max
      111aaa222ccc324567fed54333221235_0.21
      777aaa222ccc324567fed54333221235_1.04
      777aaa222ccc324567fed54333221235_1.07_min
      777aaa222ccc324567fed54333221235_1.04_max
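On the hash assignment itself, one possible way to spell it out is a plain if with the two cases named: first "_max" seen, or a new minimum. The sketch below is only an illustration; the helper name lowest_max_by_md5 is made up here, and storing undef for an md5 without any "_max" name is a design choice that also covers the "no _max file at all" part of the question in a single pass.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each md5 string to its
# lowest "_max" amount, or to undef when that md5 has no "_max" name.
sub lowest_max_by_md5 {
    my @names = @_;
    my %lowest;
    for my $name (@names) {
        my ($md5, $amount, $end) = split /_/, $name;
        next unless defined $amount;    # skip names that do not fit the pattern

        # Register the md5 even when this name is not a "_max" one.
        $lowest{$md5} = undef unless exists $lowest{$md5};

        next unless defined $end && $end eq 'max';

        # Keep the amount when it is the first "_max" seen or a new minimum.
        if (!defined $lowest{$md5} || $amount < $lowest{$md5}) {
            $lowest{$md5} = $amount;
        }
    }
    return \%lowest;
}

my $result = lowest_max_by_md5(qw(
    111aaa222ccc324567fed54333221235_1.04
    111aaa222ccc324567fed54333221235_1.05_max
    111aaa222ccc324567fed54333221235_0.98_min
    111aaa222ccc324567fed54333221235_1.02_max
    777aaa222ccc324567fed54333221235_1.07_min
    777aaa222ccc324567fed54333221235_1.04_max
    888aaa222ccc324567fed54333221235_0.21
));

for my $md5 (sort keys %$result) {
    my $amount = $result->{$md5};
    print "$md5: ", (defined $amount ? $amount : 'No max'), "\n";
}
```

The named variables from the split stay, so there is still no numeric $index anywhere.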