walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:

Hey all, I have a large file that contains UNIX groups, and under each group are the users within that group. Each user has a number associated with them, so user1 would have a line like this: "user1:user1,CO#". Let's say every group has roughly 300 users. I have written the script below, which counts all the CO#s and tells me the one that occurs most, or, if there is a tie, the ones that occur most. I want to modify it to also tell me that the most frequent one occurs x out of y times. In other words, X occurs the most, and X occurs 90/100 times, if that makes sense; the other 10 times might be Y, but I don't really care about that. The code is below; if you need any clarification, please do let me know.

Here's a better example:
user1:user1,CO12345
user2:user2,CO12345
user3:user3,CO12345
user4:user4,CO54321
user5:user5,CO54321
user6:user6,CO12345


So, CO12345 is the most popular number, and CO12345 occurs 4 out of 6 times.

#!/usr/bin/perl -w
unshift(@INC,"$ENV{'PWD'}");
use Util qw(max) ;

my $file_name;
my $ans;
if (@ARGV == 1) {
    chomp ($file_name=$ARGV[0]);
}
else {
    print "\n\nPlease enter the file name to certify: ";
    chomp ($file_name=<STDIN>);
    while(1) {
        print "\n\nYou entered $file_name - is this correct? <y or n>:";
        chomp ($ans=<STDIN>);
        if ($ans =~ /[Nn]/) {
            print "\n\nPlease enter the server name: ";
            chomp ($file_name=<STDIN>);
            next;
        }
        elsif ($ans =~ /[Yy]/) {
            last;
        }
        else {
            next;
        }
    }
}

open(STUFF,"<$file_name") or die "$!";
my %number_ids = () ;
while (my $line = <STUFF>) {
    $line =~ s/\s+\z// ;
    $line =~ s/\A\s+// ;
    my @csv = split(/:/, $line) ;
    if (defined($csv[1]) && ($csv[1] =~ m{((CO)\d{5})})) {
        $num = $1;
        $number_ids{$num}++;
    }
    elsif (defined($csv[0]) && ($csv[0] =~ m/^Group.*/i)) {
        show_most_popular(\%number_ids);
        print "$csv[1],";
        %number_ids = ();
    }
    else {
        # what to do we do with peculiar lines?
    };
};
show_most_popular(\%number_ids);

sub show_most_popular {
    my ($r_ids) = @_ ;
    return if !%$r_ids ;
    my $max = max(values %$r_ids) ;
    my @popular = () ;
    while (my ($id, $count) = each %$r_ids) {
        if ($count == $max) {
            push @popular, $id ;
        } ;
    } ;
    print join(',', sort @popular), "\n";
} ;

Replies are listed 'Best First'.
Re: Count occurences of numbers
by svenXY (Deacon) on Jan 20, 2009 at 08:48 UTC
    Hi,
    I'd do it with a simple hash and no additional max-function, but here's a solution with your code. Basically, you need to count the number of groups and compare it with $max. Some other adjustments have been necessary, please see comments in the code:

    Regards,
    svenXY
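
A minimal sketch of the "simple hash" idea, not svenXY's actual code: count every CO number in a hash, keep a running total of matched lines, and report the top count against that total. The sub name most_popular and the inline sample data are assumptions made for illustration; the sample lines are the ones from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes "user:user,CO#####" lines and returns
# (array ref of the most frequent IDs, their count, total lines matched).
sub most_popular {
    my @lines = @_;
    my %count;
    my $total = 0;
    for my $line (@lines) {
        # Only look at the part after the first colon, as the original script does.
        my (undef, $rest) = split /:/, $line, 2;
        next unless defined $rest && $rest =~ /(CO\d{5})/;
        $count{$1}++;
        $total++;
    }
    return unless $total;

    # Find the highest count without an external max() function.
    my $max = 0;
    for my $n (values %count) {
        $max = $n if $n > $max;
    }

    # Several IDs can tie for first place, so collect all of them.
    my @popular = sort grep { $count{$_} == $max } keys %count;
    return (\@popular, $max, $total);
}

my @sample = qw(
    user1:user1,CO12345
    user2:user2,CO12345
    user3:user3,CO12345
    user4:user4,CO54321
    user5:user5,CO54321
    user6:user6,CO12345
);

my ($ids, $max, $total) = most_popular(@sample);
print join(',', @$ids), " occurs $max out of $total times\n";
```

For the sample data this prints "CO12345 occurs 4 out of 6 times". To get per-group figures, the same sub could be called once per group with that group's lines.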
Re: Count occurences of numbers
by JavaFan (Canon) on Jan 20, 2009 at 10:24 UTC
    Since your question already has been answered, a different remark.

    For Unix tools that operate on file content, the typical behaviour of a program that is called without arguments is to read the content from STDIN, and to not prompt the user for a file name. Reading from STDIN allows people to easily chain tools with pipes.

      Hi,
      ++JavaFan - completely agreed. I should have pointed that out as well ;-), but this one time chose to not alter the OP's code too much...
      Regards,
      svenXY
        So, do you mean something like this:

        while (my $line = <>) {

        } ??
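
One possible reading of JavaFan's suggestion, as a rough sketch rather than a drop-in replacement for the original script: the diamond operator <> reads from the files named on the command line, or from STDIN when no arguments are given, so no prompting is needed. The helper name tally_line is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: record the CO number found on one line, if any.
sub tally_line {
    my ($count, $line) = @_;
    $count->{$1}++ if $line =~ /:.*?(CO\d{5})/;
    return;
}

my %count;

# <> reads each file named in @ARGV in turn, or STDIN if @ARGV is empty.
while (my $line = <>) {
    chomp $line;
    tally_line(\%count, $line);
}

print "$_: $count{$_}\n" for sort keys %count;
```

Assuming the script is saved as count.pl and the data is in groups.txt (both names are placeholders), it could be run as "perl count.pl groups.txt" or as "cat groups.txt | perl count.pl".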
Re: Count occurences of numbers
by matze77 (Friar) on Jan 20, 2009 at 08:40 UTC

    Hmm. Do you want to see which users are in a lot of groups and maybe have too many rights? What is the "practical" reason behind this (I wonder if I need this too, maybe ;-))? Or is it just for fun?

    Thanks MH