Count occurences of numbers

walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:

Hey all, I have a large file that contains UNIX groups, and under each group are the users within that group. Each user has a number associated with them, so user1 would have a line like this: "user1:user1,CO#". Let's say every group has roughly 300 users. I have written a script (below) that counts all the CO#s and tells me the one that occurs most, or, if there is a tie, all the ones that tie for most. I want to modify the script so it also tells me that the most frequent number occurs x out of y times. In other words: X occurs the most, and X occurs 90 out of 100 times. The other 10 might be Y, but I don't really care about that. The code is below; if you need any clarification, please let me know.

Here's a better example:
user1:user1,CO12345
user2:user2,CO12345
user3:user3,CO12345
user4:user4,CO54321
user5:user5,CO54321
user6:user6,CO12345

So, CO12345 is the most popular number, and CO12345 occurs 4 out of 6 times.

#!/usr/bin/perl -w

  # "use" is processed at compile time, so @INC has to be extended in a BEGIN block
  BEGIN { unshift(@INC, $ENV{'PWD'}); }
  use Util qw(max) ;

  my $file_name;
  my $ans;

  if (@ARGV == 1) {
     chomp ($file_name=$ARGV[0]);
  }
  else {
   print "\n\nPlease enter the file name to certify: ";
   chomp ($file_name=<STDIN>);
   while(1) {
      print "\n\nYou entered $file_name - is this correct? <y or n>:";
      chomp ($ans=<STDIN>);
      if ($ans =~ /[Nn]/) {
         print "\n\nPlease enter the file name: ";
         chomp ($file_name=<STDIN>);
         next;
      }
      elsif ($ans =~ /[Yy]/) {
         last;
      }
      else {
         next;
      }
    }
  }
  open(STUFF, '<', $file_name) or die "Cannot open $file_name: $!";

  my %number_ids = () ;

  while (my $line = <STUFF>) {
    $line =~ s/\s+\z// ;
    $line =~ s/\A\s+// ;
    my @csv = split(/:/, $line) ;

    if (defined($csv[1]) && ($csv[1] =~ m{((CO)\d{5})})) {
      my $num = $1;
      $number_ids{$num}++;
    }
    elsif (defined($csv[0]) && ($csv[0] =~ m/^Group.*/i)) {
      show_most_popular(\%number_ids);
      print "$csv[1],";
      %number_ids = ();
    }
    else {
      # what do we do with peculiar lines?
    };
  };

  show_most_popular(\%number_ids);

  sub show_most_popular {
    my ($r_ids) = @_ ;

    return if !%$r_ids ;

    my $max = max(values %$r_ids) ;
    my @popular = () ;
    while (my ($id, $count) = each %$r_ids) {
      if ($count == $max) { push @popular, $id ; } ;
    } ;

    print join(',', sort @popular), "\n";
  } ;


Replies are listed 'Best First'.
Re: Count occurences of numbers by svenXY (Deacon) on Jan 20, 2009 at 08:48 UTC
Hi, I'd do it with a simple hash and no additional max function, but here's a solution with your code. Basically, you need to count the number of groups and compare it with $max. Some other adjustments have been necessary; please see the comments in the code: Read more... (3 kB) Regards, svenXY
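For readers following along: the code attached to this reply is not reproduced here, so the following is only a minimal sketch of the "simple hash, no max function" idea, not svenXY's actual solution. It assumes a single group and uses the sample lines from the question; the helper name `most_popular` is made up for illustration. It keeps a running maximum and a running total, so it can report "x out of y".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref of id => count, return
# (array ref of the most frequent ids, their count, total of all counts).
sub most_popular {
    my ($counts) = @_;
    my ($max, $total) = (0, 0);
    for my $n (values %$counts) {
        $total += $n;               # y: every CO number seen in the group
        $max = $n if $n > $max;     # x: the highest single count
    }
    # Ties are all reported, in sorted order.
    my @popular = grep { $counts->{$_} == $max } sort keys %$counts;
    return (\@popular, $max, $total);
}

# Sample data from the question.
my @lines = (
    'user1:user1,CO12345',
    'user2:user2,CO12345',
    'user3:user3,CO12345',
    'user4:user4,CO54321',
    'user5:user5,CO54321',
    'user6:user6,CO12345',
);

my %counts;
for my $line (@lines) {
    $counts{$1}++ if $line =~ /(CO\d{5})/;
}

my ($popular, $max, $total) = most_popular(\%counts);
print "Most popular: ", join(',', @$popular), " ($max out of $total)\n";
# prints "Most popular: CO12345 (4 out of 6)"
```

In the original script the same bookkeeping would have to be reset at each "Group" line, the way %number_ids already is.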
Re: Count occurences of numbers by JavaFan (Canon) on Jan 20, 2009 at 10:24 UTC
Since your question has already been answered, a different remark. For Unix tools that operate on file content, the typical behaviour of a program called without arguments is to read the content from STDIN rather than prompt the user for a file name. Reading from STDIN allows people to easily chain tools with pipes.
Re^2: Count occurences of numbers by svenXY (Deacon) on Jan 20, 2009 at 10:46 UTC
Hi, ++JavaFan - completely agreed. I should have pointed that out as well ;-), but this one time chose to not alter the OP's code too much... Regards, svenXY
Re^3: Count occurences of numbers by walkingthecow (Friar) on Jan 20, 2009 at 15:15 UTC
So, do you mean something like this: while (my $line = <>) { } ??
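For reference, a minimal sketch of that pattern. The `<>` operator reads from the special ARGV handle, which walks through the files named on the command line and falls back to STDIN when none are given. The helper name `tally_co_numbers` is hypothetical, and the terminal check is an added courtesy (an assumption, not something suggested in the thread) so the script does not sit silently waiting for keyboard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally CO numbers read from any filehandle.
# Returns a hash ref of CO number => count.
sub tally_co_numbers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ if $line =~ /(CO\d{5})/;
    }
    return \%count;
}

# ARGV is the handle behind <>: files from the command line, else STDIN.
# Only read when files were named or input is piped/redirected in.
if (@ARGV || !-t STDIN) {
    my $count = tally_co_numbers(\*ARGV);
    print "$_: $count->{$_}\n" for sort keys %$count;
}
else {
    print STDERR "usage: $0 [file ...]   (or pipe data on STDIN)\n";
}
```

Saved under a made-up name such as count_co.pl, this could be run as `perl count_co.pl groups.txt` or as `cat groups.txt | perl count_co.pl`.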
Re^4: Count occurences of numbers by graff (Chancellor) on Jan 21, 2009 at 03:31 UTC
Re: Count occurences of numbers by matze77 (Friar) on Jan 20, 2009 at 08:40 UTC
Hmm. Do you want to see which users are in a lot of groups and maybe have too many rights? Or what is the "practical" reason behind this? (I wonder if I need this too, maybe ;-)) Or is it just for fun? Thanks, MH