g_string has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks

I am very much a newbie at perl but picking it up and I'm hoping you can help.

I have a file input that details all the /etc/group files in our enterprise in the following format: "<host>:<group>:<gid>:<users>"

I want to parse this data and display it as the following: "<group>:<gid>:<users>"

so I can trace which users are members of each group (regardless of the host).

So far I have used a hash and am using <group>:<gid> as my key.

This is my code so far:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Cwd;

    my $cwd  = cwd;
    my $file = $cwd . '/usrgrps.txt';
    my $gid;
    my $grp;
    my $host;
    my $group;
    my $userid;
    my %table = ();

    open(FILE, "<", $file) or die "Can't open $file:$!";

    while(<FILE>) {
        chomp;
        ($host, $grp, $gid, $userid) = split(/ +:/, $_);
        $group = "$grp:$gid";
        push @{$table{$group}}, $userid;
    }

    foreach $group (sort keys %table) {
        print "$group:";
        my @users = @{$table{$group}};
        print join ',', sort @users;
        print "\n";
    }

    my $number = values %table;
    print $number . "\n";

This is roughly my input file:
host1:group1:9001:user1,user2,user3,user4,user5
host1:group2:9002:user1,user2,user3
host1:group3:9003:user1,user2,user4
host1:group4:9004:user1,user2,user5
host1:group5:9005:user1,user2
host1:group6:9006:
host2:group1:9001:user1,user2,user3,user4,user5
host2:group2:9002:user1,user2,user3
host2:group3:9003:user1,user2,user4
host2:group4:9004:user1,user2,user5
host2:group5:9005:user1,user2
host2:group1:9006:
host3:group1:9001:user1,user2,user3,user4,user5
host3:group2:9002:user1,user2,user3
host3:group3:9003:user1,user2,user4
host3:group4:9004:user1,user2,user5
host3:group5:9005:user1,user2
host3:group1:9006:

This is what I am getting back:

group1:9001:user1,user2,user3,user4,user5,user1,user2,user3,user4,user5,user1,user2,user3,user4,user5
group2:9002:user1,user2,user3,user1,user2,user3,user1,user2,user3
group3:9003:user1,user2,user4,user1,user2,user4,user1,user2,user4
group4:9004:user1,user2,user5,user1,user2,user5,user1,user2,user5
group5:9005:user1,user2,user1,user2,user1,user2
group6:9006:,,,


My question is, how do I get only unique elements assigned to each key? Any help or advice you can offer will be greatly appreciated. Thanks in advance.

Replies are listed 'Best First'.
Re: Sorting a hash value that is a list
by jwkrahn (Abbot) on Feb 26, 2012 at 23:55 UTC

    Use a Hash of Hashes instead of a Hash of Arrays:

    $ echo "host1:group1:9001:user1,user2,user3,user4,user5
    host1:group2:9002:user1,user2,user3
    host1:group3:9003:user1,user2,user4
    host1:group4:9004:user1,user2,user5
    host1:group5:9005:user1,user2
    host1:group6:9006:
    host2:group1:9001:user1,user2,user3,user4,user5
    host2:group2:9002:user1,user2,user3
    host2:group3:9003:user1,user2,user4
    host2:group4:9004:user1,user2,user5
    host2:group5:9005:user1,user2
    host2:group1:9006:
    host3:group1:9001:user1,user2,user3,user4,user5
    host3:group2:9002:user1,user2,user3
    host3:group3:9003:user1,user2,user4
    host3:group4:9004:user1,user2,user5
    host3:group5:9005:user1,user2
    host3:group1:9006:" | perl -e'
    use strict;
    use warnings;

    my %table;

    while ( <> ) {
        chomp;
        my ( $host, $grp, $gid, $userid ) = split /:/;
        my $group = "$grp:$gid";
        $table{ $group }{ $userid } = 1;
    }

    for my $group ( sort keys %table ) {
        print "$group:";
        my @users = sort keys %{ $table{ $group } };
        print join( ",", @users ), "\n";
    }
    '
    group1:9001:user1,user2,user3,user4,user5
    group1:9006:
    group2:9002:user1,user2,user3
    group3:9003:user1,user2,user4
    group4:9004:user1,user2,user5
    group5:9005:user1,user2
    group6:9006:
Re: Sorting a hash value that is a list
by tangent (Parson) on Feb 26, 2012 at 23:45 UTC
    You could create a hash to keep track of your users like so:
    my %seen = ();
    while(<FILE>) {
        chomp;
        ($host, $grp, $gid, $userid) = split(/:/, $_);
        $group = "$grp:$gid";
        # check to see if we've seen this group/user already
        next if $seen{$group}{$userid}++;
        push @{$table{$group}}, $userid;
    }

    Prints:
    group1:9001:user1,user2,user3,user4,user5
    group1:9006:
    group2:9002:user1,user2,user3
    group3:9003:user1,user2,user4
    group4:9004:user1,user2,user5
    group5:9005:user1,user2
    group6:9006:
    7

    Note that $seen{$group}{$userid}++ is a post-increment: the test sees the old value, and the count is only incremented after the check.
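The post-increment idiom in that check can be tried in isolation. The following is only a minimal sketch: the helper name first_seen and the sample user names are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Return the items of a list in first-seen order, skipping repeats.
# The name first_seen is hypothetical, chosen for this sketch only.
sub first_seen {
    my %seen;
    my @kept;
    for my $item (@_) {
        # $seen{$item}++ yields the old count (undef, i.e. false, the
        # first time) and only then bumps it, so only repeats are skipped.
        next if $seen{$item}++;
        push @kept, $item;
    }
    return @kept;
}

print join( ',', first_seen(qw(user1 user2 user1 user3 user2)) ), "\n";
# prints: user1,user2,user3
```

With a pre-increment (++$seen{$item}) the test would already be true on the first sighting, and every item would be skipped.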
Re: Sorting a hash value that is a list
by AnomalousMonk (Archbishop) on Feb 27, 2012 at 06:10 UTC

    jwkrahn's suggestion of a hash-of-hashes is probably best, but if you're married to an anonymous array for holding your users, you can uniq-ify (and maybe sort at the same time) each user array after all data is collected.

    >perl -wMstrict -le
    "use List::MoreUtils qw(uniq);
     use Data::Dumper;
     ;;
     my %table = (
       foo => [ qw( c b a b a c ) ],
       bar => [ qw( g f e d f g d e ) ],
       baz => [ qw( i h h i) ],
     );
     ;;
     $_ = [ sort { $a cmp $b } uniq @$_ ] for values %table;
     ;;
     print Dumper \%table;
    "
    $VAR1 = {
              'bar' => [
                         'd',
                         'e',
                         'f',
                         'g'
                       ],
              'baz' => [
                         'h',
                         'i'
                       ],
              'foo' => [
                         'a',
                         'b',
                         'c'
                       ]
            };
Re: Sorting a hash value that is a list
by locked_user sundialsvc4 (Abbot) on Feb 27, 2012 at 15:16 UTC

    When I need to find a unique-anything, I almost always use a hash. In this case, the “auto-vivification” goodness of Perl is simply natural ... $myhash->{$group}{$gid}{$user}=1; (You don’t care what the value is.) Any of the buckets in the entire structure that do not exist yet will simply appear when they are required. Then just loop through the sorted keys as you do. To print out the final comma-separated list of users, join(',', sort keys %$hashref) is made for the job, where $hashref is the innermost hash reference.
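The nested-hash approach described above might look like the sketch below. It assumes each record already holds a single user; the helper name build_table and the sample records are invented for illustration.

```perl
use strict;
use warnings;

# Build group -> gid -> user, letting Perl create each level on first use.
# build_table is a hypothetical helper; records are [group, gid, user].
sub build_table {
    my %table;
    for my $record (@_) {
        my ( $group, $gid, $user ) = @$record;
        $table{$group}{$gid}{$user} = 1;    # the value itself is irrelevant
    }
    return \%table;
}

my $table = build_table(
    [ 'group1', 9001, 'user2' ],
    [ 'group1', 9001, 'user1' ],
    [ 'group1', 9001, 'user2' ],    # repeat: same key, no new entry
    [ 'group2', 9002, 'user3' ],
);

for my $group ( sort keys %$table ) {
    for my $gid ( sort keys %{ $table->{$group} } ) {
        my $users = join ',', sort keys %{ $table->{$group}{$gid} };
        print "$group:$gid:$users\n";
    }
}
# prints:
# group1:9001:user1,user2
# group2:9002:user3
```

Because hash keys are unique, storing the same user twice simply overwrites the same slot, so no separate de-duplication step is needed.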

      Hi again,


      Thanks for all the replies, it is much appreciated. I've tried all solutions (except AnomalousMonk's: I work for a large corporation, and getting extra modules requires going to the DMZ, which is not feasible, so I'm stuck with the bundle Solaris 10 comes with). I'm still not getting a unique list of users, and this is what I think is happening: when the hash collects the values ($userid) for each unique key ($group), each value is taken as a block. So for example:


      This:
      group1:9001:user1,user2,user3,user4,user5,user1,user2,user3,user4,user5,user1,user2,user3,user4,user5
      group2:9002:user1,user2,user3,user1,user2,user3,user1,user2,user3
      group3:9003:user1,user2,user4,user1,user2,user4,user1,user2,user4
      group4:9004:user1,user2,user5,user1,user2,user5,user1,user2,user5
      group5:9005:user1,user2,user1,user2,user1,user2
      group6:9006:,,,


      Is actually this in the hash:
      group1:9001:"user1,user2,user3,user4,user5," "user1,user2,user3,user4,user5," "user1,user2,user3,user4,user5"
      group2:9002:"user1,user2,user3," "user1,user2,user3," "user1,user2,user3"
      group3:9003:"user1,user2," "user4," "user1,user2,user4," "user1,user2," "user4"
      group4:9004:"user1,user2," "user5," "user1,user2," "user5," "user1,user2," "user5"
      group5:9005:"user1,user2," "user1,user2," "user1,user2"
      group6:9006:"," "," ","


      So each block of user IDs is a unique value.
      I'm lost on how to do this (if there is a way).
      Hope that makes sense?
      Any feedback will be greatly appreciated.
      Thanks again.


        As another monk mentioned, most of the reasons people think they can't install from CPAN are in fact non-reasons.

        But either way, the uniq function from List::MoreUtils is easy for you to implement yourself:

        sub uniq (@) {
            my %seen = ();
            grep { not $seen{$_}++ } @_;
        }

        As it happens, I just copied and pasted that code from List/MoreUtils.pm. However, one reason to prefer List::MoreUtils over a home-made version is that List::MoreUtils provides an XS version which (over large lists) may run a little faster.
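Applied to the follow-up question above, the fourth field is one comma-separated string, so it has to be split into individual users before de-duplicating. The following is a rough sketch using only core Perl; the helper name users_by_group is hypothetical, and the input lines are a shortened version of the sample data.

```perl
use strict;
use warnings;

# Home-made uniq: keeps the first occurrence of each value.
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Collect users per "group:gid" key from "<host>:<group>:<gid>:<users>"
# lines. users_by_group is a hypothetical name for this sketch.
sub users_by_group {
    my %table;
    for my $line (@_) {
        my ( $host, $grp, $gid, $userlist ) = split /:/, $line, 4;
        next unless defined $gid;    # skip malformed lines
        $userlist = '' unless defined $userlist;

        my $key = "$grp:$gid";
        $table{$key} ||= [];         # keep groups that have no users
        # Split the comma-separated field so each user is its own element.
        push @{ $table{$key} }, split /,/, $userlist;
    }
    for my $key ( keys %table ) {
        my @unique = uniq( @{ $table{$key} } );
        $table{$key} = [ sort @unique ];
    }
    return \%table;
}

my $table = users_by_group(
    'host1:group1:9001:user1,user2,user3',
    'host2:group1:9001:user2,user3,user4',
    'host1:group6:9006:',
);
print "$_:", join( ',', @{ $table->{$_} } ), "\n" for sort keys %$table;
# prints:
# group1:9001:user1,user2,user3,user4
# group6:9006:
```

The same split on commas would also work with the hash-of-hashes replies: set one hash key per user instead of one key per whole user list.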