http://qs1969.pair.com?node_id=203883

willa has asked for the wisdom of the Perl Monks concerning the following question:

Given a colon-separated file, how do you count the number of lines with an identical second field that have different third fields? For example, if you gave it:
    fruit:apple:cox
    fruit:apple:pippin
    fruit:apple:granny
    fruit:banana:yellow
    fruit:banana:green
It would come back with:
    apple 3
    banana 2

Re: Counting lines by content
by broquaint (Abbot) on Oct 09, 2002 at 11:19 UTC
    You could handle this with a simple two-liner
    perl -F: -e 'END { print "$_ $c{$_}\n" for keys %c }' \
      -ane '$c{$F[1]}++'
    See perlrun for more info on perl's command-line options.
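    As perlrun describes these switches, -n wraps the code in a while (<>) loop, -a splits each input line into @F, and -F: makes that split use a colon instead of whitespace. A rough long-hand equivalent is sketched below. The hard-coded sample lines and the @fields name are assumptions of this sketch and are not part of the original one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: sample lines are hard-coded here instead of being read
# from <> (STDIN or files named on the command line).
my @lines = (
    "fruit:apple:cox\n",
    "fruit:apple:pippin\n",
    "fruit:apple:granny\n",
    "fruit:banana:yellow\n",
    "fruit:banana:green\n",
);

my %c;
for (@lines) {                # stands in for the while (<>) loop that -n adds
    my @fields = split /:/;   # what -a with -F: puts into @F
    $c{$fields[1]}++;         # the -e code: count lines per second field
}

# The END block: print one total per second field.
# Sorted here for stable output; the one-liner prints in hash order.
print "$_ $c{$_}\n" for sort keys %c;
```

    Run as-is, it prints "apple 3" and "banana 2", matching the output asked for in the question.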
    HTH

    _________
    broquaint

      Thanks - I also need to make sure there are no duplicates in the third field. I assume this is easy too...
        This will count lines per second field, ignoring lines whose second and third fields have already been seen together:
        perl -F: -e 'END { print "$_ $c{$_}\n" for keys %c }' \
          -ane '$c{$F[1]}++ unless $d{"@F[1,2]"}++'
        I assume this is what you meant by no duplicates in the third field.
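        The dedup trick keeps a second hash, %d, keyed on the (second, third) pair, and counts a line only the first time its pair appears. Below is a minimal long-hand sketch of the same idea, assuming hard-coded sample data. It chomps each line so that a missing final newline does not make an otherwise identical pair look different. It also joins the key with "\0" rather than a space, which is a deliberate change from the one-liner so that field values containing spaces cannot collide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: sample data is hard-coded, including one repeated pair
# and a last line with no trailing newline.
my @lines = (
    "fruit:apple:cox\n",
    "fruit:apple:pippin\n",
    "fruit:apple:granny\n",
    "fruit:apple:cox\n",       # duplicate pair: should not be counted
    "fruit:banana:yellow\n",
    "fruit:banana:green",
);

my (%count, %seen);
for my $line (@lines) {
    chomp(my $copy = $line);   # drop the newline so "green\n" and "green" match
    my (undef, $type, $desc) = split /:/, $copy;
    # Count the pair only the first time it is seen.
    $count{$type}++ unless $seen{"$type\0$desc"}++;
}

print "$_ $count{$_}\n" for sort keys %count;
```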
        HTH

        _________
        broquaint

Re: Counting lines by content
by sch (Pilgrim) on Oct 09, 2002 at 11:25 UTC

    I think the easiest way would be to use a hash - something like this?

    Oops - just realised this doesn't handle the case where the 3rd field is a duplicate - in fact it doesn't worry about the 3rd field at all. guha has supplied some modified code, which I've used to replace my slightly dodgy stuff!

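    #!perl
    use strict;
    use warnings;
    use diagnostics;

    my ($type, $desc, %fruit);
    # 'y' is the input file name used in the original post
    open(my $fh, '<', 'y') or die "Cannot open file: $!";
    while (<$fh>) {
        chomp;    # so "cox\n" and a final "cox" count as the same value
        (undef, $type, $desc) = split /:/;
        $fruit{$type}{$desc}++;
    }
    close $fh;
    # number of distinct third fields = number of inner keys
    foreach my $type (keys(%fruit)) {
        print "$type : ", scalar keys %{ $fruit{$type} }, "\n";
    }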
      A hash is the right answer, but I believe that he is looking for a hash of hashes....
      #!/usr/bin/perl
      open(IN, "/some/file") || die "Can't open file\nReason: $!\n";
      while (<IN>) {
          chomp($line = $_);
          ($first, $second) = (split(/:/, $line))[1,2];
          $fruit{$first}{count}++;
          $fruit{$first}{$second}++;
      }
      close(IN);
      foreach $k (sort(keys(%fruit))) {
          print "$k $fruit{$k}{count}\n";
      }
      I used count as well so that A) I don't have to loop to work out the total count for $first, and B) I can also test $fruit{$k}{blah} to find out whether there were duplicates. You could add that test inside the while loop, i.e. check $fruit{$first}{$second} and, if it exists, warn or something; otherwise increment it :)... Have fun

      /*
       * And the Creator, against his better judgement, wrote man.c
       */
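      The duplicate check suggested above could look something like the following sketch. It works from hard-coded sample data that contains one repeated pair. Instead of keeping a separate count entry, it counts the inner keys, which also avoids a clash if a third field is ever literally "count". The @dupes array is a name introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: sample data is hard-coded, with one repeated pair.
my @lines = (
    "fruit:apple:cox\n",
    "fruit:apple:pippin\n",
    "fruit:apple:cox\n",      # repeat of an earlier pair
    "fruit:apple:granny\n",
    "fruit:banana:yellow\n",
    "fruit:banana:green\n",
);

my (%fruit, @dupes);
for my $line (@lines) {
    chomp(my $copy = $line);
    my ($first, $second) = (split /:/, $copy)[1, 2];
    if (exists $fruit{$first}{$second}) {
        # Seen before: report it instead of counting it again.
        warn "duplicate entry: $first:$second\n";
        push @dupes, "$first:$second";
        next;
    }
    $fruit{$first}{$second} = 1;
}

# The number of distinct third fields is the number of inner keys.
for my $k (sort keys %fruit) {
    print "$k ", scalar(keys %{ $fruit{$k} }), "\n";
}
```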
Re: Counting lines by content
by Anonymous Monk on Oct 09, 2002 at 15:57 UTC
    sniff sniff... reminds me of... homework...
Re: Counting lines by content
by hackmare (Pilgrim) on Oct 10, 2002 at 11:05 UTC

    I would use hashes of hashes and then count the number of entries. This is just a simple flat-file-to-tree generation question, akin to a flat-file-to-XML constructor.

    #!/usr/bin/perl
    print "Hello, World...\n";
    use strict;
    use Data::Dumper;

    my @in = qw/
        fruit:apple:cox fruit:apple:pippin fruit:apple:granny
        fruit:banana:yellow fruit:banana:green
    /;

    #make an anonymous hash
    my $h = {};
    foreach (@in) {
        my @a = split ':', $_;
        my $branch  = shift @a;
        my $species = shift @a;
        my $breed   = shift @a;
        $h->{$branch}->{$species}->{$breed} = $h->{$branch}->{$species}->{$breed} + 1 || 1;
    }
    print Dumper($h);
    print "there are " . scalar(keys %{$h->{fruit}}) . " fruit species\n";
    print "there are " . scalar(keys %{$h->{fruit}->{apple}}) . " apple breeds\n";
    print "there are " . scalar(keys %{$h->{fruit}->{banana}}) . " banana breeds\n";
    print "Good luck with your homework\n";
    Returns:
    C:\>perl test.pl
    Hello, World...
    $VAR1 = {
              'fruit' => {
                           'apple' => {
                                        'pippin' => 1,
                                        'cox' => 1,
                                        'granny' => 1
                                      },
                           'banana' => {
                                         'green' => 1,
                                         'yellow' => 1
                                       }
                         }
            };
    there are 2 fruit species
    there are 3 apple breeds
    there are 2 banana breeds
    Good luck with your homework
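    The last three print statements hard-code fruit, apple and banana. The sketch below builds the same branch/species/breed tree and then walks it, so that new species are picked up without code changes. The %tree and %breeds names are introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @in = qw/
    fruit:apple:cox fruit:apple:pippin fruit:apple:granny
    fruit:banana:yellow fruit:banana:green
/;

my %tree;
for (@in) {
    my ($branch, $species, $breed) = split /:/;
    $tree{$branch}{$species}{$breed}++;   # autovivifies each level
}

my %breeds;    # "branch species" => number of distinct breeds
for my $branch (sort keys %tree) {
    for my $species (sort keys %{ $tree{$branch} }) {
        my $n = keys %{ $tree{$branch}{$species} };   # scalar context: key count
        $breeds{"$branch $species"} = $n;
        print "$species $n\n";
    }
}
```

    Run as-is, it prints "apple 3" and "banana 2", the format asked for in the question.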

    hackmare.