imhotep has asked for the wisdom of the Perl Monks concerning the following question:

Hail, Perl Monks! I am once again at the temple seeking wisdom, and I beg your indulgence! Running ActivePerl on Windows, I need to duplicate the "uniq -c" function of the GNU utilities. It prints all lines of input bar repeated lines; the "-c" part counts the number of times each line was found and prints that number along with the line, e.g. 4 "hello". I need to do this within a while loop. I can do the first part by using an $oldline variable, thereby only printing lines that differ from the previous one, but I am struggling with counting the number of hits. Your help would be greatly appreciated!

Replies are listed 'Best First'.
Re: counting lines in perl
by Tanktalus (Canon) on Feb 26, 2005 at 19:19 UTC

    What do you have so far? It's much easier to help you if we can see what you've done wrong.

    You are aware that uniq only removes consecutive repeated lines, right? So the trick is to only keep track of the last line, and the count of the last line. If the current line is identical, increment the count, otherwise print it out with the count and set a new last line. The second trick is that when you're done with the file, you'll have a last line that isn't printed out, so you'll have to handle that, too.

      I have this,
      #!/usr/bin/perl
      # uniq.pl: remove repeated lines.
      use English;
      use diagnostics;
      $oldline = "";
      $n = 0;
      while ($line = <>) {
          unless ($line eq $oldline) {
              $n = $n + 1;
              print " $n $line";
          }
          $oldline = $line;
      }

      I know that this is not right; it just prints an incrementing number in front of each output line. I think I need to restructure it so that the count stops at the end of each set of identical lines, which I can do, but I can't work out how to print only a single line along with its count.

      Edit by BazB - add code tags.

        Try some <code> tags.

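        #!/usr/bin/perl
        # uniq.pl: remove repeated lines.
        use strict;
        use diagnostics;

        my $oldline;    # undef until the first line has been read
        my $n = 1;      # any line we have seen has a count of at least one
        while (my $line = <>) {
            if (defined $oldline && $line eq $oldline) {
                $n++;    # same run of lines: bump the count
            }
            else {
                # A new run starts: print the previous run, if there was one.
                print " $n $oldline" if defined $oldline;
                $n = 1;
                $oldline = $line;
            }
        }
        # The loop never prints the final run, so do it here.
        print " $n $oldline" if defined $oldline;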
        That should help. I'm not sure why you have use English in there. You should use strict. The count is always at least one, never zero. What we're doing now is checking: if the current line matches the previous one, increment the count. If it doesn't match, print out the previous line with its count, and then reset. Finally, when we're done, we print out the last line.

        Hope that helps.

        (Warning - untested.)

        Update: Of course, being untested, crashtest points out an obvious error... had $line when it should be $oldline.
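
        For anyone who wants to check the logic without feeding it a file, here is a minimal sketch of the same approach as a subroutine. The name uniq_c and the sample input are made up for illustration; it assumes the lines are already in a list and have been chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# uniq_c is a hypothetical helper name, not part of any module.
# It takes a list of lines and returns one "count line" string
# for each run of identical consecutive lines.
sub uniq_c {
    my @lines = @_;
    my @out;
    my ($prev, $count);
    for my $line (@lines) {
        if (defined $prev && $line eq $prev) {
            $count++;    # still inside the same run
            next;
        }
        # A new run begins: flush the finished one, if any.
        push @out, "$count $prev" if defined $prev;
        ($prev, $count) = ($line, 1);
    }
    # Flush the last run, which the loop itself never emits.
    push @out, "$count $prev" if defined $prev;
    return @out;
}

# Sample input, invented for this example.
print "$_\n" for uniq_c(qw(hello hello hello hello world hello));
# prints:
# 4 hello
# 1 world
# 1 hello
```

        Note that the second "hello" run is reported separately, because only consecutive repeats are merged.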

Re: counting lines in perl
by sh1tn (Priest) on Feb 26, 2005 at 19:29 UTC
    Maybe this?
    use Data::Dumper;
    my $lines;
    while( <DATA> ){
        /^\s*$/ and next;
        s/\n//;
        $lines->{$_}{count}++;
        push @{$lines->{$_}{linenum}}, $.
    }
    print Dumper($lines);
    __DATA__

    one
    one
    aaa
    bbb
    ccc
    aaa
    __END__
    'one' => { 'count' => 2, 'linenum' => [ '2', '3' ] },
    'bbb' => { 'count' => 1, 'linenum' => [ '5' ] }
    ...


Re: counting lines in perl
by chas (Priest) on Feb 26, 2005 at 21:13 UTC
    If I understood what uniq -c is supposed to do, how about:
    while (<>){
        $i++;
        chomp;
        $lines[$i]=$_;
        $times{$lines[$i]}++ if $lines[$i] ne $lines[$i-1];
    };
    @keys = keys %times;
    @values = values %times;
    while (@keys) {
        print pop(@values), ': ', pop(@keys), "\n";
    }
    (One could likely make this more brief at the expense of readability.)
    chas
Re: counting lines in perl
by davidj (Priest) on Feb 27, 2005 at 05:21 UTC
    This is fairly concise:

    #!/usr/bin/perl
    use strict;
    my (%words, $key);
    open(FILE, "<test.txt");
    while(<FILE>) {
        chomp($_);
        $words{$_}++;
    }
    close(FILE);
    foreach $key (keys %words) {
        print "$words{$key} $key\n";
    }
    exit;


    davidj
      But that code doesn't seem to count each group of consecutive repetitions just once, does it? (That is what I thought the original poster wanted.)
      chas
      (Update: Actually, now that I've gone to a system where I could try out uniq -c, I see that I misunderstood what was desired so my code doesn't seem to do what the original poster wanted. Your code is closer, but the output isn't the same as that of uniq -c, at least the version I used. Sorry about the confusion...)
        You are correct. My code is flawed. For some reason I thought uniq -c sorted the file first, but it doesn't. My mistake.

        davidj
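
        To make the difference concrete: counting with a hash, as in the code above, tallies every occurrence of a line wherever it appears, which is roughly what you get from sorting the input before running uniq -c (ignoring the padding uniq puts in front of the count). A minimal sketch, where count_all is a made-up name and the sample data is borrowed from an earlier reply in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_all is a hypothetical name. It tallies every occurrence of each
# distinct line, regardless of where the line appears in the input,
# and returns "count line" strings ordered by line.
sub count_all {
    my %seen;
    $seen{$_}++ for @_;
    return map { "$seen{$_} $_" } sort keys %seen;
}

print "$_\n" for count_all(qw(one one aaa bbb ccc aaa));
# prints:
# 2 aaa
# 1 bbb
# 1 ccc
# 2 one
```

        Plain uniq -c on the same unsorted input would instead report "aaa" twice with a count of 1 each, since the two occurrences are not adjacent.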