counting lines in perl

imhotep has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: counting lines in perl by Tanktalus (Canon) on Feb 26, 2005 at 19:19 UTC
What do you have so far? It's much easier to help you if we can see what you've done wrong. You are aware that uniq only removes consecutive repeated lines, right? So the trick is to only keep track of the last line, and the count of the last line. If the current line is identical, increment the count, otherwise print it out with the count and set a new last line. The second trick is that when you're done with the file, you'll have a last line that isn't printed out, so you'll have to handle that, too.
Re^2: counting lines in perl by imhotep (Novice) on Feb 26, 2005 at 20:01 UTC
I have this:

```perl
#!/usr/bin/perl
# uniq.pl: remove repeated lines.
use English;
use diagnostics;

$oldline = "";
$n = 0;
while ($line = <>) {
    unless ($line eq $oldline) {
        $n = $n + 1;
        print " $n $line";
    }
    $oldline = $line;
}
```

I know that this is not right: it just prints a running count of the output lines. I think I need to make the count stop at the end of each set of repeated lines, which I can do, but I can't work out how to print only the single line along with its count.

Edit by BazB - add code tags.
Re^3: counting lines in perl by Tanktalus (Canon) on Feb 26, 2005 at 20:41 UTC
Try some <code> tags.

```perl
#!/usr/bin/perl
# uniq.pl: remove repeated lines, printing a count for each run.
use strict;
use diagnostics;

my $oldline;    # last line seen; undef until the first line is read
my $n = 1;      # length of the current run; always at least one

while (my $line = <>) {
    if (defined $oldline and $line eq $oldline) {
        $n++;    # same as $n = $n + 1
    }
    else {
        # A new run starts: print the finished run, if there is one.
        print " $n $oldline" if defined $oldline;
        $n = 1;
        $oldline = $line;
    }
}

# The last run has not been printed yet.
print " $n $oldline" if defined $oldline;
```

That should help. I'm not sure why you're loading the English module. You should use strict. You always have a count of at least one - not zero. What we're doing now is checking - if the lines match, increment the count. If they don't match, print out the last match, and then reset. Finally, when we're done, we'll print out the last line. Hope that helps. (Warning - untested.)

Update: Of course, being untested, crashtest points out an obvious error... had $line when it should be $oldline.
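A minimal sketch of the steps described above, wrapped in a sub so it can be tried on an in-memory list instead of a file. The name `count_runs` and the sample data are made up for illustration; neither comes from the thread.

```perl
use strict;
use warnings;

# count_runs is a hypothetical helper name, not from the posts above.
# It takes a list of lines and returns one [count, line] pair for
# each run of consecutive identical lines.
sub count_runs {
    my @lines = @_;
    my @runs;
    for my $line (@lines) {
        if (@runs and $runs[-1][1] eq $line) {
            $runs[-1][0]++;              # same as the previous line
        }
        else {
            push @runs, [ 1, $line ];    # a new run starts at one
        }
    }
    return @runs;
}

# Made-up sample input: "aaa" appears in two separate runs.
my @sample = qw(one one aaa bbb bbb bbb aaa);
printf "%7d %s\n", @$_ for count_runs(@sample);
# prints:
#       2 one
#       1 aaa
#       3 bbb
#       1 aaa
```

Because each run is stored as soon as it starts, this variant has no separate final print to forget. The trade-off is that it holds all runs in memory rather than printing as it goes.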
Re^4: counting lines in perl by crashtest (Curate) on Feb 26, 2005 at 21:12 UTC
Re^5: counting lines in perl by imhotep (Novice) on Feb 26, 2005 at 22:24 UTC
Some notes below your chosen depth have not been shown here
Re: counting lines in perl by sh1tn (Priest) on Feb 26, 2005 at 19:29 UTC
Maybe this?

```perl
use Data::Dumper;

my $lines;
while (<DATA>) {
    /^\s*$/ and next;    # skip blank lines
    s/\n//;              # strip the trailing newline
    $lines->{$_}{count}++;
    push @{ $lines->{$_}{linenum} }, $.;    # record the line number
}
print Dumper($lines);

__DATA__

one
one
aaa
bbb
ccc
aaa
```

Output (excerpt):

```
'one' => {
    'count' => 2,
    'linenum' => [ '2', '3' ]
},
'bbb' => {
    'count' => 1,
    'linenum' => [ '5' ]
}
...
```
Re: counting lines in perl by chas (Priest) on Feb 26, 2005 at 21:13 UTC
If I understood what uniq -c is supposed to do, how about:

```perl
while (<>) {
    $i++;
    chomp;
    $lines[$i] = $_;
    # Count a line only when it differs from the previous one,
    # so each run of repeats is counted once.
    $times{ $lines[$i] }++ if $lines[$i] ne $lines[$i - 1];
}
@keys   = keys %times;
@values = values %times;
# keys and values come back in the same order, so the pairs match.
while (@keys) {
    print pop(@values), ': ', pop(@keys), "\n";
}
```

(One could likely make this more brief at the expense of readability.)

chas
Re: counting lines in perl by davidj (Priest) on Feb 27, 2005 at 05:21 UTC
This is fairly concise:

```perl
#!/usr/bin/perl
use strict;

my (%words, $key);
open(FILE, "<test.txt") or die "Can't open test.txt: $!";
while (<FILE>) {
    chomp($_);
    $words{$_}++;    # total occurrences of each distinct line
}
close(FILE);
foreach $key (keys %words) {
    print "$words{$key} $key\n";
}
exit;
```

davidj
Re^2: counting lines in perl by chas (Priest) on Feb 27, 2005 at 06:30 UTC
But that code doesn't seem to count groups of consecutive repetition just once, does it? - (which is what I thought the original poster wanted.) chas (Update: Actually, now that I've gone to a system where I could try out uniq -c, I see that I misunderstood what was desired so my code doesn't seem to do what the original poster wanted. Your code is closer, but the output isn't the same as that of uniq -c, at least the version I used. Sorry about the confusion...)
Re^3: counting lines in perl by davidj (Priest) on Feb 27, 2005 at 13:42 UTC
You are correct. My code is flawed. For some reason I thought uniq -c sorted the file first, but it doesn't. My mistake. davidj
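To make the difference discussed above concrete, here is a small sketch that applies two counting rules to the same made-up input: run lengths (the rule Tanktalus describes for uniq) and total occurrences per distinct line (the hash approach). The sub names `run_lengths` and `totals` and the sample data are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one [count, line] pair per run of
# consecutive identical lines (the run-based rule).
sub run_lengths {
    my @runs;
    for my $line (@_) {
        if (@runs and $runs[-1][1] eq $line) { $runs[-1][0]++ }
        else                                 { push @runs, [ 1, $line ] }
    }
    return @runs;
}

# Hypothetical helper: total occurrences of each distinct line,
# wherever they appear (the hash-based rule).
sub totals {
    my %count;
    $count{$_}++ for @_;
    return %count;
}

my @input = qw(aaa aaa bbb aaa);    # made-up sample data

my @runs = run_lengths(@input);
print "runs:   ", join(', ', map { "$_->[0] $_->[1]" } @runs), "\n";

my %t = totals(@input);
print "totals: ", join(', ', map { "$t{$_} $_" } sort keys %t), "\n";

# Sorting first brings identical lines together, so the run-based
# rule then agrees with the totals.
my @sorted      = sort @input;
my @sorted_runs = run_lengths(@sorted);
print "sorted: ", join(', ', map { "$_->[0] $_->[1]" } @sorted_runs), "\n";
# prints:
# runs:   2 aaa, 1 bbb, 1 aaa
# totals: 3 aaa, 1 bbb
# sorted: 3 aaa, 1 bbb
```

The two rules give the same counts only when the input is already sorted, or at least grouped so that identical lines are adjacent.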