in reply to Re^2: counting lines in perl
in thread counting lines in perl

Try some <code> tags.

#!/usr/bin/perl
# uniq.pl: remove repeated lines.
use strict;
use diagnostics;

$oldline = "";
$n = 1;
while ($line = <>) {
    if ($line eq $oldline) {
        #$n = $n + 1;
        $n++;
    } elsif ($oldline) {
        print " $n $oldline";
        $n = 1;
        $oldline = $line;
    }
}
if ($oldline) {
    print " $n $line";
}

That should help. I'm not sure why you're using English. You should use strict. You always have a count of at least one - not zero. What we're doing now is checking - if the lines match, increment the count. If they don't match, print out the last match, and then reset. Finally, when we're done, we'll print out the last line.

Hope that helps.

(Warning - untested.)

Update: Of course, being untested, crashtest points out an obvious error... had $line when it should be $oldline.

Re^4: counting lines in perl
by crashtest (Curate) on Feb 26, 2005 at 21:12 UTC
    Actually Tanktalus, that won't work. You said, "If they don't match, print out the last match, and then reset." But when you print $line instead of $oldline, you're printing the next unique line you've found, not the line you've been counting for. Here's the altered code, also fixed so that it now runs under use strict:
    #!/usr/bin/perl
    # uniq.pl: remove repeated lines.
    use strict;
    use diagnostics;

    my $oldline = <>;    # Priming read
    my $n = 1;
    while (my $line = <>) {
        if ($line eq $oldline) {
            $n++;    #$n = $n + 1;
        } else {
            print " $n $oldline";
            $n = 1;
            $oldline = $line;
        }
    }
    if ($oldline) {
        print " $n $oldline";
    }

    Note: The program will hang if there isn't at least one line of data to read.
      Thanks guys! Crashtest, could you explain how you have stopped the count after each set of repeated lines? Does the "else" statement stop the loop? Why do you need the last "if" statement?
        Hmmm... let's see if I can explain this clearly. Basically, the code you originally posted spit out data as soon as it encountered a unique line. That is, it kept track of the previous line and as soon as the current line was different from the previous one (unless($line eq $oldline)), it would print it. The problem with printing the new unique line immediately is that you cannot count how often it appears in sequence, because you don't know yet. You've only found the first occurrence of it at that point.
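
        For contrast, here is a minimal sketch of the print-on-first-appearance approach described above. The originally posted code isn't shown in this thread, so this is a reconstruction under that assumption, and dedupe_adjacent is a hypothetical name. It shows why no count is available at the moment a new line is emitted.

```perl
use strict;
use warnings;

# dedupe_adjacent is a hypothetical helper, not code from the thread.
# It keeps a line the moment it differs from the previous one, so at that
# point there is no way to know how many repeats will follow.
sub dedupe_adjacent {
    my @lines = @_;
    my @out;
    my $oldline;
    for my $line (@lines) {
        # Keep the very first line, or any line that differs from its predecessor.
        unless (defined $oldline && $line eq $oldline) {
            push @out, $line;
        }
        $oldline = $line;
    }
    return @out;
}

# Prints a, b, a on separate lines: the repeated "a" is dropped, but never counted.
print dedupe_adjacent("a\n", "a\n", "b\n", "a\n");
```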

        The solution then is to print each unique line after its last appearance in a sequence, not the first. To answer your question, that's what the "else" part within the loop does. It does not stop the loop (which just iterates over all of your input lines, after all).

        The else executes if the current line is different from the one that's being kept track of (the $oldline). In other words, it executes as soon as you encounter a new unique line. The code prints the line along with the count, and then resets the counter (the $n = 1; line).

        That last if statement will actually always evaluate to true. It says: "If there is any text in $oldline, then print it along with its count." If we didn't do this, the very last unique line would never get printed.

        To pseudo-code it:
        Read in my first input line and store it in $oldline
        Now loop over the rest of the input lines
            if the current line is the same as $oldline, then
                increment our count
            otherwise (that is, the current line is different)
                print $oldline and the counter
                reset the counter
                set the current line as the next line to keep track of by storing it in $oldline
        Loop
        if there's something in $oldline (which will always be the case), print it
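
        The same steps can also be packaged as a subroutine, which makes the logic easy to try without typing at STDIN. This is only a sketch: count_runs and the in-memory filehandle are assumptions for illustration, not part of the code posted above. It also tests with defined rather than plain truth, so empty input returns nothing and a final line of "0" would still be counted.

```perl
use strict;
use warnings;

# count_runs is a hypothetical helper name, not from the posts above.
# It reads lines from a filehandle and returns a list of [count, line]
# pairs, one pair per run of identical consecutive lines.
sub count_runs {
    my ($fh) = @_;
    my @runs;

    my $oldline = <$fh>;                     # priming read
    return @runs unless defined $oldline;    # empty input: nothing to count

    my $n = 1;
    while (defined(my $line = <$fh>)) {
        if ($line eq $oldline) {
            $n++;                            # same run continues
        }
        else {
            push @runs, [$n, $oldline];      # run ended: record it
            $n       = 1;
            $oldline = $line;
        }
    }
    push @runs, [$n, $oldline];              # the final run is still pending
    return @runs;
}

# Usage: an in-memory filehandle stands in for real input.
my $data = "a\na\nb\na\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print " $_->[0] $_->[1]" for count_runs($fh);   # prints " 2 a", " 1 b", " 1 a"
close $fh;
```

        Returning the pairs instead of printing inside the loop keeps the counting separate from the output format, which is a design choice of this sketch rather than something the thread requires.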

        It just struck me... I hope I haven't just done your homework assignment or something.

        Update: I see it was homework, but in light of imhotep's explanation and the disclaimers on previous nodes, I don't mind having done it. I miss college anyway.