in reply to removing repeats

You're printing the word as you discover it's a repeat. You actually want to print the word once you find the next word that doesn't match the repeated word. So, you want to do something like:
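  1. Look at each word.
  2. If $word eq $prev_word, then $times_seen++.
  3. Otherwise:
    1. If $times_seen > 1, print the $line_num and $prev_word (the word that was repeated, not the new one).
    2. $prev_word = $word.
    3. $times_seen = 1.
  4. After the last word, check $times_seen > 1 once more, so a repeat at the very end of the input is not lost.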
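The steps above can be sketched roughly as follows. This is only an illustration, not the original poster's script: the sub name find_repeats and the variable $repeat_line are made up here, and reporting the line where the repeat was first seen (rather than the line where the run ended) is an assumption about what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one [line_number, word] pair per run of
# adjacent repeated words, however long the run is.
sub find_repeats {
    my @lines = @_;
    my @found;
    my ($prev_word, $times_seen, $repeat_line) = ('', 0, 0);
    my $line_num = 0;

    for my $line (@lines) {
        $line_num++;
        # Lowercase the line and pull out runs of letters and digits.
        my @words = lc($line) =~ /[[:alnum:]]+/g;
        for my $word (@words) {
            if ($word eq $prev_word) {
                $times_seen++;
                # Assumption: remember the line of the first repeat.
                $repeat_line = $line_num if $times_seen == 2;
            }
            else {
                # A different word closes the run; report it once.
                push @found, [$repeat_line, $prev_word] if $times_seen > 1;
                $prev_word  = $word;
                $times_seen = 1;
            }
        }
    }

    # Flush a run that is still open when the input ends.
    push @found, [$repeat_line, $prev_word] if $times_seen > 1;
    return @found;
}

printf " %d %s\n", @$_ for find_repeats("hello hello hello world", "the the end");
```

With those two sample lines this should report hello once for line 1 and the once for line 2.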

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^2: removing repeats
by Roy Johnson (Monsignor) on Mar 01, 2005 at 15:16 UTC
    Actually, with a flip-flop, you can print the 2nd occurrence.
    foreach $word (@words) {
        if ($word ne '') {
            my $flip = ($word eq $prevword)..($word ne $prevword);
            if ($flip eq '1') { # Use eq to avoid warning when flip is ''
                print " $n $word\n";
            }
            $prevword = $word;
        }
    }

    Caution: Contents may have been coded under pressure.
      Hey Roy Johnson, thank you for your reply. I tried it and it works. Could you explain the logic to me, though, please? My understanding is that you're creating a new variable called $flip and defining it with two conditions, i.e., the word being equal to the previous word and the word not being equal to the previous word, and then checking whether $flip is equal to 1, i.e., whether the word is the immediate next one along in the input text. Is my understanding correct? I would really appreciate your help. Thank you.
        Sure. The scalar range operator (or flip-flop) is one of the lesser-used features of Perl. perldoc perlop describes its behavior:
        The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again.
        ...
        The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered.
        For your case, $flip effectively counts the repeated words, starting at 1 when a word matches $prevword, and ending when a word does not match $prevword. So $flip will be 1 for the first repeated word, 2 for the next repetition, etc. When it encounters a new word, it will return whatever the count is, with E0 appended to indicate that it has finished the run, like 5E0. While no run is in progress, it returns the empty string.

        You can see its behavior in your own code by simply printing $flip after it is set.
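A small standalone sketch of that experiment, for anyone who wants to try it outside the full script. The sub name flip_values and the sample word list are invented for illustration; only the flip-flop expression itself comes from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the value of the flip-flop for each word.
sub flip_values {
    my @words = @_;
    my $prevword = '';
    my @values;
    for my $word (@words) {
        # Scalar context, so ".." is the flip-flop, not a list range.
        my $flip = ($word eq $prevword) .. ($word ne $prevword);
        push @values, $flip;
        $prevword = $word;
    }
    return @values;
}

my @words = qw(a b b b c d d e);
my @flips = flip_values(@words);

# Show each word next to the value $flip had for it.
printf "%-3s flip='%s'\n", $words[$_], $flips[$_] for 0 .. $#words;
```

For that word list, $flip should be empty for a and the first b, 1 and 2 for the second and third b, 3E0 for c, empty for the first d, 1 for the second d, and 2E0 for e. Testing $flip eq '1' therefore fires exactly once per run, on the second occurrence.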


Re^2: removing repeats
by dummy2 (Initiate) on Mar 01, 2005 at 16:49 UTC

    Hi again,
    Thanks for your very prompt reply, but it is the words that are adjacently repeated that I want to print. I can do this part, but what my script does is print out all of these repeats instead of showing each repeat just once. For example, if a line contains the word hello eight times in a row, I only want this word to show once. Here's my script.
    I will be very grateful for your help. Thank you!!

    #!/usr/bin/perl
    # rcwords.pl: print immediately adjacent repeated words once from input,
    # even if they are repeated more than once,
    # and print out these words along with the line number(s) that they appeared.
    use English;
    use diagnostics;

    $prevword = '';
    $n = 0;
    while ($line = <>) {
        $n = $n + 1;
        $line =~ s/[[:punct:]]/ /g;
        $line = lc $line;
        @words = split /[[:space:]]+/, $line;
        foreach $word (@words) {
            if ($word ne '') {
                if ($word eq $prevword) {
                    unless ($prevword eq $prevword) {
                        print " $n $word\n";
                    }
                }
                $prevword = $word;
            }
        }
    }

    Edit by BazB. Add code tags.