Hello larsb, and welcome to the Monastery!
Another approach is to modify the file's text directly, using s///eg to replace each repeated word with its marked version. The following script shows one way to do this (although it doesn't take into account the maximum number of words allowed between repeats):
```perl
#! perl
use strict;
use warnings;

my $file = do { local $/; <DATA> };    # Slurp the whole file into a string

# Make a hash that maps each word to its word count in the file
my %words;
++$words{lc $_} for split /\W+/, $file;

# Construct a regular expression to match each word which appears at least twice;
# the \b anchors stop a short word (e.g. "is") from matching inside a longer one ("this")
my $str = join '|', grep { $words{$_} > 1 } keys %words;

if (length $str) {    # Skip the substitution when no word repeats
    my $re = qr{\b($str)\b}i;

    $words{$_} = 0 for keys %words;    # Re-set the word counts to zero

    # Mark the second and subsequent occurrences of each word
    $file =~ s{$re}{ $words{lc $1}++ ? "*$1*" : $1 }eg;
}

print $file;

__DATA__
To be or not to be; that is to be the question.
Is that the question? Yes!
```
Output:
```
0:13 >perl 1369_SoPW.pl
To be or not *to* *be*; that is *to* *be* the question.
*Is* *that* *the* *question*? Yes!

0:13 >
```
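The script above marks every repeat, however far apart the occurrences are. As a minimal sketch of one way to add the distance limit, the following assumes that "the maximum number of words allowed between repeats" means the number of words separating a repeat from the previous occurrence of the same word (compared case-insensitively). The sub name `mark_close_repeats` and its `$max_gap` parameter are hypothetical, not taken from the original question. Instead of building one big alternation, it walks every word with a single `s///eg` and remembers where each word was last seen:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap a word in asterisks when at most $max_gap
# words separate it from the previous occurrence of the same word
# (compared case-insensitively). All other text is copied unchanged.
sub mark_close_repeats {
    my ($text, $max_gap) = @_;
    my %last_seen;    # lower-cased word => index of its latest occurrence
    my $index = 0;    # position of the current word among all words

    # Only the matched words are rewritten, so punctuation, spacing
    # and line breaks in $text are preserved exactly.
    # $gap is the number of words between this word and its previous
    # occurrence, or undef if this is the first occurrence.
    $text =~ s{(\w+)}{
        my $word = $1;
        my $key  = lc $word;
        my $gap  = exists $last_seen{$key} ? $index - $last_seen{$key} - 1 : undef;
        $last_seen{$key} = $index++;
        (defined $gap && $gap <= $max_gap) ? "*$word*" : $word;
    }eg;

    return $text;
}

my $text = "To be or not to be; that is to be the question.\n"
         . "Is that the question? Yes!\n";

print mark_close_repeats($text, 3);
```

With `$max_gap` set to 3 this prints `To be or not *to* *be*; that is *to* *be* the question.` followed by `Is that *the* *question*? Yes!`. "Is" and "that" are now left unmarked because four and six words, respectively, separate them from their earlier occurrences. A sufficiently large `$max_gap` marks every repeat, which reproduces the output of the first script.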
Hope that helps,
Athanasius <°(((><   contra mundum   Iustus alius egestas vitae, eros Piratica,
In reply to Re: Preserve original text formatting. by Athanasius, in thread Preserve original text formatting. by larsb