in reply to Preserve original text formatting.
Hello larsb, and welcome to the Monastery!
Another approach is to modify the text of the file by using s///g to replace each repeated word with its marked version. The following script shows one way to do this (but it doesn’t take into account the maximum number of words allowed between repeats):
```perl
#! perl
use strict;
use warnings;

my $file = do { local $/; <DATA>; };    # Slurp the whole file into a string

# Make a hash that maps each word to its word count in the file
my %words;
++$words{lc $_} for split /\W+/, $file;

# Construct a regular expression to match each word which appears at least twice
my $str = join '|', grep { $words{$_} > 1 } keys %words;
my $re  = qr{\b($str)\b}i;    # \b stops a short word matching inside a longer one

$words{$_} = 0 for keys %words;    # Re-set the word counts to zero

# Mark the second and subsequent occurrences of each word
# (skipped when nothing repeats, since an empty pattern would match everywhere)
$file =~ s{$re}{ $words{lc $1}++ ? "*$1*" : $1 }eg if length $str;

print $file;

__DATA__
To be or not to be; that is to be the question. Is that the question? Yes!
```
Output:
```
0:13 >perl 1369_SoPW.pl
To be or not *to* *be*; that is *to* *be* the question. *Is* *that* *the* *question*? Yes!
0:13 >
```
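The script above ignores the limit on how far apart repeats may be. One way that limit might be added is sketched below, assuming that "words between repeats" means the count of intervening words and that a word is a run of `\w` characters, as in the `split` above. The sub `mark_repeats` and its `$max_gap` parameter are hypothetical names, not part of the original script. Instead of building an alternation, it walks the words once, remembering the index of each word's most recent occurrence.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper: wrap a word in asterisks only if its previous
# occurrence (compared case-insensitively) is at most $max_gap words earlier.
sub mark_repeats {
    my ($text, $max_gap) = @_;
    my %last_seen;    # lower-cased word => index of its latest occurrence
    my $index = 0;    # position of the current word in the text

    # Visit every run of word characters in order; spaces, punctuation
    # and newlines are left exactly as they were.
    $text =~ s{(\w+)}{
        my $word = lc $1;
        my $prev = $last_seen{$word};
        $last_seen{$word} = $index;
        # The number of intervening words is index minus prev minus 1
        my $mark = defined($prev) && ($index - $prev - 1 <= $max_gap);
        $index++;
        $mark ? "*$1*" : $1;
    }eg;

    return $text;
}

my $text = "To be or not to be; that is to be the question. Is that the question? Yes!";
print mark_repeats($text, 3), "\n";
```

With `$max_gap` set to 3, this prints `To be or not *to* *be*; that is *to* *be* the question. Is that *the* *question*? Yes!`: *Is* and *that* are no longer marked, because four and six words respectively separate them from their previous occurrences. A large enough `$max_gap` reproduces the output of the original script.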
Hope that helps,
Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,
Re^2: Preserve original text formatting.
by Not_a_Number (Prior) on Sep 10, 2015 at 18:07 UTC
by Athanasius (Archbishop) on Sep 11, 2015 at 02:48 UTC

Re^2: Preserve original text formatting.
by larsb (Novice) on Sep 10, 2015 at 16:09 UTC