Hello larsb, and welcome to the Monastery!

Another approach is to modify the text of the file by using s///g to replace each repeated word with its marked version. The following script shows one way to do this (but it doesn’t take into account the maximum number of words allowed between repeats):

#! perl use strict; use warnings; my $file = do { local $/; <DATA>; }; # Slurp the whole file int +o a string # Make a hash that maps each word to its word count in the file my %words; ++$words{lc $_} for split /\W+/, $file; # Construct a regular expression to match each word which appears at l +east twice my $str = join '|', grep { $words{$_} > 1 } keys %words; my $re = qr{($str)}i; $words{$_} = 0 for keys %words; # Re-set the word counts t +o zero # Mark the second and subsequent occurrences of each word $file =~ s{$re}{ $words{lc $1}++ ? "*$1*" : $1 }eg; print $file; __DATA__ To be or not to be; that is to be the question. Is that the question? Yes!

Output:

0:13 >perl 1369_SoPW.pl To be or not *to* *be*; that is *to* *be* the question. *Is* *that* *the* *question*? Yes! 0:13 >

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re: Preserve original text formatting. by Athanasius
in thread Preserve original text formatting. by larsb

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.