OT, but this may be an example (trivial, but valid) of violating the maxim "algorithm first; then code."

Valid, that is, at least where the script above is used to check narrative documents for inadvertent repeats or typos (clearly that's not its only possible use, tho relying on the catch-all :punct: class seems to lean that way).

Substituting away the :punct: characters causes problems in a couple of edge cases.

Suppose, for a strained example, that someone named "Joe Williams" wrote a tome about various Williams (eg, Wm. Gates, William of Ghent, Fred Williams) and titled it "Williams' Williams."

Less exotically, suppose a line with fragments of two sentences: "...blab, blah," Foo said to Boo. Boo sneered in reply..."

Stripping the punctuation makes the book title satisfy the repeated_word_criteria even tho the original text does not. Similarly, the dialogue example (assuming no paragraph break, as illustrated) would lead the script to report that "Boo" had been duplicated. Conversely, a genuine duplicate split across two lines (a word at the end of one line repeated at the start of the next) would NOT be reported.
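To make the failure modes concrete, here is a rough Python sketch of the approach as I understand it. The parent script isn't reproduced here, so the function name `naive_repeats`, the line-by-line loop and the exact-match comparison are assumptions on my part; `string.punctuation` stands in for :punct: (for ASCII text the two cover the same characters in the C locale).

```python
import string

# Deletes every ASCII punctuation character, roughly what
# s/[[:punct:]]//g does to ASCII text.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def naive_repeats(lines):
    """Hypothetical reconstruction: strip punctuation, then compare
    adjacent words within each line only."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        words = line.translate(_STRIP_PUNCT).split()
        for prev, cur in zip(words, words[1:]):
            if prev == cur:
                hits.append((lineno, cur))
    return hits


sample = [
    'a tome titled "Williams\' Williams."',
    '"...blab, blah," Foo said to Boo. Boo sneered in reply...',
    "this line ends with the",
    "the next line starts with a repeat",
]
print(naive_repeats(sample))  # [(1, 'Williams'), (2, 'Boo')]
```

Both false positives show up, while the real "the ... the" repeat straddling lines 3 and 4 is missed because each line is examined in isolation.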

In short (and I have to remind myself of this often, because of the habits induced by how easily [Pp]erl lets me write hasty or one-off scripts): "algorithm first; then code."
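One way to put the algorithm first, sketched in Python. This is not the parent script; the name `find_repeats`, the case-insensitive comparison and the choice to reset at blank lines are my own assumptions. The idea is to treat the text as one stream of whitespace-separated tokens rather than line by line, keep punctuation attached to each token, and call two neighbouring words a repeat only when nothing but whitespace separates them.

```python
import string

# ASCII punctuation; the same set as POSIX [:punct:] in the C locale.
PUNCT = string.punctuation


def find_repeats(lines):
    """Return (line_number, word) for each word that repeats the word
    immediately before it.

    - Words are compared case-insensitively, ignoring punctuation at
      either end of each token.
    - A repeat only counts if no punctuation separates the two words,
      so "Williams' Williams" and "Boo. Boo" are not flagged.
    - The previous word carries over line breaks, so a repeat split
      across two lines is still caught; a blank line (paragraph
      break) resets it.
    """
    hits = []
    prev = None  # previous raw token, punctuation still attached
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            prev = None  # blank line: new paragraph, start fresh
            continue
        for tok in tokens:
            if prev is not None:
                # Indexing is safe: split() never yields empty tokens.
                adjacent = prev[-1] not in PUNCT and tok[0] not in PUNCT
                same = prev.strip(PUNCT).lower() == tok.strip(PUNCT).lower()
                if adjacent and same:
                    hits.append((lineno, tok.strip(PUNCT)))
            prev = tok
    return hits


sample = [
    'a tome titled "Williams\' Williams."',
    '"...blab, blah," Foo said to Boo. Boo sneered in reply...',
    "this line ends with the",
    "the next line starts with a repeat",
    "The the cat sat on the mat",
    "",
    "mat again",
]
print(find_repeats(sample))  # [(4, 'the'), (5, 'the')]
```

The price of treating any intervening punctuation as a separator is that a typo like "the, the" now goes unreported; whether that trade is acceptable is exactly the sort of question to settle before writing the code.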

In reply to Re: removing repeats by ww
in thread removing repeats by dummy2