That simple little algorithm is not even close to a reasonable solution. It ignores too many of the special cases that come up in what the English language calls a "sentence". Solving this using a \s+ followed by a single upper case letter is wrong wrong wrong!

For a fast fix to your problem I would suggest using the Lingua::EN::Sentence module. It has most cases covered, but you would be amazed at how often it can still fail. For small sets of data it should be more than adequate.

One of the best ways is to write a statistical parser that uses Bayes' theorem to "guess" whether the end of a sentence has been reached. The downside to this method is that you have to make a "training set" so that it can build a statistical model to work from.

The previous algorithm for the following input
This is a test. Am I testing this right? What if a proper name like John A. Smith is entered? Wow that is crazy! On Apr. 18 I ran this to see if it worked. What if I try A vs. B or a vs. b? Is it going to work? What if I talk about the U.S.S.R. or the U.S.A.? "I like to speak like this. It makes me laugh." said the funny man.
Will output
-This is a test- -Am I testing this right? What if a proper name like John A- -Smith is entered? Wow that is crazy! On Apr- -18 I ran this to see if it worked- -What if I try A vs- -B or a vs- -b? Is it going to work? What if I talk about the U.S.S.R- -or the U.S.A.? "I like to speak like this- -It makes me laugh." said the funny man.-
Notice how often it fails for "simple" sentences...
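
To make the statistical idea a bit more concrete, here is a minimal sketch of a naive Bayes "is this period the end of a sentence?" guesser. Everything in it is an assumption for illustration: the three features (length of the word before the period, whether that word has periods inside it, and what the next word starts with), the sub names, and above all the tiny hand-labelled training set. A real model would need far more training data and better features.

```perl
use strict;
use warnings;

# Number of possible values per feature, needed for add-one smoothing.
my %VALUES = (len => 2, dots => 2, follow => 4);

# Coarse features for the words on either side of a candidate period.
sub features {
    my ($prev, $next) = @_;
    return (
        len    => (length($prev) <= 3 ? 'short'  : 'long'),
        dots   => ($prev =~ /\./      ? 'dotted' : 'plain'),
        follow => ($next =~ /^[A-Z]/  ? 'upper'
                 : $next =~ /^[a-z]/  ? 'lower'
                 : $next =~ /^\d/     ? 'digit'
                 :                      'other'),
    );
}

# Hypothetical hand-labelled examples: [word before ".", word after, label].
sub training_set {
    return (
        [ 'works',  'Then',  'end'  ], [ 'done',  'We',    'end'  ],
        [ 'house',  'It',    'end'  ], [ 'again', 'The',   'end'  ],
        [ 'it',     'She',   'end'  ], [ 'worked', 'What', 'end'  ],
        [ 'Mr',     'Jones', 'cont' ], [ 'Dr',    'Brown', 'cont' ],
        [ 'Jan',    '5',     'cont' ], [ 'etc',   'and',   'cont' ],
        [ 'e.g',    'the',   'cont' ], [ 'J',     'Doe',   'cont' ],
        [ 'U.S.A',  'and',   'cont' ],
    );
}

# Count how often each label and each (label, feature=value) pair occurs.
sub train {
    my @examples = @_;
    my %model = (total => scalar @examples);
    for my $ex (@examples) {
        my ($prev, $next, $label) = @$ex;
        my %f = features($prev, $next);
        $model{labels}{$label}++;
        $model{counts}{$label}{"$_=$f{$_}"}++ for keys %f;
    }
    return \%model;
}

# Pick the label with the highest log(prior) + sum of log(likelihood).
sub classify {
    my ($model, $prev, $next) = @_;
    my %f = features($prev, $next);
    my ($best, $best_score);
    for my $label (sort keys %{ $model->{labels} }) {
        my $n     = $model->{labels}{$label};
        my $score = log($n / $model->{total});
        for my $name (keys %f) {
            my $seen = $model->{counts}{$label}{"$name=$f{$name}"} // 0;
            $score += log(($seen + 1) / ($n + $VALUES{$name}));
        }
        ($best, $best_score) = ($label, $score)
            if !defined $best_score || $score > $best_score;
    }
    return $best;
}

# Try it on a few of the trouble spots from the sample text.
my $model = train(training_set());
for my $pair ([ 'test', 'Am' ], [ 'Apr', '18' ], [ 'A', 'Smith' ],
              [ 'vs', 'b' ],    [ 'U.S.S.R', 'or' ]) {
    printf "%-8s %-6s => %s\n", @$pair, classify($model, @$pair);
}
```

With this toy training set only the "test. Am" period is labelled as a sentence end; the other four are labelled as continuations. Change the training examples and the answers change with them, which is exactly the downside mentioned above.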

In reply to Re^2: sentence-safe chop heuristics? by Grundle
in thread sentence-safe chop heuristics? by foomatic99
