Please read How do I post a question effectively? In particular, note that you should provide your desired output as well as some code that didn't work for you. I honestly have no idea what you mean by "splitting based on number of words, sentences or letters". If you can't write it in code, write it in pseudo-code and be explicit about your algorithm. The more specific you are, the more inclined people will be to help and the better the help will be.

The general challenge you describe is not easily solved, since English is chock full of idioms and peculiarities. Given the assigned spec, I would probably split on one or more whitespace characters that are preceded by a period, question mark or exclamation point, but not preceded by a title (Mr., Dr., Mrs., Ms., esq., ...). This is by no means comprehensive, but it should get you through this task. Read perlretut and see if you can translate the above spec into a regular expression. Of particular interest should be Looking ahead and looking behind. Alternatively, you could simply split with /\.\s+/ and then stitch entries back together if there's a trailing title.
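For what it's worth, here is a minimal sketch of how that spec might translate into a single split. The sub name split_sentences and the sample text are made up for illustration, and the title list is just the handful named above, matched case-sensitively. Each title gets its own negative lookbehind because older perls only accept fixed-length lookbehind assertions.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace that follows . ? or !
# unless that period belongs to one of a few known titles.
sub split_sentences {
    my ($text) = @_;
    return split m{
        (?<=[.?!])      # preceded by sentence-ending punctuation
        (?<!\bMr\.)     # but not by one of the listed titles
        (?<!\bDr\.)
        (?<!\bMrs\.)
        (?<!\bMs\.)
        (?<!\besq\.)
        \s+             # one or more whitespace characters
    }x, $text;
}

my @sentences = split_sentences(
    "Dr. Smith met Mrs. Jones. Did they talk? Yes! They did."
);
print "$_\n" for @sentences;
```

Because the punctuation sits in a lookbehind rather than in the part that split consumes, each sentence keeps its closing mark. Abbreviations that are not in the list (e.g., etc., initials) will still cause false splits.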

How do you take paragraph or large amount of text and break it into sentences (preferably using Ruby)...
I think perhaps you've come to the wrong community. You should stay anyway, though, since we're pretty cool and generally helpful.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


In reply to Re: Split a paragraph based on the number of letters by kennethk
in thread Split a paragraph based on the number of letters by Anonymous Monk
