Hello Monks,

I am a novice in Perl. I am trying to cut a piece of text down to a summary, based on the number of words (or sentences, or letters).

$data = "How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby) taking into account cases such as Mr. and Dr. and U.S.A? (Assuming you just put the sentences into an array of arrays) UPDATE: One possible solution I thought of involves using a parts-of-speech tagger (POST) and a classifier to determine the end of a sentence: Getting data from Mr. Jones felt the warm sun on his face as he stepped out onto the balcony of his summer home in Italy. He was happy to be alive.";

I want to store the summary in another variable, $data_summary, like this:

$data_summary = "How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby) taking into account cases such as Mr. and Dr. and U.S.A? (Assuming you just put the sentences into...";

Can anyone help me get $data_summary as shown above, by truncating $data based on the number of words, sentences or letters? (I would prefer to do it by number of letters.)

Thank you in advance.
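
One way to approach the letter-based version is to take the first N characters with substr and then back up to the last whitespace, so that no word is cut in half. The following is only a minimal sketch under that assumption: the name summarize and the limit of 200 are hypothetical choices for illustration, not anything from the original post. It counts characters, not words or sentences.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text unchanged if it fits in $max characters,
# otherwise cut it at the last word boundary within the limit and append "...".
sub summarize {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;

    # Take one extra character so a word that ends exactly at $max is kept whole.
    my $cut = substr($text, 0, $max + 1);

    # Drop the trailing whitespace plus any partial word after it.
    # If there is no whitespace at all, fall back to a hard cut at $max.
    $cut = substr($text, 0, $max) unless $cut =~ s/\s+\S*\z//;

    return $cut . '...';
}

my $data = "How do you take paragraph or large amount of text and break it "
         . "into sentences (perferably using Ruby) taking into account cases "
         . "such as Mr. and Dr. and U.S.A? (Assuming you just put the "
         . "sentences into an array of arrays) UPDATE: One possible solution "
         . "I thought of involves using a parts-of-speech tagger (POST) and a "
         . "classifier to determine the end of a sentence.";

my $data_summary = summarize($data, 200);
print "$data_summary\n";
```

The summary requested above is 200 characters long before the trailing dots, so a limit of 200 should end the output at "the sentences into...". A word-based cut could be sketched the same way by splitting on whitespace and joining the first N words. Splitting on sentences is harder because of abbreviations such as "Mr." and "U.S.A", which a simple pattern will treat as sentence ends.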


In reply to Split a paragraph based on the number of letters by Anonymous Monk
