One problem I've had with this regexp (in my case, trying to break off the first sentence) is that abbreviations, e.g. M(r|rs)., Corp., Inc., etc., end in a period but are not a good place to split a body of text for something like inserting an advert. I handle it by requiring that the word before the period be at least 5 characters long.
my ($first, $rest) = $body =~ /(.*?\w{5,}\.)(.*)/s;
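
To make the behaviour concrete, here is a small sketch of that heuristic on a made-up sample string (the text and variable contents are only for illustration). The /s modifier is included so that . also matches newlines in a multi-line body.

```perl
use strict;
use warnings;

# Hypothetical sample text containing abbreviations that should not split.
my $body = "Mr. Smith works at Acme Corp. in Boston. He started there in 1999.";

# Lazily consume text up to the first period that directly follows
# a run of at least five word characters; capture the remainder.
my ($first, $rest) = $body =~ /(.*?\w{5,}\.)(.*)/s;

print "first: [$first]\n";
print "rest:  [$rest]\n";
```

Here "Mr." and "Corp." are skipped because they are shorter than five characters, so the split lands after "Boston.". The trade-off is that a genuine sentence ending in a short word ("I saw it.") is skipped as well, and a longer abbreviation such as "Messrs." would still trigger a split.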

Re^3: Find a good starting section of a long text
by Anonymous Monk on Aug 13, 2004 at 01:35 UTC
    For determining sentence breaks, you might take a look at Lingua::EN::Sentence, which tries to be intelligent about abbreviations and such.
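
    A rough sketch of how that might look, assuming Lingua::EN::Sentence is installed from CPAN and using its exported get_sentences function; the sample text is made up, and how well any particular abbreviation is handled depends on the module's own abbreviation list.

```perl
use strict;
use warnings;
use Lingua::EN::Sentence qw(get_sentences);

# Hypothetical sample text, for illustration only.
my $body = "Mr. Smith works at Acme Corp. in Boston. He started there in 1999.";

# get_sentences returns a reference to an array of sentences
# (guard against undef in case the input is empty).
my $sentences = get_sentences($body);
my @sentences = @{ $sentences || [] };

# Take the first sentence and rejoin whatever is left.
my $first = shift @sentences;
my $rest  = join ' ', @sentences;

print "first: [$first]\n";
print "rest:  [$rest]\n";
```

    If a domain-specific abbreviation still causes a bad split, the module also exports add_acronyms, which can be used to register extra abbreviations before calling get_sentences.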