I am looking for a way to truncate text at sentence boundaries. Somebody out there must have come up with heuristics for doing this, but I can't think of any unambiguous terms to search for.
I realize that no reasonably lightweight implementation is going to get it right 100% of the time, but I should at least be able to find something better than cutting at a fixed number of bytes.
The text is English, encoded as UTF-8, possibly with HTML entity references.
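To make the goal concrete, here is a rough Python sketch of the kind of heuristic I have in mind. It is only an illustration under several assumptions: the function name `truncate_at_sentence`, the `max_chars` parameter, and the tiny `ABBREVIATIONS` list are all made up for this example, the limit is counted in characters after decoding rather than in bytes, and entity references are decoded first, so the result would need re-escaping before going back into HTML.

```python
import html
import re

# Hypothetical, deliberately short list; a real one would need to be much longer.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "sr", "jr", "st", "vs", "etc", "e.g", "i.e"}

# Candidate boundary: one or more of . ! ?, optional closing quotes or brackets,
# followed by whitespace or the end of the text.
BOUNDARY = re.compile(r"""[.!?]+["')\]]*(?=\s|$)""")


def truncate_at_sentence(text, max_chars):
    """Return the longest prefix of whole sentences that fits in max_chars."""
    # Decode entity references so "&quot;" counts as one character.
    text = html.unescape(text)
    if len(text) <= max_chars:
        return text

    cut = 0
    for match in BOUNDARY.finditer(text):
        if match.end() > max_chars:
            break
        # Look at the word just before the punctuation and skip known abbreviations.
        words = text[:match.start()].rsplit(None, 1)
        last_word = words[-1].lower().strip("(\"'") if words else ""
        if last_word in ABBREVIATIONS:
            continue
        cut = match.end()

    if cut == 0:
        # No sentence boundary fits: fall back to the last space, then to a hard cut.
        space = text.rfind(" ", 0, max_chars + 1)
        cut = space if space > 0 else max_chars
    return text[:cut].rstrip()


sample = "Dr. Smith arrived late. He said &quot;hello.&quot; Then he left."
print(truncate_at_sentence(sample, 30))
```

This still gets plenty of cases wrong (abbreviations not in the list, initials, decimal-free things like "p.m."), which is exactly why I am hoping somebody knows of a better-tested set of heuristics.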