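use strict;
use warnings;

# The text is expected to be UTF-8; decode it so \p{Upper} also
# recognizes non-ASCII capitals, and encode it again on output.
binmode DATA,   ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

my $data = '';
while (my $line = <DATA>) {
    chomp $line;
    $data .= " $line";     # join input lines with a space
    $data =~ s/\s+/ /g;    # collapse runs of whitespace
    $data =~ s/^ //;       # drop the space added before the first line

    # A sentence ends in ?, . or !, optionally followed by a closing quote
    # and whitespace, provided the next character is an uppercase letter or
    # punctuation (e.g. an opening quote). Loop so that every complete
    # sentence in the buffer is emitted, not just one per input line.
    while ($data =~ m/^(.+?[?.!]['"]?\s)(?=\p{Upper}|\p{Punct})/) {
        my $sentence = $1;
        print $sentence, "\n\n";
        substr($data, 0, length $sentence, '');   # remove it from the buffer
    }
}

# Whatever remains had no recognizable sentence boundary after it.
print $data, "\n" if length $data;

__DATA__
foomatic99 has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for a way to chop text at sentence boundaries. I realize that somebody out there must have come up with some heuristics for doing this, though I can't think of any unambiguous terms to search for something like this.

I realize that nothing in a reasonably light-weight implementation is going to get it right 100% of the time, but at least I should be able to find something better than just cutting at a certain number of bytes.

The text is English, utf8... possibly with HTML entity references.

Can anybody help me out here?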