-- What if the $n'th word has punctuation after it?

Why should that matter? Are you asking this just because your code is sensitive to this (when it shouldn't be), or does your concept of "truncation" include the idea of "stripping final punctuation, if any"?
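If it's the latter, the stripping can be a separate step applied after truncation, so the truncation itself stays simple. A minimal sketch, assuming that "punctuation" means only the characters . , ; : ! ? (closing parens and quotes are deliberately left alone), with trunc_strip_punct as a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $n white-space-separated words,
# then strip trailing punctuation (assumed set: . , ; : ! ?) from the result.
sub trunc_strip_punct {
    my ( $n, $text ) = @_;
    return '' unless defined $text and $n =~ /^\d+$/ and $n > 0;
    my @words = split ' ', $text;            # ' ' skips leading white-space
    my $end   = $n < @words ? $n : @words;   # don't slice past the last word
    my $str   = join ' ', @words[ 0 .. $end - 1 ];
    $str =~ s/[.,;:!?]+\z//;                 # drop final punctuation, if any
    return $str;
}
```

With this, trunc_strip_punct(3, "Hello, big world. Bye") gives "Hello, big world"; punctuation inside the kept text is untouched.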
-- Suppose the text contains more text between parentheses and you want to truncate before or after those, but not in the middle.

Well, in this case, you're not talking about truncation; you're talking about parsing the input text and then basing your return value on some set of rules (not yet fully stated) that refer to parsed segments (parenthesized blocks and the space-separated word tokens between them) and that respect segment boundaries.
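A minimal sketch of that kind of segment-aware truncation, assuming parentheses are never nested: each "( ... )" block counts as one segment, every other run of non-space, non-paren characters is a word segment, and a stray unmatched paren becomes a one-character segment. trunc_segments is a hypothetical name, and it normalizes the white-space between segments to single spaces.

```perl
use strict;
use warnings;

# Hypothetical sketch: truncate to the first $n segments, where a
# parenthesized block (no nesting assumed) is a single segment.
sub trunc_segments {
    my ( $n, $text ) = @_;
    return '' unless defined $text and $n =~ /^\d+$/ and $n > 0;
    # Try a whole "(...)" block first, then a plain word, then a stray paren.
    my @segs = $text =~ /(\([^()]*\)|[^\s()]+|[()])/g;
    my $end  = $n < @segs ? $n : @segs;      # don't slice past the end
    return join ' ', @segs[ 0 .. $end - 1 ];
}
```

For example, trunc_segments(2, "one (two three) four") keeps the whole parenthesized block and returns "one (two three)".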
If you were to state your intended rules, and if they were simple enough (e.g. "when truncation leaves an open-paren with no close-paren, extend the truncation (shorten the return string) so that the open-paren is not returned"), then maybe you can get by without actually parsing for parens. (But I doubt it would really be that simple: what if the text is fully enclosed in a paren pair? Return the whole string? Or an empty string?)
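The simple rule quoted above could be applied as a post-processing step on an already-truncated string, without parsing anything. A sketch, assuming parens are not nested; drop_open_paren is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical sketch: if the truncated string ends inside an unclosed "(",
# cut it back to just before that "(" and tidy the trailing white-space.
sub drop_open_paren {
    my ($str) = @_;
    my $open = rindex $str, '(';
    return $str if $open < 0;                          # no open-paren at all
    return $str if index( $str, ')', $open ) >= 0;     # it was closed
    my $cut = substr $str, 0, $open;
    $cut =~ s/\s+\z//;
    return $cut;
}
```

Note that a string beginning with an unclosed "(" comes back empty, which is one possible answer to the fully-enclosed case; whether it's the right one depends on the rules you settle on.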
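Well, let's leave parens and parsing aside for now. Why not use split?

    sub trunc {
        my ( $last_word, $text ) = @_;
        return '' unless ( $last_word =~ /^\d+$/ and $last_word > 0 and $text =~ /\S/ );
        my @tokens = split /\s+/, $text;
        # don't slice past the last token (avoids "uninitialized" warnings)
        my $end = $last_word < @tokens ? $last_word : @tokens;
        return join " ", @tokens[0..$end-1];
    }

That has the side-effect of normalizing the white-space between words to a single space per word boundary. Leading white-space is kept (collapsed to one space), though the empty first field that split returns for it counts as one of the words; trailing white-space is not kept. If you'd rather preserve white-space faithfully, use a capturing split, and build the return string a little differently:

    sub trunc {
        my ( $last_word, $text ) = @_;
        return '' unless ( $last_word =~ /^\d+$/ and $last_word > 0 and $text =~ /\S/ );
        my @tokens = split /(\s+)/, $text, -1;   # keep all white-space
        my $trunc_str = '';
        for ( @tokens ) {
            $trunc_str .= $_;
            # stop right after the $last_word'th non-space token
            last if ( /\S/ and --$last_word == 0 );
        }
        return $trunc_str;
    }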
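The leading-white-space behaviour of split /\s+/ is documented Perl: a positive-width match at the start of the string produces a leading empty field, while the special pattern ' ' discards leading white-space before splitting, as awk does. The two hypothetical helpers below only count fields, to make the difference visible:

```perl
use strict;
use warnings;

# split /\s+/ keeps a leading empty field when the string starts with
# white-space, so that empty string would count as a "word".
sub count_fields_regex {
    my ($text) = @_;
    my @fields = split /\s+/, $text;
    return scalar @fields;
}

# split ' ' (awk-style special case) removes leading white-space first.
sub count_fields_space {
    my ($text) = @_;
    my @fields = split ' ', $text;
    return scalar @fields;
}

# count_fields_regex("  one two") returns 3 ('', 'one', 'two')
# count_fields_space("  one two") returns 2 ('one', 'two')
```

So if leading white-space shouldn't count as a word, split ' ' is a drop-in change for the first version, at the cost of no longer keeping that leading space.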
In reply to Re: Truncating real text by graff, in thread Truncating real text by cog