-- What if the $n'th word has punctuation after it?
Why should that matter? Are you asking this just because your code is sensitive to this (when it shouldn't be), or does your concept of "truncation" include the idea of "stripping final punctuation, if any"?
-- Suppose if the text contains more text between parentheses and you want to truncate before or after those, but not in the middle.
Well, in this case, you're not talking about truncation; you're talking about parsing the input text and then basing your return value on some set of rules (not yet fully stated) that refer to parsed segments -- parenthesized blocks and space-separated word tokens between them -- and that respect segment boundaries.

If you were to state your intended rules, and if they were simple enough -- e.g. "when truncation leaves an open-paren with no close-paren, extend the truncation (shorten the return string) such that the open-paren is not returned" -- then maybe you could get by without actually parsing for parens. (But I doubt it would really be that simple -- e.g. what if the text is fully enclosed in a paren pair? Return the whole string? Or an empty string?)
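For illustration only, here is a minimal sketch of that one example rule, assuming parentheses are never nested and that "words" are simply white-space-separated tokens. The name trunc_outside_parens is hypothetical; it is not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $n white-space-separated words, then,
# if the result contains a "(" that is never closed, cut back to just
# before that "(". Assumes parentheses are not nested.
sub trunc_outside_parens {
    my ( $n, $text ) = @_;
    return '' unless $n =~ /^\d+$/ and $n > 0 and $text =~ /\S/;

    my @words = split ' ', $text;    # split ' ' skips leading white-space
    $n = @words if $n > @words;      # never slice past the last word
    my $out = join ' ', @words[ 0 .. $n - 1 ];

    # An open-paren with no close-paren before the end of the string means
    # the cut landed inside a parenthesized block; drop that partial block.
    $out =~ s/\s*\([^)]*\z//;
    return $out;
}

print trunc_outside_parens( 4, 'one two (three four five) six' ), "\n";  # cut lands inside the parens
print trunc_outside_parens( 6, 'one two (three four five) six' ), "\n";  # cut lands after the parens
print trunc_outside_parens( 3, '(one two three four)' ), "\n";           # text fully enclosed
```

Note that the fully enclosed case comes back as an empty string under this rule, which is exactly the kind of result that would need a stated policy before you could call the rule "simple enough".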

Well, let's leave parens and parsing aside for now. Why not use split?

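sub trunc {
    my ( $last_word, $text ) = @_;
    return '' unless ( $last_word =~ /^\d+$/ and $text =~ /\S/ );
    my @tokens = split /\s+/, $text;
    # don't slice past the end of the list when the text is short
    $last_word = @tokens if $last_word > @tokens;
    return join " ", @tokens[0..$last_word-1];
}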
That has the side-effect of normalizing the white-space between words to a single space per word boundary. Trailing white-space is dropped; leading white-space is not, because split returns an empty first field for it, so the result starts with a single space and that empty field counts as one of the $last_word words. If you'd rather preserve white-space faithfully, use a capturing split, and build the return string a little differently:
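sub trunc {
    my ( $last_word, $text ) = @_;
    # a count of 0 would never satisfy the loop's stop test, so reject it here
    return '' unless ( $last_word =~ /^\d+$/ and $last_word > 0 and $text =~ /\S/ );
    my @tokens = split /(\s+)/, $text, -1;  # keep all white-space
    my $trunc_str = '';
    for ( @tokens ) {
        $trunc_str .= $_;
        last if ( /\S/ and --$last_word == 0 );
    }
    return $trunc_str;
}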
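To see why the two forms of split behave so differently, it may help to look at the raw field lists they return. This is just a small demonstration on a made-up sample string ($text here is an assumption for illustration, not data from the thread):

```perl
use strict;
use warnings;

my $text = "  alpha   beta\tgamma  ";

# Plain split: separators are discarded, the leading white-space yields
# an empty first field, and trailing empty fields are dropped.
my @plain = split /\s+/, $text;

# Capturing split with a limit of -1: separators come back as fields and
# trailing empty fields are kept, so joining the list restores the input.
my @kept = split /(\s+)/, $text, -1;

print scalar(@plain), " fields from the plain split\n";
print scalar(@kept),  " fields from the capturing split\n";
print( ( join( '', @kept ) eq $text ) ? "round-trips\n" : "differs\n" );
```

Because the capturing split round-trips, the loop only has to decide where to stop appending; it never has to guess what white-space was there.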

In reply to Re: Truncating real text by graff
in thread Truncating real text by cog
