in reply to Re: Extracting a substring of N chars ignoring embedded HTML
The solution would be to add the length test to the `while` loop condition, or else find a way to avoid the inner `for` loop, so that `last` really does finish things off. Here is the original loop with the problem marked; some other nit-picks are handled in my version below:

```perl
while ( my $token = $p->get_token ) {
    if ($token->is_text) {
        if (length($token->return_text) + $total <= 200) {
            $doc2 .= $token->return_text;
            $total += length($token->return_text);
        }
        else {
            for (split / /, $token->return_text) {
                if ($total + length($_) <= 200) {
                    $doc2 .= $_ . ' ';
                    $total += length($_) + 1;
                }
                else {
                    last;    ## THIS ONLY EXITS THE FOR LOOP
                }            ## So this block runs over the
            }                ## entire remainder of the post
            chop($doc2) if $doc2 =~ /\s$/;
        }
    }
    else {
        $doc2 .= $token->as_is;
    }
}
```
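Another way to make `last` reach the outer loop, not one of the two options above, is a Perl loop label: `last TOKEN` leaves the loop labeled `TOKEN` even from inside a nested `for`. The sketch below is simplified (it always splits text into words) and runs against a hypothetical token list of `[ $is_text, $text ]` pairs instead of a real parser object; the name `truncate_words` is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parser tokens: [ is_text, text ].
sub truncate_words {
    my ($tokens, $limit) = @_;
    my ($doc2, $total) = ('', 0);
    TOKEN: for my $token (@$tokens) {
        my ($is_text, $text) = @$token;
        unless ($is_text) {
            $doc2 .= $text;        # markup is copied but not counted
            next TOKEN;
        }
        for my $word (split / /, $text) {
            if ($total + length($word) <= $limit) {
                $doc2 .= $word . ' ';
                $total += length($word) + 1;   # +1 for the trailing space
            }
            else {
                last TOKEN;        # leaves the OUTER loop, not just this for
            }
        }
    }
    chop($doc2) if $doc2 =~ /\s$/;   # drop the final trailing space
    return $doc2;
}

my @tokens = (
    [1, 'one two three'], [0, '<b>'], [1, 'four five six seven'],
    [0, '</b>'], [1, 'eight nine'],
);
print truncate_words(\@tokens, 20), "\n";
```

With a plain `last` instead of `last TOKEN`, the outer loop would keep going and still append the remaining markup (here `</b>`), and could append a later word if it happened to be short enough to fit.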
So here's my version of LTjake's while loop (not tested):
```perl
while ( my $token = $p->get_token ) {
    my $tkntext = $token->as_is;
    $tkntext =~ s/\s+/ /g;    # normalize all whitespace
    if ($token->is_text) {
        if (length($tkntext) + $total <= 200) {
            $doc2 .= $tkntext;
            $total += length($tkntext);
        }
        else {
            my $maxlen = 200 - $total;
            # cut at the last space within the remaining budget;
            # rindex returns -1 if there is no such space
            my $cut = rindex( $tkntext, ' ', $maxlen );
            $doc2 .= substr( $tkntext, 0, $cut ) if $cut > 0;
            last;    # this finishes the while loop
        }
    }
    else {
        $doc2 .= " $tkntext ";
    }
}
```
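To check the single-loop idea without installing a parser, here is a minimal sketch of the same logic run against a hypothetical token list of `[ $is_text, $text ]` pairs (standing in for `HTML::TokeParser::Simple` tokens); the sub name `first_n_chars` and the `$limit` parameter are assumptions for illustration. Because there is no inner loop, the single `last` ends all processing, and `rindex` with a position argument finds the last space at or before the remaining budget.

```perl
use strict;
use warnings;

# Hypothetical tokens: [ is_text, text ], in place of a real parser.
sub first_n_chars {
    my ($tokens, $limit) = @_;
    my ($doc2, $total) = ('', 0);
    for my $token (@$tokens) {
        my ($is_text, $tkntext) = @$token;
        $tkntext =~ s/\s+/ /g;                 # normalize all whitespace
        unless ($is_text) {
            $doc2 .= " $tkntext ";             # tags pass through, uncounted
            next;
        }
        if (length($tkntext) + $total <= $limit) {
            $doc2 .= $tkntext;
            $total += length($tkntext);
            next;
        }
        my $maxlen = $limit - $total;
        my $cut = rindex($tkntext, ' ', $maxlen);   # -1 if no space fits
        $doc2 .= substr($tkntext, 0, $cut) if $cut > 0;
        last;                                  # no inner loop: this ends everything
    }
    return $doc2;
}

my @tokens = (
    [1, 'one two three'], [0, '<b>'], [1, "four   five six\nseven"],
    [0, '</b>'], [1, 'eight'],
);
print first_n_chars(\@tokens, 20), "\n";
```

The `$cut > 0` guard matters when a text token has no space within the remaining budget: without it, `substr($tkntext, 0, -1)` would return all but the last character and overshoot the limit.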
Re: Re: Re: Extracting a substring of N chars ignoring embedded HTML
by LTjake (Prior) on Jan 12, 2003 at 14:25 UTC