Re: Slicing a string on words

It seems you want a window of x words?

@words = $text_body =~ /(\S+)/g;
# compute $start and $end
$newtext = join(" ", @words[$start..$end]);

Update: Without copying. Note: I could not get a foreach version to work at all, so I had to use while with a counter.

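# Assumes $start and $window_length are set and the text is in $_

# Skip the first $start words. The counter test comes first so that
# a $start of 0 never runs the match and no word is consumed.
$i = 0;
$i++ while ($i < $start && /(\S+\s+)/g);

# Append the next $window_length words, trailing whitespace included.
$i = 0;
while (/\G(\S+\s+)/g && $i < $window_length)
{
    $newtext .= $1;
    $i++;
}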

Note also: the code uses two forms of while (statement modifier and block). You should probably pick the one you like best and standardize on that.

This skips the first $start words and then appends the next $window_length words to $newtext, preserving whitespace.

Another Update: A $start of 0 was not working, but reversing the order of the tests in the first while fixes that.


Replies are listed 'Best First'.
Re: Re: Slicing a string on words by dstar (Scribe) on Aug 28, 2001 at 02:39 UTC
Ah, I left out a relevant bit of info: I need to preserve whitespace. So what I really need is the index of the end of the $window_start'th word, and the index of the ($window_start + $window_size)'th word. These are potentially 1 MB strings, so I'd like to avoid copies if possible.
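One way to get such indices without building a word list is to walk the string with a scalar-context /g match and read the offsets from @- and pos(). The following is only a sketch: the helper name word_window_offsets is made up for illustration, it takes a reference so the large string is not copied on the way in, and it reports the start of the first word inside the window rather than the end of the word before it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (offset, length) of the
# window that begins at word number $start + 1 and covers up to $size words.
sub word_window_offsets {
    my ($text_ref, $start, $size) = @_;
    return (0, 0) if $size <= 0;

    pos($$text_ref) = 0;                 # restart scanning at the beginning
    my $count = 0;
    my ($from, $to);

    while ($$text_ref =~ /\S+/g) {
        $count++;
        # $-[0] is where this word begins; pos() is just past its end.
        $from = $-[0] if $count == $start + 1;
        if ($count >= $start + 1) {
            $to = pos($$text_ref);
            last if $count == $start + $size;
        }
    }
    pos($$text_ref) = undef;             # leave no stale match position behind
    return (0, 0) unless defined $from;  # fewer than $start + 1 words
    return ($from, $to - $from);
}

my $text = "Testing  news\tsubmission.";
my ($offset, $length) = word_window_offsets(\$text, 1, 2);
print substr($text, $offset, $length), "\n";
```

The final substr still copies the window itself, but no per-word copies are made, and the whitespace between the selected words comes through untouched.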
Re: Re: Slicing a string on words by dstar (Scribe) on Aug 29, 2001 at 20:31 UTC
Doesn't seem to work: Given $window_start of 0, $window_size of 100, and $text_body of 'Testing news submission.', it gives 'news'. It also seems to make....wait. Ok, I know where Testing is going and can fix that. It's not working on the last word because there's no whitespace after it. Would changing \s+ to \s* work?
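For what it's worth, here is a rough sketch of the \s* variant asked about above, wrapped in a hypothetical slice_words sub so it can be tried on the sample string. It assumes the text does not begin with whitespace, it tests the counter before the match in both loops, and it takes its own copy of the string, so it is meant for checking the pattern rather than for 1 MB inputs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread) to try out \S+\s* in place of \S+\s+.
sub slice_words {
    my ($text, $start, $window_length) = @_;
    my $newtext = '';

    # Skip the first $start words; give up if the text runs out of words,
    # since a failed /g match would reset pos() back to the beginning.
    my $i = 0;
    while ($i < $start) {
        return '' unless $text =~ /\S+\s*/g;
        $i++;
    }

    # Append up to $window_length words. \s* lets the last word match
    # even when no whitespace follows it.
    $i = 0;
    while ($i < $window_length && $text =~ /\G(\S+\s*)/g) {
        $newtext .= $1;
        $i++;
    }
    return $newtext;
}

print slice_words('Testing news submission.', 0, 100), "\n";
```

Because \S+ is greedy and each match ends after a run of whitespace (or at the end of the string), the next match always starts at a word boundary, so the looser \s* should not split a word in two.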
Re: Re: Re: Slicing a string on words by dga (Hermit) on Aug 29, 2001 at 22:02 UTC
I think it is safest (and reasonable) to assume that a 1 MB string will be stored in a file and terminated with some sort of CR and/or LF; in fact, the pattern match I have requires that to be the case. The pattern would get a lot more complex if a string could end with a \S type of character. However, you have noticed a problem in the original code with a start of 0. I will update that bit.