in reply to Slicing a string on words

It seems you want a window x words big?

@words = $text_body =~ /(\S+)/g;
# compute $start and $end
$newtext = join(" ", @words[$start..$end]);

Update: Without copying. Note: I could not get the foreach approach to work at all, so I had to use while with a counter.

# compute $start and $window_length
$i++ while ($i < $start && /(\S+\s+)/g);
$i = 0;
while (/\G(\S+\s+)/g && $i < $window_length) {
    $newtext .= $1;
    $i++;
}

Note also: this uses two forms of while (the statement modifier and the block form). You should probably pick the one you like best and standardize on it.

This skips the first $start words and then assigns $window_length words to $newtext, preserving whitespace.
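As a rough, self-contained sketch of the skip-then-collect idea, the loops can be wrapped in a sub. The name word_window and the choice to pass the text by reference are my own assumptions for illustration, not part of the original post. Like the snippet above, it expects every word (including the last) to be followed by whitespace, and it expects the text not to begin with whitespace when $start is 0.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions): skip $start
# words, then return the next $window_length words with their trailing
# whitespace intact. The text is passed by reference so the large
# string itself is not copied into the sub.
sub word_window {
    my ($text_ref, $start, $window_length) = @_;
    pos($$text_ref) = 0;    # restart matching at the beginning

    # Skip $start words. The counter is tested before the match, so no
    # word is consumed when $start is 0.
    my $i = 0;
    $i++ while $i < $start && $$text_ref =~ /\S+\s+/g;
    return '' if $i < $start;    # fewer than $start words available

    # Collect up to $window_length words, each with trailing whitespace.
    my $newtext = '';
    $i = 0;
    while ($i < $window_length && $$text_ref =~ /\G(\S+\s+)/g) {
        $newtext .= $1;
        $i++;
    }
    return $newtext;
}

my $text_body = "one two  three four\n";
print word_window(\$text_body, 1, 2), "\n";    # prints "two  three "
```

Only the window itself is built up in $newtext; the original string is scanned in place via pos() and \G.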

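Another Update: A $start of 0 was not working, but reversing the tests in the first while (checking the counter before attempting the match, so that no word is consumed when $start is 0) fixes that.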

Re: Re: Slicing a string on words
by dstar (Scribe) on Aug 28, 2001 at 02:39 UTC
    Ah, left out a relevant bit of info: I need to preserve whitespace. So what I *really* need is the index of the end of the $window_start'th word, and the index of the ($window_start + $window_size)'th word.

    And these are potentially 1 meg strings, so I'd like to avoid copies if possible.
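One possible way to get just those two indexes, sketched under my own assumptions: let the regex engine walk the string with \G and read the offsets back from pos(). The sub name window_offsets and its (offset, length) return convention are hypothetical. The /c modifier keeps pos() where it was when a match fails, so running out of words simply ends the window at the last word found.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return convention are assumptions):
# returns (offset, length) of the window that begins at the end of the
# $window_start'th word and ends at the end of the
# ($window_start + $window_size)'th word. Only a reference is passed,
# so the large string is never copied.
sub window_offsets {
    my ($text_ref, $window_start, $window_size) = @_;
    pos($$text_ref) = 0;

    # Step over $window_start words (and the whitespace before each).
    my $i = 0;
    $i++ while $i < $window_start && $$text_ref =~ /\G\s*\S+/gc;
    my $begin = pos($$text_ref);

    # Step over up to $window_size more words.
    $i = 0;
    $i++ while $i < $window_size && $$text_ref =~ /\G\s*\S+/gc;
    my $end = pos($$text_ref);

    return ($begin, $end - $begin);
}

my $text_body = 'Testing news submission.';
my ($offset, $length) = window_offsets(\$text_body, 1, 1);
print substr($text_body, $offset, $length), "\n";    # prints " news"
```

The only copy made is the final substr of the window itself; the offsets could equally be used with four-argument substr to edit the window in place.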

Re: Re: Slicing a string on words
by dstar (Scribe) on Aug 29, 2001 at 20:31 UTC
    Doesn't seem to work: Given $window_start of 0, $window_size of 100, and $text_body of 'Testing news submission.', it gives 'news'.

    It also seems to make....wait. Ok, I know where Testing is going and can fix that. It's not working on the last word because there's no whitespace after it. Would changing \s+ to \s* work?

      I think it would be safest (and safe to assume) that a 1 Meg string will be stored in a file and terminated with some sort of CR and/or LF; in fact, the pattern match I have requires that to be the case. The pattern would get a lot more complex if the string could end with a \S character.

      However, you have noticed a problem with the original code with a start of 0. I will update that bit.
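For what it is worth, here is a small sketch of the \s* variant asked about above, run on the same test string. It is only a check under these exact inputs, not a claim about every edge case: since \S+ still has to match at least one character, making the trailing whitespace optional should not allow an empty match, and the final word is picked up even without a terminating newline.

```perl
use strict;
use warnings;

my $text_body = 'Testing news submission.';    # no trailing whitespace
my ($window_start, $window_size) = (0, 100);

# Skip $window_start words; the counter is tested before the match.
my $i = 0;
$i++ while $i < $window_start && $text_body =~ /\S+\s*/g;

# Collect words; \s* lets the last word match with nothing after it.
my $newtext = '';
$i = 0;
while ($i < $window_size && $text_body =~ /\G(\S+\s*)/g) {
    $newtext .= $1;
    $i++;
}
print "$newtext\n";    # prints "Testing news submission."
```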