(tye)Re: Slicing a string on words

Would something like this be simpler?

$window_size--;
my( $windowed_text )= $text_body =~
    /^\s*(?:\S+\s+){$window_start}(\S+(?:(\s+\S+){0,$window_size})/;

You might need some code to handle edge cases such as 0 for $window_start or $window_size. Note that I used {0,$window_size} so that asking for too large a window simply matches through the last word.
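
As a rough sketch of that edge-case handling, the match could be wrapped in a small helper. The name word_window and its argument order are made up for illustration, the regex is written with balanced parentheses, and treating a window size below 1 as "no match" is an assumption about the desired behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: return $size words of $text, skipping the first
# $start words. Returns undef when no such window exists.
sub word_window {
    my( $text, $start, $size )= @_;
    # Guard the boundary cases up front (assumed policy): a negative
    # count inside {...} is not treated as a quantifier at all.
    return  if  $start < 0  ||  $size < 1;
    my $extra= $size - 1;    # words to match after the first one
    my( $window )= $text =~
        /^\s*(?:\S+\s+){$start}(\S+(?:\s+\S+){0,$extra})/;
    return $window;          # undef if fewer than $start+1 words
}

print word_window( "the quick brown fox jumps", 1, 3 ), "\n";
# prints: quick brown fox
```

Because the tail is {0,$extra} rather than {$extra}, asking for ten words from a three-word string still returns whatever words remain.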

Note that my technique can also give you the same index information via @- and @+ if you have Perl v5.6 or higher.
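
A minimal sketch of reading those offsets, assuming the balanced form of the regex: $-[1] and $+[1] hold the start and end offsets of the first capture group after a successful match. The helper name window_offsets is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the (start, end) character offsets of the
# word window inside $text, or an empty list when nothing matches.
sub window_offsets {
    my( $text, $start, $size )= @_;
    return  if  $start < 0  ||  $size < 1;
    my $extra= $size - 1;
    return  unless  $text =~
        /^\s*(?:\S+\s+){$start}(\S+(?:\s+\S+){0,$extra})/;
    # Copy the offsets of capture group 1 before leaving the sub.
    my( $from, $to )= ( $-[1], $+[1] );
    return( $from, $to );
}

my $text_body= "  the quick brown fox jumps";
my( $from, $to )= window_offsets( $text_body, 1, 3 );
# substr() with those offsets recovers the same text that $1 held.
print substr( $text_body, $from, $to - $from ), "\n";
```

The end offset is one past the last matched character, so $to - $from is the length of the window.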

- tye (but my friends call me "Tye")

Re: (tye)Re: Slicing a string on words by Hofmator (Curate) on Aug 28, 2001 at 21:27 UTC
Very nice solution, tye!! There's a small typo in the regex (a surplus parenthesis), so here goes the code again, corrected

$window_size--;
my( $windowed_text )= $text_body =~
    /^\s*(?:\S+\s+){$window_start}(\S+(?:\s+\S+){0,$window_size})/;

What I wanted to add is that Perl handles the boundary cases very nicely, so no extra handling required for $window_start = 0 or $window_size = 0. This means

$_ = q/01234/;
/^..{0}/;     # matches '0'
/^..{0,0}/;   # matches '0'
/^..{0,-1}/;  # doesn't match at all

which is exactly what we need for the code to work fine.

-- Hofmator
(tye)Re2: Slicing a string on words by tye (Sage) on Aug 28, 2001 at 21:31 UTC
Thanks. Note that $window_size of 0 will probably behave the same as $window_size of 1, though.

Update: Sorry, wrong. I misunderstood your examples. /..{0,-1}/ matches any two characters followed by the literal string "{0,-1}". This could still be a problem for certain input values, but such seem pretty unlikely. So you might want some special code for the boundary case, depending on how varied your inputs might be.

- tye (but my friends call me "Tye")
Re: (tye)Re: Slicing a string on words by dga (Hermit) on Aug 29, 2001 at 22:09 UTC
What if $window_start > 32767? It would seem that, with an average word length of, say, 5, you could get more than 174,000 words per megabyte of string input.
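
One way around a large count, sketched here as an alternative and not as anything proposed in the thread: step through the words with a /g match and count them in Perl, so no counted quantifier is involved. Older perls reject a {n} count above 32766, and the exact ceiling depends on the perl version. The helper name word_window_scan is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: same result as the regex window, but the words
# are counted in a loop, so $start is not limited by the quantifier cap.
sub word_window_scan {
    my( $text, $start, $size )= @_;
    return  if  $start < 0  ||  $size < 1;
    my( $from, $to );
    my $seen= 0;
    while(  $text =~ /\S+/g  ) {
        my $index= $seen++;                  # 0-based word number
        next  if  $index < $start;           # still before the window
        $from= $-[0]  if  $index == $start;  # first word of the window
        $to= $+[0];                          # end of the latest word
        last  if  $index == $start + $size - 1;
    }
    return  unless  defined $from;           # fewer than $start+1 words
    return substr( $text, $from, $to - $from );
}

my $big= join ' ', 0 .. 49_999;              # 50,000 one-token "words"
print word_window_scan( $big, 40_000, 2 ), "\n";
```

The loop stops at the end of the window, so the cost grows with $start + $size and not with the length of the whole string.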