in reply to Slicing a string on words

Would something like this be simpler?

$window_size--; my( $windowed_text )= $text_body =~ /^\s*(?:\S+\s+){$window_start}(\S+(?:(\s+\S+){0,$window_size})/;
You might need some code to handle edge cases like 0 for $window_start or $window_size. Note that I used {0,$window_size} so that asking for too large a window simply matches through the last word.

Note that my technique can also give you the same index information via @- and @+ if you have Perl v5.6 or higher.
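As a rough illustration of the @- and @+ remark, here is a minimal sketch. The sub name window_offsets and the sample sentence are hypothetical, it uses the regex without the stray parenthesis, and it assumes both arguments are non-negative integers with $window_size at least 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the windowed
# text plus its start and end character offsets within $text_body.
# Assumes $window_start >= 0 and $window_size >= 1.
sub window_offsets {
    my ( $text_body, $window_start, $window_size ) = @_;
    my $extra = $window_size - 1;    # words that follow the first word
    return unless $text_body =~
        /^\s*(?:\S+\s+){$window_start}(\S+(?:\s+\S+){0,$extra})/;
    # $-[1] is where capture group 1 starts, $+[1] is where it ends.
    return ( $1, $-[1], $+[1] );
}

my ( $words, $from, $to ) =
    window_offsets( "the quick brown fox jumps over the lazy dog", 2, 3 );
print "'$words' spans offsets $from to $to\n";
```

The offsets are suitable for substr( $text_body, $from, $to - $from ), which should give back the same words.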

        - tye (but my friends call me "Tye")

Re: (tye)Re: Slicing a string on words
by Hofmator (Curate) on Aug 28, 2001 at 21:27 UTC

    Very nice solution, tye!!

    There's a small typo in the regex (a surplus parenthesis), so here is the code again, corrected:

    $window_size--;
    my( $windowed_text )= $text_body =~ /^\s*(?:\S+\s+){$window_start}(\S+(?:\s+\S+){0,$window_size})/;
    What I wanted to add is that Perl handles the boundary cases very nicely, so no extra handling is required for $window_start = 0 or $window_size = 0. This means

    $_ = q/01234/;
    /^..{0}/;     # matches '0'
    /^..{0,0}/;   # matches '0'
    /^..{0,-1}/;  # doesn't match at all
    which is exactly what we need for the code to work fine.

    -- Hofmator

      Thanks. Note that $window_size of 0 will probably behave the same as $window_size of 1, though.

      Update: Sorry, wrong. I misunderstood your examples. /..{0,-1}/ matches any two characters followed by the literal string "{0,-1}". This could still be a problem for certain input values, but such values seem pretty unlikely. So you might want some special code for the boundary case, depending on how varied your inputs might be.
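One way the special-case code for the boundary might look, as a sketch only. The sub name word_window, the choice to return an empty string for an empty or missing window, and the clamping of a negative start are all assumptions rather than anything settled in this thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected regex. Arguments are assumed
# to be integers.
sub word_window {
    my ( $text_body, $window_start, $window_size ) = @_;

    # A size below 1 would produce {0,-1}, which is not a quantifier
    # and would be matched as literal text, so bail out early.
    return '' if $window_size < 1;
    $window_start = 0 if $window_start < 0;    # assumption: clamp to 0

    my $extra = $window_size - 1;              # words after the first one
    my ($windowed_text) = $text_body =~
        /^\s*(?:\S+\s+){$window_start}(\S+(?:\s+\S+){0,$extra})/;

    # No match means the text has too few words to reach the window.
    return defined $windowed_text ? $windowed_text : '';
}

print word_window( "a b c d e", 1, 2 ), "\n";
```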

              - tye (but my friends call me "Tye")
Re: (tye)Re: Slicing a string on words
by dga (Hermit) on Aug 29, 2001 at 22:09 UTC

    What if $window_start > 32767 ?

    With an average word length of, say, 5 characters plus a separating space, it would seem that you could get more than 174,000 words per megabyte of string input.
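Perl does cap the counts allowed in a {m,n} quantifier, and the exact ceiling depends on the Perl version. A minimal sketch of one way around it is to walk the words with a /g match and track offsets instead of interpolating a count. The sub name word_window_scan is hypothetical, and this swaps in a different technique from the single regex above.

```perl
use strict;
use warnings;

# Hypothetical alternative: scan word by word so that no quantifier
# count is needed. Returns '' when the window is empty or out of range.
sub word_window_scan {
    my ( $text_body, $window_start, $window_size ) = @_;
    return '' if $window_size < 1;

    my ( $seen, $from, $to ) = (0);
    while ( $text_body =~ /\S+/g ) {
        $from = $-[0] if $seen == $window_start;   # first word of the window
        $seen++;
        $to = $+[0] if $seen > $window_start;      # end of the latest window word
        last if $seen == $window_start + $window_size;
    }
    return defined $from ? substr( $text_body, $from, $to - $from ) : '';
}

# Build a long sample text so the start index exceeds the usual limits.
my $long_text = join ' ', map { "w$_" } 0 .. 69_999;
print word_window_scan( $long_text, 65_540, 2 ), "\n";
```

Like the regex version, this keeps the original whitespace between the selected words, and an oversized window simply runs through the last word.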