webchalkboard has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

My questions are getting simpler, but at least that makes them easier to answer :)

I have a string which I need to split into words. I could use the substr function to do this, but I don't want half a word.

So if I have the string 'This is a test message', I need to make sure the result is no longer than 12 characters without chopping off in the middle of a word, so this would be 'This is a'.

I have looked at Text::Wrap, but I think it might be overkill. I know there is a regexp for this; anyone care to remind me?

Thanks, Tom

Learning without thought is labor lost; thought without learning is perilous. - Confucius
WebChalkboard.com | For the love of art...

Replies are listed 'Best First'.
Re: Splitting a string into words
by tye (Sage) on Aug 11, 2005 at 18:01 UTC

    Fun. Some edge cases got missed several times. Strings under 12 characters and a first word of over 12 characters were the most common, I think. Though what to do when the first word is over 12 characters is not clear from the problem statement. A strict interpretation of the problem statement would end up with an empty string for that case. More likely, the first 12 characters of the first word are appropriate.

    But I think you can still do this and keep it pretty simple:

    my( $first )= $message =~ /^\s*(.{1,12}\s|.{0,12})/;

    Many ways to do this. For such a tiny, simple operation, I could envision a half-dozen tests in the UT suite off the top of my head. (:

    Note that I don't collapse internal spaces, which is a nice touch in Transient's solution.
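The edge cases named above can be pinned down with a few sample strings. What follows is a minimal sketch rather than the one-liner from this reply: `truncate_words` is a hypothetical helper, and cutting an over-long first word at the limit is the assumed interpretation of the unclear case.

```perl
use strict;
use warnings;

# Hypothetical helper, not code from the thread: keep at most $max
# characters of $text without cutting a word in half.
sub truncate_words {
    my ($text, $max) = @_;
    $text =~ s/^\s+//;                        # ignore leading whitespace
    return $text if length($text) <= $max;    # short strings pass through whole
    if ($text =~ /^(.{1,$max})(?=\s)/s) {     # longest prefix that stops before whitespace
        (my $kept = $1) =~ s/\s+\z//;         # trim spaces left over from runs of whitespace
        return $kept;
    }
    return substr($text, 0, $max);            # first word too long: assumed hard cut
}

# One sample per edge case: normal, short, exactly 12 characters, long first word.
for my $case ('This is a test message', 'Hi there', 'Hello world!', 'Tintinabulations are loud') {
    printf "%-28s => '%s'\n", "'$case'", truncate_words($case, 12);
}
```

Checking the length first keeps the regex simple: it only ever runs on strings that really need cutting.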

    - tye        

Re: Splitting a string into words
by pbeckingham (Parson) on Aug 11, 2005 at 17:00 UTC

    This will extract anything up to 12 characters (greedy) that is followed by whitespace. It does what you described, but needs work to be of more general use.

    #! /usr/bin/perl
    use strict;
    use warnings;

    my $string = 'this is a test message';
    my ($chunk) = $string =~ /(.{1,12})\s/;
    print $chunk, "\n";



    pbeckingham - typist, perishable vertebrate.

      (?:\s|$) works better than \s.

      Another alternative is \b, which will chop after a word, but before punctuation.

      Depends on what a word is. If a word is \w+, this variant might be of use: ((\s|\w){0,11})(\s|$)
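The difference between the endings suggested above is easiest to see side by side. This is a small sketch under assumptions: the sample strings and the `chunk_with` helper are made up for illustration, and only the three patterns come from the thread.

```perl
use strict;
use warnings;

# The three endings discussed above, each after the same (.{1,12}) capture.
my @variants = (
    [ 'whitespace only'   => qr/(.{1,12})\s/       ],
    [ 'whitespace or end' => qr/(.{1,12})(?:\s|$)/ ],
    [ 'word boundary'     => qr/(.{1,12})\b/       ],
);

# Hypothetical helper: return the captured chunk for one pattern, or undef.
sub chunk_with {
    my ($regex, $string) = @_;
    my ($chunk) = $string =~ $regex;
    return $chunk;
}

# A short string (no whitespace after the last word) and one with punctuation.
for my $string ('Hi there', 'Stop it now, please') {
    for my $variant (@variants) {
        my ($label, $regex) = @$variant;
        my $chunk = chunk_with($regex, $string);
        printf "%-18s %-22s => '%s'\n", $label, "'$string'", $chunk // '';
    }
}
```

With a plain \s the short string loses its last word, because nothing follows it; the other two endings keep it. The \b ending drops the comma that the whitespace-based endings retain.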
Re: Splitting a string into words
by Transient (Hermit) on Aug 11, 2005 at 17:08 UTC
    Here's my try at it:
    #!/usr/bin/perl
    my $max_str_len = 12;
    my $string = "This is a test message";
    $string =~ tr/ / /s;
    $string =~ s/^\s*//;
    $string =~ /(.{0,$max_str_len})(\s|$)/;
    my $trunc_string = $1;
    my @words = split /\s+/, $trunc_string;
    print "Word: ", $_, "\n" foreach @words;
Re: Splitting a string into words
by chester (Hermit) on Aug 11, 2005 at 17:22 UTC
    Text::Wrap isn't necessarily overkill. What about words that are longer than 12 characters? The answers given so far don't handle that, so far as I can tell. Text::Wrap isn't the greatest, but at least doesn't lose any letters.

    This code is also a bit more readable, in my opinion.

    use warnings;
    use strict;
    use Text::Wrap qw(wrap);

    my $phrase = 'This is a test message with lots of words in it. Tintinabulations!';
    $Text::Wrap::columns = 13;
    my $wrapped = wrap('','',$phrase);
    print $wrapped;
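If only the first chunk is wanted, as in the original question, the first wrapped line can be picked off. A minimal sketch, assuming a hypothetical `first_line` helper; the columns setting relies on Text::Wrap keeping each line to at most `$columns - 1` characters.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: wrap the text and return only the first line.
sub first_line {
    my ($text, $max) = @_;
    local $Text::Wrap::columns  = $max + 1;   # wrapped lines are at most columns - 1 characters
    local $Text::Wrap::unexpand = 0;          # keep spaces as spaces, no tab conversion
    my ($first) = split /\n/, wrap('', '', $text);
    return $first;
}

print first_line('This is a test message', 12), "\n";
```

Using `local` keeps the settings from leaking into any other code that calls Text::Wrap.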
Re: Splitting a string into words
by cog (Parson) on Aug 11, 2005 at 16:51 UTC

      Thanks for the link. I've had a look at the module, though, and I'm not sure it does what I need it to. I'm just installing it now and will have a go to see if I can get something to work.

Re: Splitting a string into words
by borisz (Canon) on Aug 11, 2005 at 17:25 UTC
    Here is my favorite:
    my $string = 'this is a test message';
    my ($chunk) = $string =~ /(.{1,12})\b/;
    Boris
Re: Splitting a string into words
by ysth (Canon) on Aug 11, 2005 at 17:00 UTC
    Something like:
    $msg = "This is a test message";
    @lines = $msg =~ /(?>\s*)(.{0,11}\S)(?!\S)/g;
    Update: this strips leading characters if any \S+ "word" is longer than 12.
Re: Splitting a string into words
by tphyahoo (Vicar) on Aug 12, 2005 at 08:41 UTC
    The "natural" way to do it:

    No more than 12 characters: ^.{0,12}

    Words: ^(\w+(\s|$))*

    Both must be true:

    HOW?

    Is there maybe some way to do this with a Perl 6 rule, or Parse::RecDescent?
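One way to make both conditions hold without a grammar module is to consume one \w+ word at a time and stop as soon as the length limit would be exceeded. This is only a sketch of that idea: `leading_words` is a hypothetical name, and it follows the strict reading in which a first word longer than the limit yields an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: collect whole \w+ words from the start of $text
# while the collected prefix stays within $max characters.
sub leading_words {
    my ($text, $max) = @_;
    my $kept = '';
    # \G resumes where the previous match ended; each pass takes one word
    # plus the whitespace before it, and the word must end at whitespace
    # or at the end of the string.
    while ($text =~ /\G(\s*\w+)(?=\s|\z)/gc) {
        last if length($kept . $1) > $max;   # condition 1: no more than $max characters
        $kept .= $1;                          # condition 2: whole words only
    }
    return $kept;
}

print leading_words('This is a test message', 12), "\n";
```

The length check lives in plain Perl rather than in the pattern, which keeps the regex close to the "Words" pattern above.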