in reply to regexp word break help

It's not particularly pretty, but here's a solution using substr and a regular expression:
if (length $word > 45) {
    my $forty_sixth = substr($word, 45, 1);
    $word = substr($word, 0, 45);
    $word =~ s/\s*\w*\z// if $forty_sixth =~ /\w/;
}

Update: For some reason I was thinking about this while trying to get to sleep last night. My solution will fail to trim whitespace in the case where the 45th and 46th chars are both whitespace. Also, I should know better than to write a substitution expression that can match the empty string. Here's a revised version that should work correctly:

if (length $word > 45) {
    my $forty_sixth = substr($word, 45, 1);
    $word = substr($word, 0, 45);
    $word =~ s/\w+\z// if $forty_sixth =~ /^\w/;
    $word =~ s/\s+\z//;
}
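
To make the double-whitespace case easy to check, here is a minimal sketch that wraps the revised logic in a sub. The name truncate_at_word and the $max parameter are my own for illustration and are not part of the original question; the body is just the snippet above with 45 replaced by $max.

```perl
use strict;
use warnings;

# Hypothetical helper: the sub name and $max parameter are illustrative only.
sub truncate_at_word {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;

    my $next_char = substr($text, $max, 1);   # first character that gets cut off
    $text = substr($text, 0, $max);

    # If the cut landed inside a word, drop the partial word.
    $text =~ s/\w+\z// if $next_char =~ /\w/;
    # Then strip whatever whitespace is left at the end.
    $text =~ s/\s+\z//;
    return $text;
}

my @samples = (
    # 45th and 46th characters are both spaces
    "The quick brown fox jumped over the lazy dog  I own",
    # the cut lands in the middle of "doggy"
    "The quick brown fox jumped over the lazy doggy I own",
);

for my $text (@samples) {
    my $short = truncate_at_word($text, 45);
    print "Word: '$short', Length: ", length($short), "\n";
}
```

With these two inputs the first should come back ending in "dog" (length 44, trailing space removed) and the second ending in "lazy" (length 40, partial word dropped).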

Re: Re: regexp word break help
by dsb (Chaplain) on Apr 30, 2002 at 21:33 UTC
    The other problem is your use of \w* in your substitution. By making the word characters optional, you aren't taking into consideration one-letter words like 'a' or 'I'.

    UPDATE: thelenm is correct in saying that requiring a \w character will fail to trim whitespace where the 45th character is a space and the 46th a word char. Turns out I can fix my own solution above by not requiring the word char. Thanks thelenm ;0).




    Amel
      But requiring a word character (\w+) will fail to trim off spaces in the case where the 45th character is a space character and the 46th character is a word character. I think my solution works correctly... can you give an example where it doesn't? Here are some boundary cases that work as they should, using one-character words as you suggested:
      @words = (
          #         1         2         3         4         5
          #12345678901234567890123456789012345678901234567890
          "The quick brown fox jumped over the lazy d I own",
          "The quick brown fox jumped over the lazy do I own",
          "The quick brown fox jumped over the lazy dog I own",
          "The quick brown fox jumped over the lazy dogs I own",
          "The quick brown fox jumped over the lazy doggy I own",
      );

      for my $word (@words) {
          if (length $word > 45) {
              my $forty_sixth = substr($word, 45, 1);
              $word = substr($word, 0, 45);
              $word =~ s/\s*\w*\z// if $forty_sixth =~ /\w/;
          }
          print "Word: '$word', Length: ", length $word, "\n";
      }
      produces:
      Word: 'The quick brown fox jumped over the lazy d I', Length: 44
      Word: 'The quick brown fox jumped over the lazy do I', Length: 45
      Word: 'The quick brown fox jumped over the lazy dog', Length: 44
      Word: 'The quick brown fox jumped over the lazy dogs', Length: 45
      Word: 'The quick brown fox jumped over the lazy', Length: 40