
The other problem is your use of \w* in your substitution. By making the word character optional (zero or more), you aren't accounting for one-letter words like 'a' or 'I'.

UPDATE: thelenm is correct that requiring a \w character will fail to trim whitespace when the 45th character is a space and the 46th is a word character. It turns out I can fix my own solution above by not requiring the word character. Thanks, thelenm ;0).
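To make the boundary case concrete, here is a minimal Perl sketch that applies both patterns to a string whose 45th character is a space and whose 46th is a word character. The helper `trim_with` is hypothetical (it is not from either post); it just parameterizes the regex so the two variants can be compared side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $str to $max chars, then apply $re to strip
# the trailing partial word, but only when the character just past the
# cut is a word character (i.e. the cut may have split a word).
sub trim_with {
    my ($str, $max, $re) = @_;
    return $str if length $str <= $max;
    my $next = substr($str, $max, 1);   # 0-based, so this is char $max+1
    $str = substr($str, 0, $max);
    $str =~ s/$re// if $next =~ /\w/;
    return $str;
}

# 45th character is a space, 46th is 'I' (a word character).
my $s = "The quick brown fox jumped over the lazy dog I own";

my $plus = trim_with($s, 45, qr/\s*\w+\z/);   # word char required
my $star = trim_with($s, 45, qr/\s*\w*\z/);   # word char optional

print "\\w+: '$plus' (", length $plus, ")\n";
print "\\w*: '$star' (", length $star, ")\n";
```

Because the cut leaves a trailing space, `\s*\w+\z` cannot match at all and the space survives (length 45), while with `\s*\w*\z` the `\s*` alone matches and the space is removed, leaving "...lazy dog" (length 44).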




Amel

Re: Re: Re: regexp word break help
by thelenm (Vicar) on Apr 30, 2002 at 21:51 UTC
    But requiring a word character (\w+) will fail to trim off spaces in the case where the 45th character is a space character and the 46th character is a word character. I think my solution works correctly... can you give an example where it doesn't? Here are some boundary cases that work as they should, using one-character words as you suggested:
```perl
use strict;
use warnings;

my @words = (
    #         1         2         3         4         5
    #12345678901234567890123456789012345678901234567890
    "The quick brown fox jumped over the lazy d I own",
    "The quick brown fox jumped over the lazy do I own",
    "The quick brown fox jumped over the lazy dog I own",
    "The quick brown fox jumped over the lazy dogs I own",
    "The quick brown fox jumped over the lazy doggy I own",
);

for my $word (@words) {
    if (length $word > 45) {
        # Character just past the 45-char cut (substr offsets are 0-based).
        my $forty_sixth = substr($word, 45, 1);
        $word = substr($word, 0, 45);
        # If that character is a word char, the cut split a word or left a
        # trailing space: strip any partial word and whitespace at the end.
        $word =~ s/\s*\w*\z// if $forty_sixth =~ /\w/;
    }
    print "Word: '$word', Length: ", length $word, "\n";
}
```
    produces:
```
Word: 'The quick brown fox jumped over the lazy d I', Length: 44
Word: 'The quick brown fox jumped over the lazy do I', Length: 45
Word: 'The quick brown fox jumped over the lazy dog', Length: 44
Word: 'The quick brown fox jumped over the lazy dogs', Length: 45
Word: 'The quick brown fox jumped over the lazy', Length: 40
```
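If the same logic is needed in more than one place, it can be wrapped in a subroutine with the length limit as a parameter. This is a minimal sketch; the name `truncate_words` and its `$max` argument are hypothetical, and the body is the same substr/regex approach as the loop in thelenm's reply.

```perl
use strict;
use warnings;

# Hypothetical sub: truncate $text to at most $max characters without
# ending on a partial word.
sub truncate_words {
    my ($text, $max) = @_;
    return $text if length $text <= $max;

    # Character just past the cut; a word char here means the cut either
    # split a word or left a trailing space before the next word.
    my $next = substr($text, $max, 1);
    $text = substr($text, 0, $max);

    # \w* rather than \w+, so a trailing space alone is also removed.
    $text =~ s/\s*\w*\z// if $next =~ /\w/;
    return $text;
}

my @cases = (
    "The quick brown fox jumped over the lazy d I own",
    "The quick brown fox jumped over the lazy dog I own",
    "The quick brown fox jumped over the lazy doggy I own",
);

for my $s (@cases) {
    my $t = truncate_words($s, 45);
    print "Word: '$t', Length: ", length $t, "\n";
}
```

One edge case the original loop shares: if the text before the cut contains no whitespace at all, the whole prefix is treated as a partial word and removed, so `truncate_words("abcdefgh", 5)` returns an empty string.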