Problem: how to split long words

nikos has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Problem: how to split long words by ccn (Vicar) on Aug 24, 2004 at 14:56 UTC
`$_ = 'ju[color=1]s[/color]t a sam[color=2]p[/color]le'; my $brackets = qr(\[[^\]]\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets) # followed by any number of brackets ) {3} # 3 times ) (?:$brackets) # followed by any number of brackets (?= # looking forward to ensure that $char # we have at least one char [^\[\]] # and we are not inside of a brackets (?: \[ \| \z ) ) } {$1-}gx; print;` [download] Update: fixed "last '-'" issue pointed by ikegami P.S. Concerning `'[2345678901234567890'` I suggest that is not valid input.	[reply] [d/l] [select]
Re^2: Problem: how to split long words by ikegami (Patriarch) on Aug 24, 2004 at 15:45 UTC
Two small issues: `'ju[color=1]s[/color]t a tesss[color=2]t[/color]'` will return a string ending with a '-', and `'[2345678901234567890'` will result in a 20 letter word.	[reply] [d/l] [select]
Re^3: Problem: how to split long words by jbware (Chaplain) on Aug 24, 2004 at 15:55 UTC
I noticed the end '-' thing too. I did add a section to ccn's code that handles that. Its tagged w/ the "# ADDED" comment. ++ btw ccn, took me awhile to wrap my head around this one. `my $brackets = qr(\[[^\]]\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets) # followed by any number of brackets ) {3} # 3 times ) (?!(?:$brackets)(?:\s\|\Z)) # ADDED: eliminates '-' on the end of wo +rds w/ a multiple of 3 chars (?= # looking forward to ensure that [^\[\]] # we are not inside of a brackets (?: \[ \| \z ) ) } {$1-}gx; print;` [download] -jbWare	[reply] [d/l]
Re^4: Problem: how to split long words by nikos (Scribe) on Aug 25, 2004 at 07:14 UTC
Re^3: Problem: how to split long words by ccn (Vicar) on Aug 24, 2004 at 19:42 UTC
Here is non-regexp solution. It validates input in addition to the main task. `sub splitlong { my $text = shift; my @result; my ($cnt, $inside) = (0, 0); for (split //, $text) { ($inside = 0), next if $inside and $_ eq ']'; $inside = 1 if !$inside and $_ eq '['; next if $inside; $cnt = /\s/s ? 0 : $cnt + 1; ($cnt = 0), push @result, '-' if $cnt > 3; } continue { push @result, $_; } warn "invalid input: $text" if $inside; return join '', @result; } my $text = 'ju[color=1]s[/color]t a sam[color=2]p[/color]le'; print splitlong($text);` [download]	[reply] [d/l]
Re^4: Problem: how to split long words by nikos (Scribe) on Sep 01, 2004 at 14:09 UTC
Re^2: Problem: how to split long words by nikos (Scribe) on Sep 01, 2004 at 14:01 UTC
Thank your for you code. It works fine with a valid input i.e `[tag]`. If we do not check for correct `[]` pairs and fix it, it doesn't work. I understand it was not the task but still... We cannot just count the number of brackets in the string to check. The check has to be more complicated checking actually tags. To open a tag use `'['`, to close the tag use `']'` not `']]'` or not `'[['`. I think a little check might help. Split the word followed by `']'` if the last bracket is not `'['`. `$_ = 'sss[[]sssss]sssssss'; print $_."\n"; my $brackets = qr(\[[^\]]\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets) # followed by any number of brackets ) {3} # 3 times ) (?!(?:$brackets)(?:\s\|\Z)) # ADDED: eliminates '-' on the end of wo +rds w/ a multiple of 3 chars (?= # looking forward to ensure that [^\[\]] # we are not inside of a brackets (?: \[ \| \z ) ) } {$1-}gx; print;` [download] Thank you.	[reply] [d/l] [select]
Re: Problem: how to split long words by ikegami (Patriarch) on Aug 24, 2004 at 14:48 UTC
So far, I have: `$text = 'ju[color=1]s[/color]t a tes[color=2]t[/color]'; $text =~ s/ ( (?: (?: \[ [^]]* \] )* [^ [] ){3} (?: \[ [^]]* \] )* ) (?=[^ ]) /$1-/gx; print($text, "\n"); # ju[color=1]s[/color]-t a tes[color=2]-t[/co-lor-]` [download] It's not quite there yet, but I have to go.	[reply] [d/l]
Re: Problem: how to split long words by nikos (Scribe) on Sep 01, 2004 at 14:13 UTC
I think this problem and it's solutions should be copied to Q&A when we have everything said and done. It will be a 'must-have' solution for everyone who is doing some kind of web-chats or discussion boards.	[reply]
Re: Problem: how to split long words by nikos (Scribe) on Sep 07, 2004 at 13:47 UTC
The following fix for the code provided by ccn. Should be `($cnt = 1), push @result, '-' if $cnt > 3;` not `($cnt = 0), push @result, '-' if $cnt > 3;` If `$cnt = 0` then the 2nd, 3rd... group of characters in each word will be (N+1) long. In the following example the first group will be 3 characters long and all the rest groups will be 4 characters long `sub splitlong { my $text = shift; my @result; my ($cnt, $inside) = (0, 0); for (split //, $text) { ($inside = 0), next if $inside and $_ eq ']'; $inside = 1 if !$inside and $_ eq '['; next if $inside; $cnt = /\s/s ? 0 : $cnt + 1; ($cnt = 1), push @result, '-' if $cnt > 3; } continue { push @result, $_; } warn "invalid input: $text" if $inside; return join '', @result; } my $text = 'ju[color=1]s[/color]t a sam[color=2]p[/color]le'; print splitlong($text);` [download]	[reply] [d/l] [select]