in reply to Problem: how to split long words

$_ = 'ju[color=1]s[/color]t a sam[color=2]p[/color]le'; my $brackets = qr(\[[^\]]*\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets*) # followed by any number of brackets ) {3} # 3 times ) (?:$brackets*) # followed by any number of brackets (?= # looking forward to ensure that $char # we have at least one char [^\[\]]* # and we are not inside of a brackets (?: \[ | \z ) ) } {$1-}gx; print;

Update: fixed "last '-'" issue pointed by ikegami
P.S. Concerning '[2345678901234567890' I suggest that is not valid input.

Replies are listed 'Best First'.
Re^2: Problem: how to split long words
by ikegami (Patriarch) on Aug 24, 2004 at 15:45 UTC

    Two small issues: 'ju[color=1]s[/color]t a tesss[color=2]t[/color]' will return a string ending with a '-', and '[2345678901234567890' will result in a 20 letter word.

      I noticed the end '-' thing too. I did add a section to ccn's code that handles that. Its tagged w/ the "# ADDED" comment.

      ++ btw ccn, took me awhile to wrap my head around this one.

      my $brackets = qr(\[[^\]]*\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets*) # followed by any number of brackets ) {3} # 3 times ) (?!(?:$brackets*)(?:\s|\Z)) # ADDED: eliminates '-' on the end of wo +rds w/ a multiple of 3 chars (?= # looking forward to ensure that [^\[\]]* # we are not inside of a brackets (?: \[ | \z ) ) } {$1-}gx; print;


      -jbWare
        Thank you very much! ccn, jbware
        Your solution is great.
        I've made a one-liner from it and it looks like
        $text=~s{((?:([^\[\]\s])(?:(\[[^\]]*\])*)){3})(?!(?:(\[[^\]]*\])*)(?:\s|\Z))(?=[^\[\]]*(?:\[|\z))}{$1-}gx;
        It's so nice! Can be used in some obfuscation code... Thanks again!

      Here is non-regexp solution. It validates input in addition to the main task.

      sub splitlong { my $text = shift; my @result; my ($cnt, $inside) = (0, 0); for (split //, $text) { ($inside = 0), next if $inside and $_ eq ']'; $inside = 1 if !$inside and $_ eq '['; next if $inside; $cnt = /\s/s ? 0 : $cnt + 1; ($cnt = 0), push @result, '-' if $cnt > 3; } continue { push @result, $_; } warn "invalid input: $text" if $inside; return join '', @result; } my $text = 'ju[color=1]s[/color]t a sam[color=2]p[/color]le'; print splitlong($text);
Re^2: Problem: how to split long words
by nikos (Scribe) on Sep 01, 2004 at 14:01 UTC
    Thank your for you code.
    It works fine with a valid input i.e [tag].

    If we do not check for correct [] pairs and fix it, it doesn't work. I understand it was not the task but still... We cannot just count the number of brackets in the string to check. The check has to be more complicated checking actually tags. To open a tag use '[', to close the tag use ']' not ']]' or not '[['.

    I think a little check might help. Split the word followed by ']' if the last bracket is not '['.
    $_ = 'sss[[]sssss]sssssss'; print $_."\n"; my $brackets = qr(\[[^\]]*\]); # text enclosed in brackets my $char = qr([^\[\]\s]); # not spaces or brackets s{ ( # group to $1 (?: $char # a char (?:$brackets*) # followed by any number of brackets ) {3} # 3 times ) (?!(?:$brackets*)(?:\s|\Z)) # ADDED: eliminates '-' on the end of wo +rds w/ a multiple of 3 chars (?= # looking forward to ensure that [^\[\]]* # we are not inside of a brackets (?: \[ | \z ) ) } {$1-}gx; print;
    Thank you.