Re: making a regex work with Unicode

As a first step, make sure Perl is reading the files as UTF-8; see the -C switch in perlrun:

$ perl -CSD -pe's/(.{0,60})\b/$1\n/g' FILE

If that isn't enough, \X (Unicode extended grapheme cluster) and/or \b{wb} (Unicode word boundary, which needs Perl 5.22 or later) may help further. I haven't used them before, but a quick test on a simple file worked for me:

$ perl -CSD -pe's/(\X{0,60})\b{wb}/$1\n/g' FILE

(As I was writing this, poj wrote about Text::Wrap, which also appears to have Unicode support, so that might be easiest.)
