Unicode::UTF8 and perl Unicode compatibility

vsespb has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Unicode::UTF8 and perl Unicode compatibility by kcott (Archbishop) on Aug 31, 2013 at 12:21 UTC
G'day vsespb,

The current version of Perl (5.18.1) supports Unicode 6.2. This support started with the previous version (5.18.0); see perl5180delta. I'm not aware of a table (or similar) specifically mapping Perl versions to Unicode versions; however, the deltas listed here should document changes in Unicode support.

For testing edge cases, etc. across multiple Perl versions, these may prove useful:

  Test::More - for writing the tests.
  App::perlbrew - for easily installing different Perl versions and switching between them.
  Online Perldoc - allows you to select the documentation for Perl versions 5.8.8 to 5.18.0 [at the time of writing].

Other Perl Unicode documentation that may be helpful:

  perluniintro - Perl Unicode introduction
  perlunitut - Perl Unicode tutorial
  perlunifaq - Perl Unicode FAQ
  perlunicode - Perl Unicode support
  perluniprops - Index of Unicode properties in Perl

-- Ken
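To see which Unicode version a given perl actually ships (for example, under each perlbrew installation), a minimal sketch along these lines may help. It uses only the core Unicode::UCD module; the helper names `unicode_report` and `is_assigned` are made up for illustration, and U+20BA is picked only because it is the one character Unicode 6.2 added.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Report the running perl and the Unicode version it was built with.
sub unicode_report {
    return sprintf 'perl %vd, Unicode %s', $^V, Unicode::UCD::UnicodeVersion();
}

# True if this perl's Unicode tables have the code point assigned.
# \p{Cn} matches unassigned code points. (Hypothetical helper.)
sub is_assigned {
    my ($cp) = @_;
    return chr($cp) =~ /\p{Cn}/ ? 0 : 1;
}

print unicode_report(), "\n";

# U+20BA TURKISH LIRA SIGN was added in Unicode 6.2, so the answer
# depends on which perl runs this.
printf "U+20BA assigned: %s\n", is_assigned(0x20BA) ? 'yes' : 'no';
```

Running the same file under several perls gives a rough version-to-Unicode mapping without relying on a published table.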
Re: Unicode::UTF8 and perl Unicode compatibility by ww (Archbishop) on Aug 31, 2013 at 11:48 UTC
1. Start, if possible, by upgrading to a currently supported version of Perl (5.18 is current; ActiveState lags with 5.16; most others known to me are at 5.18).
2. Try it with cases you anticipate. See if the output -- to file and to console (since the rendering in those two cases may not match) -- satisfies your needs.
3. Now get all wild and crazy. Use Perl to walk Unicode::UTF8 thru the character sets with which you need to deal. Do it again and again for major versions (but note that you could go insane trying to test every build of every Perl distro, especially those built by individuals from source with variant options enabled).
4. When complete -- and it should not take long, unless you have an insanely wide selection of needed sets -- you'll know the answer to your question, rather than having to rely on second-hand info whose validity you will likely have no way (other than the above) to evaluate.
5. Then, share your newfound knowledge with your own reply to this thread.

If you didn't program your executable by toggling in binary, it wasn't really programming!
Re^2: Unicode::UTF8 and perl Unicode compatibility by vsespb (Chaplain) on Aug 31, 2013 at 12:00 UTC
"Use Perl to walk Unicode::UTF8 thru the character sets with which you need to deal"

I am not sure how to walk. What can fail? Regexps? Character classes? String comparison? Collations? Case folding? Normalization? I am not sure what new things Unicode 6 introduced.
Re^3: Unicode::UTF8 and perl Unicode compatibility by ww (Archbishop) on Aug 31, 2013 at 12:06 UTC
"...how to walk.": loops and arrays.

"What can fail?....": Answering that is within the domain of the research plan outlined above... or, stated more simply, 'TITS, try it to see.'

If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
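One way to read "loops and arrays" is sketched below: loop over an array of code point ranges and round-trip each character through the encoder and decoder. This is only an illustration under assumptions: the ranges are arbitrary examples, `check_codepoint` is a hypothetical helper, and Unicode::UTF8 (a CPAN module) is compared against the core Encode module only if it happens to be installed.

```perl
use strict;
use warnings;
use Encode ();

# Unicode::UTF8 is not a core module; load it only if it is installed.
my $have_uu8 = eval { require Unicode::UTF8; 1 };

# Hypothetical helper: round-trip one code point, return a list of problems.
sub check_codepoint {
    my ($cp) = @_;
    my $char   = chr $cp;
    my $octets = Encode::encode('UTF-8', $char);
    my @problems;
    push @problems, 'Encode round trip'
        unless Encode::decode('UTF-8', $octets) eq $char;
    if ($have_uu8) {
        push @problems, 'Unicode::UTF8 encode differs from Encode'
            unless Unicode::UTF8::encode_utf8($char) eq $octets;
        push @problems, 'Unicode::UTF8 round trip'
            unless Unicode::UTF8::decode_utf8($octets) eq $char;
    }
    return @problems;
}

# Example ranges (an assumption): Basic Latin, Cyrillic, some emoji.
my @ranges = ([0x20, 0x7E], [0x0400, 0x04FF], [0x1F600, 0x1F64F]);

my %failed;
for my $range (@ranges) {
    for my $cp ($range->[0] .. $range->[1]) {
        my @problems = check_codepoint($cp);
        $failed{$cp} = \@problems if @problems;
    }
}

printf "U+%04X: %s\n", $_, join(', ', @{ $failed{$_} })
    for sort { $a <=> $b } keys %failed;
print 'checked with ', ($have_uu8 ? 'Encode and Unicode::UTF8' : 'Encode only'), "\n";
```

Swapping in the ranges you actually care about, and running the script under each Perl version of interest, is the "try it to see" step.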
Re^4: Unicode::UTF8 and perl Unicode compatibility by vsespb (Chaplain) on Aug 31, 2013 at 12:17 UTC
Re^5: Unicode::UTF8 and perl Unicode compatibility by ww (Archbishop) on Aug 31, 2013 at 12:36 UTC
Re: Unicode::UTF8 and perl Unicode compatibility by Hansen (Friar) on Sep 01, 2013 at 18:26 UTC
You will have no compatibility issues. The main difference between `Encode`'s implementation of UTF-8 and `Unicode::UTF8`'s is that `Encode` uses decoding/encoding functions provided by perl, whereas `Unicode::UTF8` has its own implementation. `Unicode::UTF8` provides consistent behavior across all supported (>= 5.8.1) versions of perl. I wrote `Unicode::UTF8` because I wanted a fast implementation with a simple API; you can read a comparison with `Encode`.

-- chansen
Re^2: Unicode::UTF8 and perl Unicode compatibility by vsespb (Chaplain) on Sep 01, 2013 at 18:30 UTC
Great! Thanks!
Re: Unicode::UTF8 and perl Unicode compatibility by Anonymous Monk on Aug 31, 2013 at 12:27 UTC
"Problem that I don't really understand how it would be compatible with perl own unicode implementation"

Don't worry about it.

"What will happen if some Unicode 6 characters recognized by this module, but then misinterpreted by perl?"

This wouldn't happen. Once octets/bytes are decoded into characters, they're characters (codepoints).

"possible insignificant edge case (including but not limited to security issues), for all perl 5.8.8+. But I am not sure where to start."

If I were you I wouldn't even start :) Why? Because starting is starting to sound more and more like reinventing-Unicode::UTF8, or tracking-perl-bugs-since-2006-six-decades-ago. I would stick with 5.18.x
Re^2: Unicode::UTF8 and perl Unicode compatibility by vsespb (Chaplain) on Aug 31, 2013 at 15:31 UTC
"This wouldn't happen. Once octets/bytes are decoded into characters, they're characters (codepoints)"

So, you think, once it decoded to characters, it will work perfectly without breaking anything? What about utf8::valid(). Will it pass? Will it affect anything? What if I try encoding back to bytes with Encode::encode("UTF-8" .. ?

"I would stick with 5.18.x"

No, I specified in OP that I need compatibility with any version starting from perl5.8.8.
Re^3: Unicode::UTF8 and perl Unicode compatibility by Anonymous Monk on Sep 01, 2013 at 09:05 UTC
:)

"So, you think, once it decoded to characters, it will work perfectly without breaking anything?"

Yes, insofar as once Unicode::UTF8 does its thing it's done, and your perl takes over (with all that entails).

"What about utf8::valid(). Will it pass? Will it affect anything?"

I think it will "pass" and will not "affect anything", but I don't see how it matters -- by using Unicode::UTF8 you're saying the hell with Encode.pm / utf8.pm, I'll let Unicode::UTF8 take care of everything, so there should be no reason to consult utf8 or Encode.

"What if I try encoding back to bytes with Encode::encode("UTF-8" .. ?"

That ought to work fine as well (call me optimistic).

"No, I specified in OP that I need compatibility with any version starting from perl5.8.8."

Yes, I've read this, I understand, and it's why I didn't make jokes :)

Food for thought: Re: Why upgrade perl?, Re: perldeltas - every perl*delta in one file (pod.lst)
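The questions above can be tried directly. The following is a minimal sketch, not a definitive test: it decodes a few UTF-8 octets (with Unicode::UTF8 if it is installed, otherwise with the core Encode module as a stand-in), asks utf8::valid() about the result, and encodes back with Encode::encode. The sample string is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode ();

# Unicode::UTF8 is not a core module; fall back to Encode if it is missing.
my $have_uu8 = eval { require Unicode::UTF8; 1 };

# UTF-8 octets for U+20BA TURKISH LIRA SIGN followed by "100".
my $octets = "\xE2\x82\xBA" . "100";

# Decode octets into a character string.
my $chars = $have_uu8
    ? Unicode::UTF8::decode_utf8($octets)
    : Encode::decode('UTF-8', $octets);

# After decoding, this is an ordinary Perl character string.
my $length = length $chars;                 # counts characters, not octets
my $first  = ord substr($chars, 0, 1);      # code point of the first character
my $valid  = utf8::valid($chars) ? 1 : 0;   # internal consistency check

# Encode back to octets with Encode, whichever module did the decoding.
my $back = Encode::encode('UTF-8', $chars);
my $same = $back eq $octets ? 1 : 0;

printf "decoder=%s length=%d first=U+%04X valid=%d round_trip=%d\n",
    ($have_uu8 ? 'Unicode::UTF8' : 'Encode'), $length, $first, $valid, $same;
```

Whether perl knows the character's properties (case, category, and so on) still depends on the Unicode tables of the perl in use; the decoding step itself only produces code points.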
Re^4: Unicode::UTF8 and perl Unicode compatibility by vsespb (Chaplain) on Sep 01, 2013 at 14:58 UTC