Re: Unicode::UTF8 and perl Unicode compatibility
by kcott (Archbishop) on Aug 31, 2013 at 12:21 UTC
|
G'day vsespb,
The current version of Perl (5.18.1) supports Unicode 6.2.
This support started with the previous version (5.18.0) - see perl5180delta.
I'm not aware of a table (or similar) specifically mapping Perl versions to Unicode versions; however, the deltas listed here should document changes in Unicode support.
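In the absence of such a table, each installed perl can be asked directly. A minimal sketch, assuming only the core Unicode::UCD module; run it under every perl you have installed (for example via perlbrew) to build your own mapping:

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Version of the Unicode Character Database bundled with the running perl,
# returned as a string of the form "x.y.z".
my $unicode_version = Unicode::UCD::UnicodeVersion();

# %vd formats $^V whether it is a v-string (5.8.x) or a version object (5.10+).
printf "perl %vd ships Unicode %s\n", $^V, $unicode_version;
```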
For testing edge cases, etc. across multiple Perl versions, these may prove useful:
- Test::More - for writing the tests.
- App::perlbrew - for easily installing different Perl versions and switching between them.
- Online Perldoc allows you to select the documentation for Perl versions 5.8.8 to 5.18.0 [at the time of writing].
Other Perl Unicode documentation that may be helpful:
| [reply] |
Re: Unicode::UTF8 and perl Unicode compatibility
by ww (Archbishop) on Aug 31, 2013 at 11:48 UTC
|
1. Start, if possible, by upgrading to a currently supported version of Perl (5.18 is current; ActiveState lags with 5.16; most others known to me are at 5.18).
2. Try it with cases you anticipate. See if the output -- to file and to console (since the rendering in those two cases may not match) -- satisfies your needs.
3. Now get all wild and crazy. Use Perl to walk Unicode::UTF8 thru the character sets with which you need to deal. Do it again and again for major versions (but note that you could go insane trying to test every build of every Perl distro, especially those built by individuals from source with variant options enabled).
4. When complete -- and it should not take long, unless you have an insanely wide selection of needed sets -- you'll know the answer to your question, rather than having to rely on second-hand info whose validity you will likely have no way (other than the above) to evaluate.
5. Then, share your newfound knowledge with your own reply to this thread.
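One way step 3 might look in practice is sketched below. It is only an illustration: the code-point ranges and the helper name round_trip_failures are hypothetical, and Unicode::UTF8 is compared against the core Encode module only when it is installed. Run it under each perl you care about.

```perl
use strict;
use warnings;
use Encode ();

# Unicode::UTF8 is not a core module, so only compare against it if installed.
my $have_unicode_utf8 = eval { require Unicode::UTF8; 1 };

# Die on malformed data instead of silently substituting a replacement
# character, and leave the source string untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Hypothetical helper: takes [first, last] code-point ranges and returns a
# list of messages, one per code point that did not round-trip cleanly.
sub round_trip_failures {
    my @ranges = @_;
    my @failures;
    for my $range (@ranges) {
        my ($first, $last) = @$range;
        for my $cp ($first .. $last) {
            my $char   = chr $cp;
            my $octets = Encode::encode('UTF-8', $char, $check);
            my $back   = Encode::decode('UTF-8', $octets, $check);
            push @failures, sprintf('U+%04X: Encode round trip differs', $cp)
                unless $back eq $char;

            next unless $have_unicode_utf8;
            push @failures, sprintf('U+%04X: encoders disagree', $cp)
                unless Unicode::UTF8::encode_utf8($char) eq $octets;
            push @failures, sprintf('U+%04X: decoders disagree', $cp)
                unless Unicode::UTF8::decode_utf8($octets) eq $char;
        }
    }
    return @failures;
}

# Sample ranges only; substitute the character sets you actually need.
my @sample_ranges = (
    [ 0x0000,  0x007F  ],    # Basic Latin
    [ 0x0400,  0x04FF  ],    # Cyrillic
    [ 0x1F600, 0x1F64F ],    # Emoticons block
);

my @failures = round_trip_failures(@sample_ranges);
print @failures
    ? join("\n", @failures) . "\n"
    : "all sampled code points round-tripped\n";
```

This only exercises encoding and decoding. Regexes, case folding, collation and normalization depend on the Unicode data shipped with each perl and would need their own loops.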
If you didn't program your executable by toggling in binary, it wasn't really programming!
| [reply] |
Use Perl to walk Unicode::UTF8 thru the character sets with which you need to deal
I am not sure how to walk. What can fail? Regexps? Character classes? String comparison? Collations? Case folding? Normalization? I am not sure what Unicode 6 introduced that is new.
| [reply] |
"...how to walk.": loops and arrays.
"What can fail?....": Answering that is within the domain of the research plan outlined above... or, stated more simply, 'TITS, try it to see.'
If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
| [reply] |
Re: Unicode::UTF8 and perl Unicode compatibility
by Hansen (Friar) on Sep 01, 2013 at 18:26 UTC
|
You will have no compatibility issues. The main difference between Encode's implementation of UTF-8 and Unicode::UTF8's is that Encode uses decoding/encoding functions provided by perl, whereas Unicode::UTF8 has its own implementation. Unicode::UTF8 provides consistent behavior across all supported (>= 5.8.1) versions of perl.
I wrote Unicode::UTF8 because I wanted a fast implementation with a simple API; you can read a comparison with Encode.
--
chansen
| [reply] [d/l] [select] |
Re: Unicode::UTF8 and perl Unicode compatibility
by Anonymous Monk on Aug 31, 2013 at 12:27 UTC
|
Problem that I don't really understand how it would be compatible with perl own unicode implementation
Don't worry about it
What will happen if some Unicode 6 characters recognized by this module, but then misinterpreted by perl?
This wouldn't happen. Once octets/bytes are decoded into characters, they're characters (codepoints)
possible insignificant edge case (including but not limited to security issues), for all perl 5.8.8+. But I am not sure where to start.
If I were you I wouldn't even start :)
Why? Because starting is starting to sound more and more like reinventing-Unicode::UTF8, or tracking-perl-bugs-since-2006-six-decades-ago
I would stick with 5.18.x
| [reply] |
This wouldn't happen. Once octets/bytes are decoded into characters, they're characters (codepoints)
So, you think, once it is decoded to characters, it will work perfectly without breaking anything?
What about utf8::valid()? Will it pass? Will it affect anything?
What if I try encoding back to bytes with Encode::encode("UTF-8" .. ?
I would stick with 5.18.x
No, I specified in OP that I need compatibility with any version starting from perl5.8.8.
| [reply] |
:)
So, you think, once it is decoded to characters, it will work perfectly without breaking anything?
Yes, insofar as once Unicode::UTF8 does its thing it's done; your perl takes over (with all that entails).
What about utf8::valid()? Will it pass? Will it affect anything?
I think it will "pass" and will not "affect anything", but I don't see how it matters: by using Unicode::UTF8 you're saying "the hell with Encode.pm / utf8.pm, I'll let Unicode::UTF8 take care of everything", so there should be no reason to consult utf8 or Encode.
What if I try encoding back to bytes with Encode::encode("UTF-8" .. ?
That ought to work fine as well (call me optimistic)
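Rather than take my optimism on faith, something like the following could be run under each perl in question. It is a rough sketch, not a proof: it decodes the octets of a single Unicode 6.0 character (with Unicode::UTF8 if installed, otherwise falling back to Encode), then checks utf8::valid() and the Encode round trip. The variable names are mine.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 octets for U+1F601 GRINNING FACE WITH SMILING EYES (Unicode 6.0).
my $octets = "\xF0\x9F\x98\x81";

# Die on malformed data and leave the source string untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Unicode::UTF8 is not a core module; fall back to Encode if it is missing.
my $have_unicode_utf8 = eval { require Unicode::UTF8; 1 };
my $chars = $have_unicode_utf8
    ? Unicode::UTF8::decode_utf8($octets)
    : Encode::decode('UTF-8', $octets, $check);

# After decoding there is one character, whatever perl knows about it.
printf "length: %d, code point: U+%04X\n", length $chars, ord $chars;

# utf8::valid() checks perl's internal representation of the string.
printf "utf8::valid: %s\n", utf8::valid($chars) ? 'yes' : 'no';

# Encoding back with Encode should reproduce the original octets.
my $again = Encode::encode('UTF-8', $chars, $check);
printf "round trip: %s\n", $again eq $octets ? 'identical' : 'DIFFERENT';
```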
No, I specified in OP that I need compatibility with any version starting from perl5.8.8.
Yes, I've read this, I understand, and it's why I didn't make jokes :)
Food for thought: Re: Why upgrade perl?, Re: perldeltas - every perl*delta in one file (pod.lst)
| [reply] |