PerlMonks  

Re: Re: Re: s/.// increases length - bug or badly documented feature

by Biker (Priest)
on Mar 01, 2002 at 09:57 UTC


in reply to Re: Re: s/.// increases length - bug or badly documented feature
in thread s/.// increases length - bug or badly documented feature

I'm reading from 'perlunicode' here, in my Perl V5.6.0 documentation:

Important Caveat

WARNING: The implementation of Unicode support in Perl is incomplete.

The following areas need further work.

Input and Output Disciplines
There is currently no easy way to mark data read from a file or other external source as being utf8. This will be one of the major areas of focus in the near future.

Regular Expressions
The existing regular expression compiler does not produce polymorphic opcodes. This means that the determination on whether to match Unicode characters is made when the pattern is compiled, based on whether the pattern contains Unicode characters, and not when the matching happens at run time. This needs to be changed to adaptively match Unicode if the string to be matched is Unicode.

use utf8 still needed to enable a few features
The utf8 pragma implements the tables used for Unicode support. These tables are automatically loaded on demand, so the utf8 pragma need not normally be used.
However, as a compatibility measure, this pragma must be explicitly used to enable recognition of UTF-8 encoded literals and identifiers in the source text.
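The "Input and Output Disciplines" caveat quoted above describes Perl 5.6.0. As a point of comparison, here is a minimal sketch that assumes a later Perl (5.8 or newer, which is not the version discussed in this thread), where I/O layers let you declare a file's encoding when opening it. The scratch file and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: Perl 5.8+ with PerlIO layers; this does not work on 5.6.0.

# Write the three UTF-8 bytes of U+263A (WHITE SMILING FACE) to a scratch file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xE2\x98\xBA";
close($out) or die "close failed: $!";

# Read the file back as raw bytes: Perl sees three separate characters.
open(my $raw, '<:raw', $path) or die "open failed: $!";
my $bytes = do { local $/; <$raw> };
close($raw);

# Read it again through a decoding layer: Perl sees one character.
open(my $dec, '<:encoding(UTF-8)', $path) or die "open failed: $!";
my $chars = do { local $/; <$dec> };
close($dec);

printf "raw length: %d, decoded length: %d\n", length($bytes), length($chars);
```

The same bytes on disk give a length of 3 when read raw and 1 when read through the `:encoding(UTF-8)` layer, which is the kind of marking the 5.6.0 documentation says was still missing.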


"a little extra white space can make your code a lot easier to read"
Heh! I thought that was easy to read. I explicitly made the code snippet 'readable'. I even put in parentheses around the strings to be printed.
Anyway, TMTOWTDI and "style" is just a question of... Well, style. ;-)


Everything will go worng!


Re: Re: Re: Re: s/.// increases length - bug or badly documented feature
by Juerd (Abbot) on Mar 01, 2002 at 19:32 UTC
    There is currently no easy way to mark data read from a file or other external source as being utf8.

    So adding broken Unicode support has, in a way, rendered Perl unusable for external string input. Great! Now we have a really great and fast programming language that can handle text very well, but not if the text contains Unicode and the utf8 pragma has not been used.

    Is the moral of this story: "don't just always use strict, always use utf8 too"?

    sub byte_length { # depends on bugs
        no utf8;
        my ($string) = @_;
        my $counter;
        $counter++ while $string =~ s/.//s;
        return $counter;
    }

    sub has_multibytes {
        my ($string) = @_;
        return length($string) != byte_length($string);
    }
    Alternatives for these subs are welcome, of course.
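One possible alternative, sketched under the assumption of Perl 5.8 or newer with the core Encode module (so not the 5.6.0 under discussion). It keeps the two sub names from the post but changes what is measured: `byte_length` here counts the bytes of the string's UTF-8 encoding, not the bytes of Perl's internal representation, so it does not depend on regex behaviour under `no utf8`.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: Perl 5.8+ with the Encode module available.

# Number of bytes the string occupies once encoded as UTF-8.
sub byte_length {
    my ($string) = @_;
    return length(encode('UTF-8', $string));
}

# True if at least one character needs more than one byte in UTF-8,
# i.e. the character count and the encoded byte count differ.
sub has_multibytes {
    my ($string) = @_;
    return length($string) != byte_length($string);
}

printf "%d %d\n", byte_length("abc"), byte_length("\x{263A}");
```

With this definition, any character above U+007F counts as multibyte, including Latin-1 characters that Perl may store internally as a single byte.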

    Lbh ebgngrq guvf grkg naq abj lbh pna ernq vg. Fb jung? :) -- Whreq
