PerlMonks  

My point is that when you encode strings to formats other than Perl's internal utf8 format, you end up with a scalar without the UTF8 flag set, in which multiple bytes are used to represent single characters. In some of these encodings, individual bytes of a multi-byte character can be null; the UTF-16 variants you pointed out are one example.
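A minimal sketch of that point, assuming the core Encode module and an arbitrary four-character sample string (the variable names are mine, chosen for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "path";                       # four characters
my $bytes = encode( 'UTF-16LE', $text );  # two bytes per character here

# The encoded result is a plain byte string: the UTF8 flag is off.
my $flagged = utf8::is_utf8($bytes) ? 1 : 0;

# Count the null bytes that the encoding legitimately introduced.
my $nulls = () = $bytes =~ /\0/g;

printf "length=%d utf8_flag=%d nulls=%d\n", length($bytes), $flagged, $nulls;
# prints: length=8 utf8_flag=0 nulls=4
```

Half of the bytes are null, and nothing in the scalar itself records that this is expected.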

These encoded values often need to be passed to system APIs. At that point, warning or dying because the scalar contains embedded nulls would be wrong, and if Perl were to try to implement such warnings or traps for embedded null bytes, it would need to detect the above situations.

Example: the often questioned but (to my knowledge) yet to be resolved problem of globbing the Windows filesystem for files with Unicode filenames. Win32 has, for a long time now, had a very functional set of APIs for dealing with Unicode (wide) filenames. In order to use them, it is necessary to convert the path and/or wildcard inputs to the FindFirstFileW() call to Windows' internal Unicode representation, UTF-16(LE). This can be done using Encode::encode(). Having used this call (say)

    ## Why oh why does it try to modify the input!?
    my $uPath = encode( 'UTF-16LE', $_ = "\\\\?\\c:\\some\\path", 1 );

The scalar $uPath will contain the UTF-16LE representation of the input.

    use Encode;
    print encode( 'UTF-16LE', $_ = "\\\\?\\C:\\some\\path", 1 );;
    \ \ ? \ C : \ s o m e \ p a t h

    $uPath = encode( 'UTF-16LE', $_ = "\\\\?\\C:\\some\\path", 1 );;
    print unpack 'H*', $uPath;;
    5c005c003f005c0043003a005c0073006f006d0065005c007000610074006800

Now I need to pass this as the first parameter to FindFirstFileW()--which is defined as

    HANDLE FindFirstFileW(
        LPCTSTR lpFileName,
        LPWIN32_FIND_DATA lpFindFileData
    )

    LPCTSTR => long pointer to (array of) TCHAR
    TCHAR   => wchar_t

either directly through Win32::API, or indirectly through glob or opendir. But if embedded null detection were implemented, it would warn or die because every other byte of the PV is null, and that is intentional and correct.

So, not only would Perl need to detect embedded nulls, it would also need to detect cases where embedded nulls are legitimate. The only way I can see that it could do that is if an additional field were added to the SV to record the encoding contained by the SV, which would be a not inconsiderable expense.
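To make the difficulty concrete, here is a rough sketch. The has_embedded_null() helper is hypothetical, written only for this illustration; it is not something Perl implements or that anyone in the thread has proposed in this form. Looking at the bytes alone, it cannot tell a legitimate UTF-16LE path from a string with a stray null in it:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical check, for illustration only.
sub has_embedded_null { return index( $_[0], "\0" ) >= 0 ? 1 : 0 }

my $wide    = encode( 'UTF-16LE', "C:\\some\\path" );  # nulls are legitimate
my $suspect = "C:\\some\\path\0.txt";                  # null is probably a mistake

printf "wide=%d suspect=%d\n",
    has_embedded_null($wide), has_embedded_null($suspect);
# prints: wide=1 suspect=1
```

Both scalars trip the check, so a warning based on it would fire on the correct input as well as the suspicious one.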

And that's completely ignoring the fact that SVs can, and often do, contain arbitrary binary data that can legitimately contain nulls, and there would be no generic mechanism for flagging such data as exempt from embedded null byte detection.
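As a small illustration of the binary-data case, assuming an arbitrary made-up record layout (one 32-bit and one 16-bit big-endian integer), pack produces a scalar full of nulls that are simply part of the data:

```perl
use strict;
use warnings;

# A 32-bit big-endian integer followed by a 16-bit one: six bytes in total.
my $record = pack( 'N n', 1, 2 );

my $hex   = unpack( 'H*', $record );
my $nulls = () = $record =~ /\0/g;

print "$hex nulls=$nulls\n";
# prints: 000000010002 nulls=4
```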


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^8: How is perl able to handle the null byte? by BrowserUk
in thread How is perl able to handle the null byte? by muba
