in reply to Re^6: how are ARGV and filename strings represented?
in thread how are ARGV and filename strings represented?

So, how should your proposed library handle file names?

I like the idea of mapping invalid bytes to other characters (e.g. surrogates like Python's surrogateescape, characters beyond 0x10FFFF, etc.).

This provides a way of accepting and generating any file name, while still treating file names as decodable text.
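
For reference, here is a minimal sketch of how the surrogateescape mapping behaves in Python. The helper names decode_name and encode_name are made up for this illustration, and UTF-8 is assumed as the file name encoding.

```python
# Sketch only: decode_name/encode_name are hypothetical helper names,
# and UTF-8 is an assumed file name encoding.

def decode_name(raw: bytes) -> str:
    # Each undecodable byte 0xNN is mapped to the lone surrogate U+DC00 + 0xNN.
    return raw.decode("utf-8", errors="surrogateescape")

def encode_name(name: str) -> bytes:
    # Reverses the mapping: surrogates U+DC80..U+DCFF turn back into raw bytes.
    return name.encode("utf-8", errors="surrogateescape")

# 0xE9 is "é" in Latin-1, but it is not a valid UTF-8 sequence here.
raw = b"caf\xe9.txt"
name = decode_name(raw)

assert name == "caf\udce9.txt"      # the invalid byte became U+DCE9
assert encode_name(name) == raw     # and it round-trips to the same bytes

# Valid UTF-8 is decoded normally and also round-trips.
good = "café.txt".encode("utf-8")
assert decode_name(good) == "café.txt"
assert encode_name(decode_name(good)) == good
```

The point is that every byte string has a text form and every such text form flattens back to exactly the bytes it came from.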

Re^8: how are ARGV and filename strings represented?
by afoken (Chancellor) on May 05, 2024 at 14:30 UTC
    I like the idea of mapping invalid bytes to other characters

    However that is implemented, it must be implemented system-wide, or you will end up in chaos. So, it must become part of either the kernel or the libc.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      However that is implemented, it must be implemented system-wide, or you will end up in chaos.

      We already live in chaos. Python implemented it Python-wide, and that arguably resulted in less chaos than Perl has.

      Another option is to pair the string of bytes with the best guesstimate of its encoding within some sort of path object, and then be able to flatten it back to the same bytes it came from, and also answer questions about what it would look like in Unicode and how confident we are about its encoding. I'm proposing wrapping the paths in an object anyway, so maybe that's what I'd do. I need them to stringify back to bytes in order to interoperate with the rest of Perl, anyway. Python gets the advantage of the whole language ecosystem respecting the remapped invalid characters, so it can pass file names around as plain strings.
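
      To make the path-object idea concrete, here is a rough sketch, written in Python only because it is compact. BytePath, its fallback parameter, and the "UTF-8 first, then Latin-1" guess are all assumptions made for this illustration, not part of any existing library. The confidence answer is deliberately crude: strict UTF-8 decoding either succeeds or it doesn't.

```python
# Hypothetical sketch: BytePath and its guessing rule are invented here.

class BytePath:
    """Pairs the original bytes with a best guess at their encoding."""

    def __init__(self, raw: bytes, fallback: str = "latin-1"):
        self.raw = raw
        try:
            raw.decode("utf-8")          # strict: raises on invalid UTF-8
        except UnicodeDecodeError:
            self.encoding = fallback     # a guess, so not confident
            self.confident = False
        else:
            self.encoding = "utf-8"
            self.confident = True

    def text(self) -> str:
        # What the name would look like as Unicode under the guessed encoding.
        return self.raw.decode(self.encoding, errors="replace")

    def __bytes__(self) -> bytes:
        # Flattens back to exactly the bytes the path came from.
        return self.raw


p = BytePath(b"caf\xc3\xa9")     # valid UTF-8
q = BytePath(b"caf\xe9")         # not valid UTF-8, guessed as Latin-1

assert (p.text(), p.encoding, p.confident) == ("café", "utf-8", True)
assert (q.text(), q.encoding, q.confident) == ("café", "latin-1", False)
assert bytes(p) == b"caf\xc3\xa9" and bytes(q) == b"caf\xe9"
```

      A Perl version would presumably overload stringification to return the raw bytes, which is what lets it interoperate with code that knows nothing about the object.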

      The Perl library filename-taking and filename-producing operators would need to support it, and any interface to external systems would need to be aware of it.

      But lack of support by interfaces wouldn't be that bad. You would simply end up with a file one can't create/access, which is already the case.