in reply to What would you like to see in a Virtual Filesystem for Perl?

I don't think Perl should implement a VFS, simply because there are so many filesystems around, and as far as I understand your idea, that would require one module per filesystem. wc -l /proc/filesystems on a random Debian box shows 33 filesystems, including FUSE, which may be used to implement many more filesystems, perhaps even homegrown ones. Also, you need to know which filesystem is mounted in each and every directory, and you will probably also need to know the mount options (Linux can mount FAT and friends with different codepages, see mount).
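To illustrate the last point: on Linux, the mounted filesystems and their options can be read from /proc/self/mounts. Below is a minimal sketch. It assumes the usual whitespace-separated line format (device, mount point, type, options, ...) and uses a hypothetical helper, mount_for, which picks the longest mount point containing a given path. The sample lines in the test are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper. @lines are in /proc/self/mounts format:
#   device mountpoint fstype options dump pass
# Returns (fstype, options) of the mount that contains $path.
sub mount_for {
    my ($path, @lines) = @_;
    my ($best, @result) = ('');
    for my $line (@lines) {
        my (undef, $mp, $type, $opts) = split ' ', $line;
        next unless defined $opts;                # skip malformed lines
        $mp =~ s/\\([0-7]{3})/chr oct $1/ge;      # "\040" is an escaped space
        my $inside = $mp eq '/' || $path eq $mp || index($path, "$mp/") == 0;
        # ">=" so a later mount on the same point (an overmount) wins
        if ($inside && length($mp) >= length($best)) {
            ($best, @result) = ($mp, $type, $opts);
        }
    }
    return @result;
}

# On Linux, the real table would come from:
#   open my $fh, '<', '/proc/self/mounts' or die $!;
#   my @lines = <$fh>;
#   my ($type, $opts) = mount_for('/some/dir', @lines);
```

For a vfat mount, the returned options string is where settings such as codepage= and iocharset= would show up.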

I would like to see a different approach: a (maybe highly magical) use unicodepaths; pragma that makes all filesystem functions (limited to a lexical scope) accept and return Unicode strings.
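Perl already has the machinery for such a lexically scoped switch. As described in perlpragma, a pragma can store a flag in the hints hash %^H at compile time, and code can look it up later through element [10] of caller(). The following is a minimal sketch of only that switch. The package name unicodepaths is the hypothetical pragma from this post, and nothing here changes any filesystem function yet.

```perl
use strict;
use warnings;

package unicodepaths;    # hypothetical pragma name

# Store the state in the lexical hints hash at compile time.
sub import   { $^H{'unicodepaths'} = 1 }
sub unimport { $^H{'unicodepaths'} = 0 }

# Was the pragma in effect where in_effect() was called from?
# (caller($level))[10] is a copy of %^H as seen at that call site.
sub in_effect {
    my $level = shift // 0;
    my $hints = (caller($level))[10];
    return $hints && $hints->{'unicodepaths'} ? 1 : 0;
}

package main;

my @seen;
push @seen, unicodepaths::in_effect();        # outside the block: 0
{
    BEGIN { unicodepaths->import }            # what "use unicodepaths;" would do
    push @seen, unicodepaths::in_effect();    # inside the block: 1
}
push @seen, unicodepaths::in_effect();        # outside again: 0
print "@seen\n";                              # prints "0 1 0"
```

In a real module, the package would live in unicodepaths.pm and be enabled with use unicodepaths;. The filesystem wrappers would then call in_effect() with the right $level to decide whether to decode or encode.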

As far as I understand Windows, this would essentially mean switching from the legacy ANSI API to the Wide (Unicode) API. Windows would perhaps be a good testbed for that switch, as it has an API that explicitly expects and returns Unicode.
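On the Wide API side, a path has to be passed as NUL-terminated UTF-16LE, which is what -W functions such as CreateFileW take. The sketch below shows just that conversion, using the core Encode module. to_wide_path is a hypothetical helper name. Actually calling the Win32 function would need XS or a module like Win32::API, which is not shown.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn a Perl character string into the buffer a
# Windows -W function expects (UTF-16LE code units plus a 16-bit NUL).
# Characters above U+FFFF become surrogate pairs automatically.
sub to_wide_path {
    my ($name) = @_;
    return Encode::encode('UTF-16LE', $name . "\0");
}

my $buf = to_wide_path("a\x{E9}");
printf "%d bytes\n", length $buf;   # prints "6 bytes": 'a', U+00E9, NUL
```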

For Linux and other Unix systems, some more thinking is needed. You basically need to know if the filenames are just bytes or if they are encoded in UTF-8.

Perhaps just guessing and trying to convert may work well enough for Unix:

Any filename returned from the operating system should be treated as bytes, unless unicodepaths is active. If unicodepaths is active, try to decode the bytes as UTF-8. If that succeeds, use the result as a Unicode string. If that fails, keep the bytes as-is, and don't set the UTF-8 flag on the returned filename.
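The "bytes in" rule above could be sketched with the core Encode module as follows. It uses the strict 'UTF-8' decoder, which rejects malformed input such as overlong forms. from_os_name is a hypothetical name, and its second argument stands in for "is unicodepaths active here?".

```perl
use strict;
use warnings;
use Encode ();

# Sketch of the "bytes in" rule. $unicodepaths stands in for
# "is the (hypothetical) unicodepaths pragma active here?".
sub from_os_name {
    my ($bytes, $unicodepaths) = @_;
    return $bytes unless $unicodepaths;         # pragma off: plain bytes
    my $copy  = $bytes;                         # decode() with a CHECK may modify its input
    my $chars = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $chars ? $chars : $bytes;    # not valid UTF-8: bytes as-is, flag off
}
```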

Any filename passed to the operating system should be encoded to a UTF-8 byte stream if unicodepaths is active and the filename has the UTF-8 flag set. If unicodepaths is not active and/or the filename has the UTF-8 flag cleared, no encoding should happen. If unicodepaths is not active but the filename has the UTF-8 flag set, a warning should be issued.
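The matching "bytes out" rule, again as a sketch with a hypothetical helper name and an explicit flag in place of the pragma check:

```perl
use strict;
use warnings;
use Encode ();

# Sketch of the "bytes out" rule; $unicodepaths again stands in for
# the (hypothetical) pragma check.
sub to_os_name {
    my ($name, $unicodepaths) = @_;
    if (utf8::is_utf8($name)) {
        return Encode::encode('UTF-8', $name) if $unicodepaths;
        warn "UTF-8 flagged filename used while unicodepaths is not active\n";
    }
    return $name;    # flag off, or pragma off: pass through unchanged
}
```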

(That warning does not seem to happen on my Debian box: perl -w -E '$fn="x\x{ABCD}"; open my $f,">",$fn; say $f "hi"; close $f;' does not warn at all. Perl is v5.32.1 for x86_64.)
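The lack of a warning matches the behaviour perlunicode describes under "When Unicode Does Not Happen": on Unix, Perl passes the string's internal representation to the OS. For a string containing a character above 0xFF, that internal form is UTF-8, so the file silently gets a UTF-8 encoded name. The following check creates a file in a temporary directory and looks at the name readdir returns:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $fn  = "x\x{ABCD}";            # contains a character above 0xFF
open my $f, '>', "$dir/$fn" or die "open: $!";
close $f;

# readdir returns the name exactly as the OS stored it, as bytes
opendir my $dh, $dir or die "opendir: $!";
my @names = grep { !/^\.\.?\z/ } readdir $dh;
closedir $dh;

print join(' ', map { sprintf '%02X', ord } split //, $names[0]), "\n";
# prints "78 EA AF 8D": "x" followed by the UTF-8 encoding of U+ABCD
```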

Both combined should allow Perl to see Unicode where Unicode happens, while not messing with the encoding for non-Unicode filenames.

Maybe this idea needs a more relaxed variant of UTF-8 decoding that lets any random bytes in a filename round-trip unchanged.
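Python's "surrogateescape" error handler (PEP 383) is one existing answer to this. Every byte that is not part of valid UTF-8 is mapped to a lone surrogate in U+DC80..U+DCFF. Decoding valid UTF-8 can never produce such a surrogate, so encoding can map it back to the original byte. Below is a minimal Perl sketch of that technique; decode_escaped and encode_escaped are hypothetical names.

```perl
use strict;
use warnings;
no warnings 'surrogate';   # lone surrogates are intentional here
use Encode ();

# One well-formed UTF-8 sequence (RFC 3629: no overlongs, no surrogates).
my $utf8_char = qr/
      [\x00-\x7F]
    | [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Bytes -> characters. Valid UTF-8 runs are decoded normally; every other
# byte 0xHH becomes the lone surrogate U+DCHH.
sub decode_escaped {
    my ($bytes) = @_;
    die "not a byte string\n" if $bytes =~ /[^\x00-\xFF]/;
    my $out = '';
    while ($bytes =~ /\G(?:($utf8_char+)|(.))/gs) {
        my ($valid, $bad) = ($1, $2);
        $out .= defined $valid ? Encode::decode('UTF-8', $valid)
                               : chr(0xDC00 + ord($bad));
    }
    return $out;
}

# Characters -> bytes. Escaped surrogates turn back into their byte;
# everything else is encoded as UTF-8.
sub encode_escaped {
    my ($chars) = @_;
    my $out = '';
    while ($chars =~ /\G(?:([\x{DC80}-\x{DCFF}])|([^\x{DC80}-\x{DCFF}]+))/gs) {
        my ($esc, $text) = ($1, $2);
        $out .= defined $esc ? chr(ord($esc) - 0xDC00)
                             : Encode::encode('UTF-8', $text);
    }
    return $out;
}
```

The price of this design is that the decoded names can contain characters that are not valid Unicode scalar values. Any code that writes them somewhere else as UTF-8 has to be prepared for that.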

Maybe this idea needs to split paths and handle each element of the path separately.
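Splitting is safe at the byte level because the byte 0x2F ("/") never occurs inside a multi-byte UTF-8 sequence. The sketch below applies the try-UTF-8-else-keep-bytes rule described above to each element separately (helper names are hypothetical). It returns a list rather than a joined string. Joining a decoded element with a byte element would silently upgrade the bytes and blur the distinction again.

```perl
use strict;
use warnings;
use Encode ();

# Try UTF-8; on failure keep the raw bytes (flag stays off).
sub decode_component {
    my ($bytes) = @_;
    my $copy  = $bytes;
    my $chars = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $chars ? $chars : $bytes;
}

# Decode each path element on its own, so one non-UTF-8 element does not
# force the whole path back to bytes. The -1 limit keeps empty elements
# (e.g. the leading one of an absolute path).
sub decode_path {
    my ($path) = @_;
    return map { decode_component($_) } split m{/}, $path, -1;
}
```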

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^2: What would you like to see in a Virtual Filesystem for Perl?
by NERDVANA (Priest) on Aug 22, 2023 at 23:29 UTC
    as far as I understand your idea, that would require one module per filesystem. wc -l /proc/filesystems on a random Debian box shows 33 filesystems, including FUSE

    Not quite; my idea is that all of these are represented by a single "Unix Native Filesystem", because they all share the same API for querying files. So, in its default state, the VFS would just pass through to "Unix Native Filesystem" or "Windows Native Filesystem" and essentially provide nothing but Unicode handling for them.

    The multiple-filesystem aspect comes into play when you want to do something like browse a zip file: open(my $f, "<", "~/example.zip/path/to/Foo.txt"). Currently, your only option for that is a FUSE module like fuse-zip. The downside is that you need to install a set-uid program for that, and the mounts of zip files are visible system-wide to all users, modulo permissions. I would much rather have the zip file "mounted" exclusively inside the Perl interpreter.
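One way such an interpreter-private "mount" could locate the archive is to walk the path from the left. The walk stops at the first prefix that is a regular file rather than a directory, and the rest of the path then names a member inside that file. Below is a minimal sketch with a hypothetical helper, split_archive_path. It only splits the path and does not open the zip. Perl's open does not expand "~" itself (that is a shell feature), so the sketch expects an already expanded path.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($archive_file, $inner_path), or an empty
# list if no component of $path is a regular file with more path after it.
sub split_archive_path {
    my ($path) = @_;
    my @parts  = split m{/}, $path;
    my $prefix = '';
    for my $i (0 .. $#parts) {
        $prefix = $i == 0 ? $parts[0] : "$prefix/$parts[$i]";
        next if $prefix eq '';              # leading "/" of an absolute path
        if (-f $prefix) {
            return if $i == $#parts;        # a plain file: no archive involved
            return ($prefix, join '/', @parts[$i + 1 .. $#parts]);
        }
        return unless -d $prefix;           # missing component: nothing to do
    }
    return;
}
```

A VFS layer could then hand $inner_path to a zip reader for $archive_file and fall through to the native filesystem whenever the helper returns nothing.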

    For Linux and other Unix systems, some more thinking is needed. You basically need to know if the filenames are just bytes or if they are encoded in UTF-8. Perhaps just guessing and trying to convert may work well enough for Unix

    I thought I had summed this up with how Unix uses the 'locale', but I might be wrong! I can't find any reference to an official standard saying that path names should respect LC_ALL. I've decided to write a "Meditation" about it. Coming up soon...
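For reference, this is how a Perl program can ask which character set the current locale uses. Whether filenames actually follow it is exactly the open question. The sketch below uses the core modules POSIX and I18N::Langinfo. decode_by_locale is a hypothetical helper, and if Encode does not recognize the codeset name, its eval simply falls back to the raw bytes. The CPAN module Encode::Locale packages the same idea under the encoding name locale_fs.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use I18N::Langinfo qw(langinfo CODESET);
use Encode ();

setlocale(LC_ALL, '');              # take LANG / LC_* from the environment
my $codeset = langinfo(CODESET);    # e.g. "UTF-8" under a UTF-8 locale

# Hypothetical helper: decode a filename with the locale's codeset,
# keeping the raw bytes if that fails (or the codeset is unknown to Encode).
sub decode_by_locale {
    my ($bytes) = @_;
    my $copy  = $bytes;
    my $chars = eval { Encode::decode($codeset, $copy, Encode::FB_CROAK) };
    return defined $chars ? $chars : $bytes;
}

print "locale codeset: $codeset\n";
```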

    Edit: Meditation complete! (wow that used up most of my evening...)