I don't think Perl should implement a VFS, simply because there are so many filesystems around and, as far as I understand your idea, that would require one module per filesystem. wc -l /proc/filesystems on a random Debian box shows 33 filesystems, including FUSE, which may be used to implement many more filesystems, perhaps even homegrown ones. Also, you need to know which filesystem is mounted at each and every directory, and you will probably also need to know the mount options (Linux can mount FAT and friends with different codepages, see mount).

I would like to see a different approach: A (maybe highly magical) use unicodepaths; that makes all filesystem functions (limited to a scope) accept and return Unicode strings.

As far as I understand Windows, this would essentially mean switching from the legacy ANSI API to the Wide (Unicode) API. Windows would perhaps be a good testbed for that switch, as it has an API that explicitly expects and returns Unicode.

For Linux and other Unix systems, some more thinking is needed. You basically need to know if the filenames are just bytes or if they are encoded in UTF-8.

Perhaps just guessing and trying to convert may work well enough for Unix:

Any filename returned from the operating system should be treated as bytes, unless unicodepaths is active. If unicodepaths is active, try to decode the bytes as UTF-8. If that succeeds, use the result as a Unicode string. If that fails, keep the bytes as-is, and don't set the UTF-8 flag on the returned filename.
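A minimal sketch of that read-side rule, using the core Encode module. The function name decode_filename and the $unicodepaths_active argument are made up for illustration; a real pragma would get its state from the lexical scope, not from a parameter.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: post-process a filename returned by the OS.
# $unicodepaths_active stands in for "use unicodepaths is in effect".
sub decode_filename {
    my ($bytes, $unicodepaths_active) = @_;

    # Without the pragma, filenames stay plain bytes.
    return $bytes unless $unicodepaths_active;

    # Strict UTF-8 decoding; FB_CROAK makes malformed input die,
    # LEAVE_SRC keeps decode() from modifying $bytes.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };

    # Valid UTF-8: return the character string.
    # Anything else: return the original bytes untouched.
    return defined $chars ? $chars : $bytes;
}
```

With this, decode_filename("caf\xC3\xA9", 1) gives a four-character string, while the Latin-1 bytes "caf\xE9" come back unchanged because they are not valid UTF-8.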

Any filename passed to the operating system should be encoded to a UTF-8 byte stream if unicodepaths is active and the filename has the UTF-8 flag set. If unicodepaths is not active and/or the filename has the UTF-8 flag cleared, no encoding should happen. If unicodepaths is not active but the filename has the UTF-8 flag set, a warning should be issued.
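The write side could look roughly like this. Again, encode_filename and $unicodepaths_active are hypothetical names, and the warning text is only a placeholder.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: prepare a filename before handing it to the OS.
# $unicodepaths_active stands in for "use unicodepaths is in effect".
sub encode_filename {
    my ($name, $unicodepaths_active) = @_;

    if (utf8::is_utf8($name)) {
        # Pragma active and UTF-8 flag set: encode to a UTF-8 byte stream.
        return encode('UTF-8', $name) if $unicodepaths_active;

        # Flag set but pragma not active: warn, do not encode.
        warn "filename has the UTF-8 flag set, but unicodepaths is not active\n";
    }

    # Flag cleared (or pragma inactive): pass the filename through as-is.
    return $name;
}
```

For example, encode_filename("x\x{ABCD}", 1) returns the four bytes "x\xEA\xAF\x8D", and a plain byte string is never touched.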

(That warning does not seem to happen on my Debian box: perl -w -E '$fn="x\x{ABCD}"; open my $f,">",$fn; say $f "hi"; close $f;' does not warn at all. Perl is v5.32.1 for x86_64.)

Both combined should allow Perl to see Unicode where Unicode happens, while not messing with the encoding for non-Unicode filenames.

Maybe this idea needs some more relaxed encoding of UTF-8 to allow a round-trip of any random bytes in a filename.
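One possible shape for such a relaxed encoding, as a sketch only: borrow the idea behind Python's "surrogateescape" error handler and map every byte that is not part of valid UTF-8 to a lone surrogate in the range U+DC80..U+DCFF. Strict UTF-8 decoding never produces those code points, so the mapping can be reversed. The names decode_relaxed and encode_relaxed are invented here, and I have not checked how well the rest of Perl copes with strings containing lone surrogates.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_QUIET);

# Hypothetical: bytes -> characters, never fails.
sub decode_relaxed {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        # FB_QUIET: decode the valid prefix and leave the
        # unprocessed remainder in $bytes.
        $out .= decode('UTF-8', $bytes, FB_QUIET);
        # If something is left, its first byte is not valid UTF-8:
        # escape that single byte as U+DC00 + byte value.
        $out .= chr(0xDC00 + ord(substr($bytes, 0, 1, '')))
            if length $bytes;
    }
    return $out;
}

# Hypothetical: characters -> bytes, reversing decode_relaxed.
sub encode_relaxed {
    my ($chars) = @_;
    my $out = '';
    for my $c (split //, $chars) {
        my $cp = ord $c;
        $out .= ($cp >= 0xDC80 && $cp <= 0xDCFF)
            ? chr($cp - 0xDC00)       # escaped raw byte
            : encode('UTF-8', $c);    # ordinary character
    }
    return $out;
}
```

The intent is that encode_relaxed(decode_relaxed($raw)) eq $raw holds for any byte string $raw, including filenames that mix valid UTF-8 with legacy 8-bit bytes.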

Maybe this idea needs to split paths and handle each element of the path separately.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re: What would you like to see in a Virtual Filesystem for Perl? by afoken
in thread What would you like to see in a Virtual Filesystem for Perl? by NERDVANA
