
I don't know much about the problems you're having with network filesystems, except that converting filename encodings as necessary is the filesystem's job. Windows NTFS stores filenames on disk as 2-byte UCS-2 (a close relative of UTF-16), whereas Linux filesystems treat filenames as plain byte strings, which by convention are usually UTF-8.

Sorting that out is the job of smbfs and Samba, though.
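To make the difference concrete, here is a small sketch that uses Encode to print the bytes each convention would store for the name "München" (taken from the test script below). It only shows the two encodings side by side; it does not talk to a real SMB mount or show how smbfs/Samba perform the conversion.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# "München" as a Perl character string (\x{fc} is U+00FC, u-umlaut)
my $name = "M\x{fc}nchen";

# Bytes a Linux filesystem would typically hold under a UTF-8 convention
my $utf8_bytes  = encode('UTF-8', $name);

# Bytes stored as 16-bit little-endian code units, as NTFS does
my $utf16_bytes = encode('UTF-16LE', $name);

printf "UTF-8:    %s (%d bytes)\n", unpack('H*', $utf8_bytes),  length $utf8_bytes;
printf "UTF-16LE: %s (%d bytes)\n", unpack('H*', $utf16_bytes), length $utf16_bytes;
```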

You should just be able to read and write UTF-8 filenames, as in the code below. However, tests #13 and #15 fail for me, presumably because the filenames returned from 'glob' and 'readdir' *don't* have the utf8 flag on.

Do any monks have more info on this? If I read a filename from a UTF-8 filesystem, should the filename have the utf8 flag on? (ASCII-only names excepted, of course.)
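The suspected cause of the #13 and #15 failures can be reproduced without touching the disk. This minimal sketch assumes, as the post suspects, that glob and readdir hand back the raw bytes without the utf8 flag, and compares them with the flagged string used in the second test_placename call. Since `eq` compares characters rather than bytes, an 8-character byte string never equals the 7-character flagged string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Assumed: what glob/readdir return, i.e. raw bytes with no utf8 flag
my $from_disk = "M\xc3\xbcnchen";

# What the second test_placename call compares against: the same
# bytes, but with the utf8 flag switched on via Encode::_utf8_on
my $flagged = "M\xc3\xbcnchen";
Encode::_utf8_on($flagged);

for ([from_disk => $from_disk], [flagged => $flagged]) {
    my ($label, $str) = @$_;
    printf "%-9s utf8 flag: %s, length: %d characters\n",
        $label, (utf8::is_utf8($str) ? 'on' : 'off'), length $str;
}

# eq compares characters, not bytes: 8 characters vs 7, so not equal
print "eq: ", ($from_disk eq $flagged ? 'equal' : 'different'), "\n";
```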

perl 5.8.8

#!/usr/bin/perl
use strict;
use warnings;
use Test::More (tests => 16); # 8 tests per test_placename call, 2 calls
use Encode;

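# If you have a UTF-8 terminal. Test::More prints through its own
# copies of STDOUT/STDERR, so set the layer on those as well.
binmode STDOUT, ':utf8';
my $builder = Test::More->builder;
binmode $builder->output,         ':utf8';
binmode $builder->failure_output, ':utf8';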

my $workdir = "./tt";
mkdir $workdir; # Let it fail if it already exists

# This is a byte sequence, not tagged as utf8, so perl treats
# each byte as one character (Latin-1 semantics): 8 characters
# rather than the 7 a reader sees in "München"
my $place = "M\xc3\xbcnchen";
test_placename($workdir, $place);

# Turn on the flag for this scalar. Since we pre-arranged for
# the byte sequence of this scalar to contain valid utf8, this
# scalar is now a valid perl unicode string.
Encode::_utf8_on($place);
test_placename($workdir, $place);

exit 0;


sub test_placename {
    my $workdir = shift;
    my $place = shift;

    my $fname = "$workdir/$place";

    my $fh;
    ok(!-f $fname, "$fname doesn't already exist");
    open($fh, ">", $fname)
        or die "Can't create $fname : $!";
    close $fh;
    ok(1, "can create $fname with 'open'/close");
    ok(-f $fname, "can find $fname with -f");

    my @files = glob("$workdir/$place");
    is(scalar @files, 1, "One file in dir via glob");
    is($files[0], $fname, "and it's what we expect");

    my $dh;
    opendir $dh, $workdir
        or die "Can't open $workdir : $!";
    @files = grep { !/^\./ } readdir $dh;
    closedir $dh;
    is(scalar @files, 1, "One file in dir via readdir");
    is($files[0], $place, "and it's what we expect");

    my $num_files_unlinked = unlink($fname);
    is($num_files_unlinked, 1, "can remove $fname");
}
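For comparison, one common workaround (a sketch, not a definitive answer to the question above) is to stop relying on the utf8 flag and convert explicitly at the OS boundary: encode names before `open` and decode whatever `readdir` returns, since perl passes filenames to and from the OS as bytes. It assumes the filesystem stores names as UTF-8; `$fs_encoding` is a hypothetical setting because perl does not detect this for you. It uses File::Temp instead of ./tt so it cleans up after itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Hypothetical setting: the encoding this filesystem uses for names.
# Perl does not detect it; UTF-8 is assumed here.
my $fs_encoding = 'UTF-8';

my $dir   = tempdir(CLEANUP => 1);   # removed again at exit
my $place = "M\x{fc}nchen";          # a character string

# Encode explicitly on the way out to the OS...
my $path = "$dir/" . encode($fs_encoding, $place);
open(my $fh, '>', $path) or die "Can't create $path: $!";
close $fh;

# ...and decode explicitly what comes back from readdir
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @names = map { decode($fs_encoding, $_) } grep { !/^\./ } readdir $dh;
closedir $dh;

print "one file: ", (@names == 1 ? 'yes' : 'no'), "\n";
print "matches \$place: ", ($names[0] eq $place ? 'yes' : 'no'), "\n";
```

`decode` is used here rather than `Encode::_utf8_on` (as in the test script) because `decode` always builds a well-formed character string (malformed bytes become U+FFFD by default), whereas `_utf8_on` only sets the flag and trusts the bytes as they are.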

In reply to Re: directories and charsets by jbert
in thread directories and charsets by soliplaya
