I have read the documents in question and understand what they say. Really. I have been handling character set issues for a while, including Unicode/ISO conversions back and forth in Perl (iso-8859-x, "modified UTF-7" for IMAP, etc.). I only insist on this so that you won't think I don't understand the basic issues at hand.

My problem, specifically, is that I had a case where a file created by a Windows PC on a Linux server, with a name that did contain UTF-8 encoded characters, failed both the "-f" test and an open(). I was, and still am, trying to figure out why.
What I did learn from you is that I should apparently not blindly convert my filenames to UTF-8 (which would have been my first inclination). Thanks.
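Rather than converting anything, one option is to look at the raw bytes that readdir() returns. The following is only a minimal sketch, not something from the earlier replies: the helper names (classify_name, hex_bytes, report_dir) are made up for illustration, and it assumes the names come back as plain byte strings, as they do on Linux. Note that a "utf-8" verdict only means the bytes are well-formed UTF-8; some Latin-1 byte sequences happen to be valid UTF-8 too.

```perl
use strict;
use warnings;

# Classify the raw bytes of a name as returned by readdir(),
# without converting them. (Hypothetical helper.)
sub classify_name {
    my ($bytes) = @_;
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;
    my $copy = $bytes;    # utf8::decode() works in place, so use a copy
    # utf8::decode() returns true only if the bytes are well-formed UTF-8.
    return utf8::decode($copy) ? 'utf-8' : 'not utf-8 (possibly Latin-1)';
}

# Hex dump of the bytes: an e-acute shows as "c3 a9" in UTF-8
# and as "e9" in Latin-1. (Hypothetical helper.)
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# Print the classification and hex dump of every entry in a directory.
sub report_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot opendir '$dir': $!";
    for my $name (grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
        printf "%-30s %s\n", classify_name($name), hex_bytes($name);
    }
    closedir $dh;
}

# Usage: perl thisscript.pl /path/to/dav/directory
report_dir($ARGV[0]) if @ARGV;
```

Run against the DAV directory, this should show directly whether a given entry was stored as Latin-1 or as UTF-8 bytes.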

However, as far as the main issue goes, I am now close to believing that there are gremlins at play. I am trying to demonstrate more clearly the problem I described at the start, but now, when I create Windows files with accented characters in their names and drop them into the DAV directories, the Perl script picks them up and reads them, accents and all, and the filenames seem to be Latin-1, not UTF-8 anymore.
The Windows PC I am dropping the files from is a Spanish Windows XP station, set up with a Spanish keyboard and all.
The only thing I changed in my script, I swear, was to add a logging message that shows the $! error in case of a failed open().
The only explanation I can think of at the moment is that Windows XP can somehow create filenames in either the Windows Latin-1 charset or Unicode UTF-8 (and records this information in its directory entry?). Would you know any Windows guru who could confirm or refute this?
If that's not true, then I will need to write a little server-side script that forcibly creates several versions of the test filename (UTF-8 encoded and not) directly on the Linux server. I'll get busy on that anyway, if only to simplify the issue.
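Such a server-side test script could look roughly like the sketch below. Everything in it is an assumption for illustration: the test name "café.txt", the helper names (probe, make_variants), and the use of a temporary directory instead of the real DAV directory. It also assumes a Linux filesystem that accepts arbitrary bytes in filenames.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Try -f and open() on a byte-string path and capture $! on failure.
# (Hypothetical helper.)
sub probe {
    my ($path) = @_;
    my $is_file = -f $path ? 1 : 0;
    my $err = '';
    if (open my $fh, '<', $path) { close $fh }
    else                         { $err = "$!" }
    return { is_file => $is_file, error => $err };
}

# Create the same visible name twice: once as Latin-1 bytes,
# once as UTF-8 bytes. (Hypothetical helper.)
sub make_variants {
    my ($dir) = @_;
    my %names = (
        'latin-1' => "caf\xE9.txt",        # e-acute as one byte
        'utf-8'   => "caf\xC3\xA9.txt",    # e-acute as two bytes
    );
    my %paths;
    for my $label (sort keys %names) {
        my $path = "$dir/$names{$label}";
        open my $out, '>', $path or die "Cannot create $label variant: $!";
        print {$out} "test\n";
        close $out or die "Cannot close $label variant: $!";
        $paths{$label} = $path;
    }
    return %paths;
}

# Assumption: a scratch directory stands in for the DAV directory.
my $dir   = tempdir(CLEANUP => 1);
my %paths = make_variants($dir);

for my $label (sort keys %paths) {
    my $r = probe($paths{$label});
    printf "%-8s -f=%d open=%s\n",
        $label, $r->{is_file}, $r->{error} eq '' ? 'ok' : "failed ($r->{error})";
}
```

If both variants pass "-f" and open() here, the encoding of the name on disk is probably not the cause by itself, and the difference more likely lies in how the name reaches the script.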

On another note, I am new to this site and do not want to clutter it with a step-by-step resolution of the problem. Suppose I take a break now and come back to this thread when I have a solid description of what happens. Do I just pick up your last message and hit reply, or do I start a new question?

In reply to Re^4: utf8 in directory and filenames by soliplaya
in thread utf8 in directory and filenames by soliplaya
