Perl / FileFind or ...

Festus Hagen has asked for the wisdom of the Perl Monks concerning the following question:

Hiya all,

Poor title excuse: I'm completely flabbergasted by this ...

Such a simple thing ... ??

#!/usr/bin/perl

use strict;
use warnings;

#  **
#  * It is what it is, you can do with it as you please. [with respect, leave the credits]
#  *
#  * Just don't blame me if it teaches your computer to smoke!
#  *
#  *  -Enjoy
#  *  fh :)_~
#  **

use File::Find;

my $Directory = 'C:/Tmp';
my @flist;


sub cbFileFind
{
  print $File::Find::name, "\n";
}

find(\&cbFileFind, $Directory);

What's up here??

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>example.pl
C:/Tmp
C:/Tmp/M÷tley Cr&#8319;e
C:/Tmp/M÷tley Cr&#8319;e/Dr. Feelgood (Bonus Track Version)
C:/Tmp/M÷tley Cr&#8319;e/Dr. Feelgood (Bonus Track Version)/07 - Same Ol' Situation (S.O.S).mp3
C:/Tmp/M÷tley Cr&#8319;e/Saints of Los Angeles
C:/Tmp/M÷tley Cr&#8319;e/Saints of Los Angeles/05 - Saints of Los Angeles (Gang Vocal).mp3

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>dir /s /b C:\Tmp
C:\Tmp\Mötley Crüe
C:\Tmp\Mötley Crüe\Dr. Feelgood (Bonus Track Version)
C:\Tmp\Mötley Crüe\Saints of Los Angeles
C:\Tmp\Mötley Crüe\Dr. Feelgood (Bonus Track Version)\07 - Same Ol' Situation (S.O.S).mp3
C:\Tmp\Mötley Crüe\Saints of Los Angeles\05 - Saints of Los Angeles (Gang Vocal).mp3

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>

-Enjoy
fh : )_~


Re: Perl / FileFind or ... by graff (Chancellor) on Nov 28, 2012 at 04:53 UTC
What makes you think that handling non-ASCII characters in path/file names should be simple? I suppose that if you have intimate knowledge about the OS you're using, and about the file system installed on the specific disk volume you're using, and about the capabilities of the particular terminal/browser/other application that is trying to display file name strings on your monitor, and about the environment/configuration settings that control the behavior of that application, and about the process(es) that created the file names on that specific disk volume in the first place, then you might know enough for the handling of non-ASCII file names to seem "simple." But if you lack intimate knowledge on any of those topics, your first resort should be to get a hex-dump view of the byte sequences being used in any given file name string. That way, all you need is a general knowledge of the possible non-ASCII character encodings, and perhaps some presupposition about the (human) language being used by the person who assigned the file name (or at least, some sense of the alphabet being used - Cyrillic? Greek? Latin? Arabic? ... - including the range of diacritic marks, odd-ball punctuation and/or special symbols that are likely to show up). Not that this in itself is "simple", but at least there are fewer moving parts. 
Obviously, getting a hex-dump style output just gets in the way when file paths contain nothing outside the printable ASCII range, so a useful elaboration of your File::Find callback might go something like this:

sub cbFileFind
{
  my $printable_name = $File::Find::name;
  $printable_name =~ s/([^ -~])/sprintf("\\x{%02x}",ord($1))/eg;
  print $printable_name, "\n";
}

If you happen to already know (or if the approach just shown makes it clear) what the particular character encoding is for the non-ASCII portions of your file names, you can use Encode to convert (decode) the strings as read from the file system into perl-internal (utf8) encoding, and then the "ord()" function will return Unicode code-point numbers, which you can look up in case the particular characters are unfamiliar to you (check out Re: Regular expressions and accents and tlu -- TransLiterate Unicode).
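For what it's worth, the decode-then-ord() idea can be sketched roughly as below. This is only an illustration, not graff's code: the helper name code_points and the sample byte strings are made up for the example, and the encoding passed in is an assumption you have to supply yourself (cp1252 and cp437 are used here because they are common on Windows).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw file-name bytes, then spell out every
# character outside printable ASCII as a Unicode code point.
# $encoding is an assumption supplied by the caller (e.g. 'cp1252').
sub code_points {
    my ($bytes, $encoding) = @_;
    my $chars = decode($encoding, $bytes);
    $chars =~ s/([^ -~])/sprintf("<U+%04X>", ord($1))/eg;
    return $chars;
}

# The same two letters, once as cp1252 bytes and once as cp437 bytes.
print code_points("M\xF6tley Cr\xFCe", 'cp1252'), "\n";
print code_points("M\x94tley Cr\x81e", 'cp437'),  "\n";
```

Both lines should print M<U+00F6>tley Cr<U+00FC>e, since the two byte strings are the same text in two different single-byte encodings. If the encoding you guessed is wrong, the code points will come out as something other than the letters you expected, which is itself a useful hint.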
Re: Perl / FileFind or ... by runrig (Abbot) on Nov 27, 2012 at 21:46 UTC
You were expecting maybe:

C:/Tmp/Justin Bieber
C:/Tmp/The Archies
C:/Tmp/Debby Boone

???
Re^2: Perl / FileFind or ... by Festus Hagen (Acolyte) on Nov 27, 2012 at 22:50 UTC
`find / -name "Bieber" -exec rm -rf {} \;`

-Enjoy
fh : )_~
Re^3: Perl / FileFind or ... by Anonymous Monk on Nov 28, 2012 at 16:51 UTC
You destroyed my research on the bieberite mineral =(
Re^4: Perl / FileFind or ... by graff (Chancellor) on Nov 29, 2012 at 01:56 UTC
Re: Perl / FileFind or ... by TomDLux (Vicar) on Nov 27, 2012 at 21:20 UTC
I think you're complaining about getting the divide symbol, or '&#8319;', instead of accented characters. Try utf8 instead of USASCII. As Occam said: Entia non sunt multiplicanda praeter necessitatem.
Re: Perl / FileFind or ... by Anonymous Monk on Nov 27, 2012 at 20:37 UTC
What is the question, what is it you're wondering about? binmode STDOUT; or transliterate? Win32::GetLongPathName()? Win32::Unicode, Win32::Unicode::Native, Re^5: Is File::Find Unicode-(Conformant|Compliant|Enabled|Capable)?
Re: Perl / FileFind or ... by blue_cowdawg (Monsignor) on Nov 27, 2012 at 20:46 UTC
As the mystery monk implies: what's the question? Looks like it is working as designed... Peter L. Berghold -- Unix Professional Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Perl / FileFind or ... by Festus Hagen (Acolyte) on Nov 27, 2012 at 21:56 UTC
Yea Tom, that be exactly the issue. Right or wrong, I have tried utf8, unicode and many other things found while searching, to no avail. Guess I just don't get it, why such a simple thing is so difficult.

-Enjoy
fh : )_~
Re^2: Perl / FileFind or ... by Anonymous Monk on Nov 27, 2012 at 22:51 UTC
"Yea Tom, that be exactly the issue."

Since PerlMonks offers threaded discussions, it is important to reply to the correct node by clicking the [reply] alongside the node of interest.

"Guess I just don't get it, Why such a simple thing is so difficult."

Decode the input, Encode the output, read perlunitut: Unicode in Perl#I/O flow (the actual 5 minute tutorial) and learn about your shell:

$ chcp
Active code page: 437

$ echo > "da-MötleyCrüe"

$ dir /b "da-*"
da-MötleyCrüe

$ dir /b "da-*" | perl -MData::Dump -e " dd[<>] "
["da-M\x94tleyCr\x81e\n"]

$ perl -MData::Dump -e " dd[ glob q/da-*/ ] "
["da-M\xF6tleyCr\xFCe"]

Single byte encoding can be hard to guess:

$ perl -MEncode::Detective=detect -le " die detect( glob q/da-*/ ) "
windows-1252 at -e line 1.

$ perl -MEncode::Guess -e " die guess_encoding( glob q/da-*/ ) "
No appropriate encodings found! at -e line 1.

$ dir /b "da-*" | perl -MEncode::Detective=detect -e " $f = <>; die detect($f ) "
Died at -e line 1, <> line 1.

$ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f ) "
No appropriate encodings found! at -e line 1, <> line 1.

$ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f , q/cp437/) "
Encode::XS=SCALAR(0x9a622c)

$ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f , q/cp437/)->name "
cp437 at -e line 1, <> line 1.

But once you know, just binmode:

$ perl -le " print for glob q/da-*/ "
da-M÷tleyCrⁿe

$ perl -le " binmode STDOUT , q/:encoding(cp437)/; print for glob q/da-*/ "
da-MötleyCrüe

$ perl -Mopen=:std,encoding(cp437) -le " print for glob q/da-*/ "
da-MötleyCrüe

$ perl -MEncode::Locale -le " binmode STDOUT, q{encoding(console_out)}; print for glob q/da-*/ "
da-MötleyCrüe
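Applied to the File::Find script from the original post, "decode the input, encode the output" might look roughly like the sketch below. It is not from any of the posters. The two encodings are assumptions taken from the session above (names arriving as cp1252 bytes, console on code page 437), to_console is a hypothetical helper name, and the directory comes from the command line instead of being hard-coded.

```perl
use strict;
use warnings;
use File::Find;
use Encode qw(decode encode);

# Assumptions for this sketch, not facts about any particular machine:
my $fs_encoding      = 'cp1252';  # how the file names arrive from the OS
my $console_encoding = 'cp437';   # what chcp reports for the console

# Hypothetical helper: decode the input bytes, encode them for output.
sub to_console {
    my ($bytes) = @_;
    return encode($console_encoding, decode($fs_encoding, $bytes));
}

# Walk the directory given as the first argument, if there is one.
if (@ARGV) {
    find(sub { print to_console($File::Find::name), "\n" }, $ARGV[0]);
}
```

Run it as, say, example.pl C:/Tmp. Doing the encode step by hand keeps both halves visible; putting an :encoding(cp437) layer on STDOUT with binmode, as in the one-liners above, takes care of the output half for you, but the names still have to be decoded first.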
Re: Perl / FileFind or ... by Festus Hagen (Acolyte) on Nov 27, 2012 at 21:10 UTC
Y'all are kidding right ??

-Enjoy
fh : )_~
Re^2: Perl / FileFind or ... by Anonymous Monk on Nov 27, 2012 at 21:38 UTC
"Y'all are kidding right ??"

No, are you kidding? Between the perlmonks latin-1 limitation, the variability of win32 filesystems (fat/ntfs/...), and whatever you're dealing with, I don't know what you're complaining about.

It is either what you see in the console, in which case binmode something, Text::Unidecode ... whatever you want.

Or the problem is the ANSI filenames you get on win32 (When Unicode Does Not Happen), in which case you need GetLongPathName or Win32::Unicode::Native.

I know what I mean. Why don't you?, How do I post a question effectively? When you're asked for clarification, it probably isn't a joke.
Re: Perl / FileFind or ... by Festus Hagen (Acolyte) on Nov 28, 2012 at 15:30 UTC
First, Thanks to Anonymous Monk for an excellent and informative post.

Simple ... Yea, it should be! Why? Because it's a high level language (or supposed to be), and it should be smart enough to handle basic OS configuration. All Perl has to do is ask the OS and set itself accordingly! As is pointed out in this thread.

Now if it was a string created from user data, that would be a different story. It's not, it's OS data ... The OS knows what it is, Perl should as well!

-Enjoy
fh : )_~
Re^2: Perl / FileFind or ... by choroba (Cardinal) on Nov 28, 2012 at 15:59 UTC
There is no standard way to communicate the encoding of a file system. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Perl / FileFind or ... by Anonymous Monk on Dec 01, 2012 at 15:04 UTC
Know what? Your comments are very confused, esp. because you're [reply]ing to yourself, again. If I assume you're talking about the console code page, perl doesn't assume you're writing a terminal program.