G'day BernieC,

The character that you show as ️ is "U+FE0F VARIATION SELECTOR-16"; it indicates that the preceding emoji character should be rendered in its graphical form. Its complement is "U+FE0E VARIATION SELECTOR-15"; it indicates that the preceding emoji character should be rendered in its textual form. See Unicode PDF code chart: "Variation Selectors Range: FE00–FE0F".

The character that you show as ✈ is "U+2708 AIRPLANE". As part of the demo code below, I've also used "U+2709 ENVELOPE". Find both of those in Unicode PDF code chart: "Dingbats Range: 2700–27BF".
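If you want to check for yourself which code points a string contains, here's a minimal sketch using the core charnames module. The sub name describe_chars is my own invention for this illustration; it assumes Perl 5.10 or later (for the // operator) and a string of characters rather than undecoded bytes.

```perl
#!perl
use strict;
use warnings;
use charnames ();    # gives us charnames::viacode(); imports nothing

# Return one "U+XXXX NAME" string per character in $str.
sub describe_chars {
    my ($str) = @_;
    return map {
        my $cp = ord;
        sprintf 'U+%04X %s', $cp, charnames::viacode($cp) // '<unnamed>';
    } split //, $str;
}

# The same kind of base character with each variation selector.
my $emoji_style = "\x{2708}\x{FE0F}";    # graphical (emoji) presentation
my $text_style  = "\x{2709}\x{FE0E}";    # textual presentation

print "$_\n" for describe_chars($emoji_style), describe_chars($text_style);

# Prints:
# U+2708 AIRPLANE
# U+FE0F VARIATION SELECTOR-16
# U+2709 ENVELOPE
# U+FE0E VARIATION SELECTOR-15
```

Only ASCII is printed, so this works regardless of how the console is set up.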

The following two short scripts create some files with and without Unicode characters in their filenames, then identify the filenames with Unicode characters and rename them.

First, create the files for the demo.

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:06    <DIR>          .
04-Jan-23  15:06    <DIR>          ..
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
               2 File(s)            608 bytes
               2 Dir(s)  1,533,002,100,736 bytes free

C:\Users\ken\tmp\pm_11149351_unicode_filenames>more mkfiles.pl
#!perl

use strict;
use warnings;
use autodie;

my $emoji_airplane = "\x{2708}\x{FE0F}";
my $emoji_envelope = "\x{2709}\x{FE0F}";

my @fnames = (
    'AIR_2708_FE0F', "___ $emoji_airplane $emoji_airplane",
    'ENV_2709_FE0F', "___ $emoji_envelope $emoji_envelope",
);

for my $fname (@fnames) {
    open my $fh, '>', $fname;
}

C:\Users\ken\tmp\pm_11149351_unicode_filenames>perl mkfiles.pl

Use <pre> block to show Unicode characters:

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:32    <DIR>          .
04-Jan-23  15:32    <DIR>          ..
04-Jan-23  15:32                 0 AIR_2708_FE0F
04-Jan-23  15:32                 0 ENV_2709_FE0F
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
04-Jan-23  15:32                 0 ___ ✈️ ✈️
04-Jan-23  15:32                 0 ___ ✉️ ✉️
               6 File(s)            608 bytes
               2 Dir(s)  1,533,000,298,496 bytes free

C:\Users\ken\tmp\pm_11149351_unicode_filenames>

Now rename the filenames with Unicode characters.

C:\Users\ken\tmp\pm_11149351_unicode_filenames>more mvfiles.pl
#!perl

use strict;
use warnings;
use autodie;

use File::Copy 'move';

opendir(my $dh, '.');

for my $fname (readdir $dh) {
    next if $fname =~ /^[\x00-\x7f]+$/;
    (my $new_name = $fname) =~ s/([^\x00-\x7f])/'+U' . ord($1) . 'U+'/eg;
    move($fname, $new_name);
}

C:\Users\ken\tmp\pm_11149351_unicode_filenames>perl mvfiles.pl

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:35    <DIR>          .
04-Jan-23  15:35    <DIR>          ..
04-Jan-23  15:32                 0 AIR_2708_FE0F
04-Jan-23  15:32                 0 ENV_2709_FE0F
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
04-Jan-23  15:32                 0 ___ +U226U++U156U++U136U++U239U++U184U++U143U+ +U226U++U156U++U136U++U239U++U184U++U143U+
04-Jan-23  15:32                 0 ___ +U226U++U156U++U137U++U239U++U184U++U143U+ +U226U++U156U++U137U++U239U++U184U++U143U+
               6 File(s)            608 bytes
               2 Dir(s)  1,532,999,979,008 bytes free

So, that's very much skeleton code to demonstrate a technique. You may want to offer an option to type a new filename; you may want something other than the default character conversion to "+U...U+". Perhaps you need to perform this recursively through a directory hierarchy. Depending on how you alter this to suit your preferences, validation, exception handling, and similar checks may be appropriate.
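To give an idea of what two of those variations might look like, here's a rough, untested-on-Windows sketch. The numbers in the renamed files above (226, 156, 136, 239, 184, 143) are the UTF-8 bytes of U+2708 and U+FE0F, so this version decodes the name first and writes code points in hex instead. The sub names ascii_name and rename_tree, the "[U+XXXX]" format and the skip-with-a-warning policy are all my own assumptions for illustration; it also assumes readdir hands back UTF-8 encoded bytes, which may not hold on your system.

```perl
#!perl
use strict;
use warnings;
use Encode ();
use File::Find qw(finddepth);

# Hypothetical helper (not part of mvfiles.pl): decode a UTF-8 encoded
# filename and write each non-ASCII character as [U+XXXX].
# Dies if the bytes are not valid UTF-8.
sub ascii_name {
    my ($bytes) = @_;
    my $copy  = $bytes;    # decode() with a CHECK value may modify its argument
    my $chars = Encode::decode('UTF-8', $copy, Encode::FB_CROAK);
    $chars =~ s/([^\x00-\x7f])/sprintf '[U+%04X]', ord $1/eg;
    return $chars;
}

# Hypothetical helper: rename every non-ASCII entry below $top and
# return the number of successful renames. finddepth() visits the
# contents of a directory before the directory itself, so a directory
# is only renamed after everything inside it has been handled.
sub rename_tree {
    my ($top) = @_;
    my $count = 0;
    finddepth(sub {
        my $old  = $_;                   # basename; File::Find has chdir()ed here
        my $path = $File::Find::name;    # full path, used for messages only
        return if $old =~ /^[\x00-\x7f]+$/;
        my $new = eval { ascii_name($old) };
        if (!defined $new) {
            warn "Skipping $path: name is not valid UTF-8\n";
        }
        elsif (-e $new) {
            warn "Skipping $path: $new already exists\n";
        }
        elsif (rename $old, $new) {
            $count++;
        }
        else {
            warn "Cannot rename $path: $!\n";
        }
    }, $top);
    return $count;
}

# Usage: pass the top directory as the only argument.
# Without an argument, nothing is renamed.
printf "%d renamed\n", rename_tree($ARGV[0]) if @ARGV;
```

With this conversion, a file created as "___ \x{2708}\x{FE0F}" would end up as "___ [U+2708][U+FE0F]". Try it on a scratch directory first; there's no undo.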

The ball's in your court. Take it from here ...

— Ken


In reply to Re: Unicode file names by kcott
in thread Unicode file names by BernieC
