My goal is to use the substitution operator (s///) to replace occurrences of a question mark (?) with an inverted question mark (¿) on specific lines in a large number of files. My trouble is that what actually gets substituted inside the file does not match what ends up in the file name in the file system. I would be grateful for any tips or guidance on how to make the contents of the files match the various file names out in the file system. Perhaps it is a matter of encoding, again?

It is claimed that the inverted question mark is U+00BF (\x{00BF} in Perl), which strangely is C2 BF in UTF-8 according to a "Unicode Character Table" site.
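The two notations need not conflict: U+00BF names the code point, while C2 BF is that code point after encoding as UTF-8. A minimal sketch with the core Encode module, assuming nothing beyond a stock Perl (the variable names are only for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{BF}" is one character in Perl: the code point U+00BF.
my $char = "\x{BF}";

# Encoding that character as UTF-8 produces two bytes.
my $bytes = encode('UTF-8', $char);

# Hex dump of the encoded bytes, similar to what xxd shows.
my $hex = join ' ', unpack '(H2)*', $bytes;

printf "characters: %d\n", length $char;
printf "bytes:      %d (%s)\n", length $bytes, $hex;
```

So one character and two bytes describe the same inverted question mark at different layers.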

In the shell (Bash) on an ext4 file system, that seems to be the case, and the Perl utility rename seems to work that way, too.

    $ touch ¿
    $ ls ? > zz
    $ xxd zz
    00000000: c2bf 0a
    $ echo '¿' > yy
    $ xxd yy
    00000000: c2bf 0a
    $ touch xx
    $ rename -v 's/xx/¿/;' xx
    xx not renamed: ¿ already exists
    $ rename --version
    /usr/bin/rename using File::Rename version 1.13, File::Rename::Options version 1.10

Those files show up in Apache2's access logs with the escape sequence "%C2%BF" in the URL in place of the inverted question mark.
Yet shouldn't that be U+00BF all the way through? My own Perl scripts work differently:

    $ perl -e 'use utf8; print "¿\n"' > ww
    $ xxd ww
    00000000: bf0a
    $ perl -e 'use utf8; $c="¿\n"; utf8::upgrade($c); print $c' > vv
    $ xxd vv
    00000000: bf0a
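The lone bf byte is what Perl writes when a decoded character below U+0100 is printed to a handle that has no encoding layer. A minimal sketch, assuming the missing output layer is what differs here; bytes_written is a hypothetical helper that prints into an in-memory handle so the raw bytes can be inspected:

```perl
use strict;
use warnings;
use utf8;    # the literal "¿" below is decoded to the character U+00BF

my $c = "¿\n";

# Hypothetical helper: print $text through a handle opened with $layer
# and return the bytes that were written, as a hex string.
sub bytes_written {
    my ($layer, $text) = @_;
    open my $fh, ">$layer", \my $buf or die "open: $!";
    print {$fh} $text;
    close $fh;
    return join ' ', unpack '(H2)*', $buf;
}

my $raw     = bytes_written(':raw', $c);               # no encoding layer
my $encoded = bytes_written(':encoding(UTF-8)', $c);   # UTF-8 output layer

print "raw:     $raw\n";
print "encoded: $encoded\n";
```

For STDOUT the same layer can be set with binmode(STDOUT, ':encoding(UTF-8)'), or for the whole script with use open qw(:std :encoding(UTF-8)).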

Though if I leave out the use utf8 part, then I get the "right" result, at least according to xxd:

    $ perl -e 'print "¿\n"' > uu
    $ xxd uu
    00000000: c2bf 0a
    $ curl --silent --head 'http://localhost/' | grep 'Content-Type'
    Content-Type: text/html; charset=utf-8

While keeping UTF-8, how can I get the "¿" inside the files to match the "¿" in the file names and still look right?
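One possible approach, sketched under the assumption that both the file contents and the file names are UTF-8: decode on input, encode on output, and encode names explicitly before they reach the file system. The helper invert_question_marks, the sample "Title:" line and the temporary directory are hypothetical and only serve the illustration:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Hypothetical helper: replace "?" with "¿" on lines matching $line_re.
# $path_bytes is the file name as bytes, as the file system stores it.
sub invert_question_marks {
    my ($path_bytes, $line_re) = @_;
    open my $in, '<:encoding(UTF-8)', $path_bytes or die "read: $!";
    my @lines = <$in>;
    close $in;
    for (@lines) { s/\?/¿/g if /$line_re/ }
    open my $out, '>:encoding(UTF-8)', $path_bytes or die "write: $!";
    print {$out} @lines;
    close $out;
}

my $dir  = tempdir(CLEANUP => 1);
my $name = '¿';                                 # character string, U+00BF
my $path = "$dir/" . encode('UTF-8', $name);    # bytes C2 BF for the file system

# Create a sample file, then substitute only on the "Title:" line.
open my $fh, '>:encoding(UTF-8)', $path or die "create: $!";
print {$fh} "Title: ?\nbody ?\n";
close $fh;
invert_question_marks($path, qr/^Title:/);

# readdir returns bytes; decoding them gives the same character as in the script.
opendir my $dh, $dir or die "opendir: $!";
my ($on_disk) = grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;
my $same = decode('UTF-8', $on_disk) eq $name;
print "file name matches contents: ", ($same ? "yes" : "no"), "\n";
```

With the layers in place, the ¿ written into the file is the same C2 BF byte pair that the shell and rename put into the file name.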


In reply to Matching non-ASCII file contents with file name. by mldvx4
