Thanks! The detailed explanation helped and is appreciated.

"Plus, AFAIK, file name encodings are a very complicated topic, and therefore I think you might do yourself a favor by not using '¿' in filenames and URLs."

Of course. However, there are several reasons for using it anyway:

Use of non-ASCII characters like Ö, Ø, Ó, Ô, 月, 日, or even ¿ or ¡ is to be expected these days, even in file names and thus URLs.
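To show what "and thus URLs" means in practice, here is a minimal sketch in Python (chosen only for brevity; the thread itself is about a Perl script). It assumes the usual convention that a non-ASCII file name appears in a URL path as percent-encoded UTF-8. The helper name `url_path_for` and the file names are hypothetical examples.

```python
from urllib.parse import quote, unquote


def url_path_for(name: str) -> str:
    """Percent-encode a file name for use as one URL path segment.

    quote() encodes the string as UTF-8 and escapes every byte outside the
    unreserved set (letters, digits, '_.-~'); safe="" also escapes '/'.
    """
    return quote(name, safe="")


# Hypothetical file names containing non-ASCII characters.
for name in ["Öl.html", "月日.html", "¿qué.html"]:
    encoded = url_path_for(name)
    # unquote() reverses the mapping, decoding the bytes as UTF-8.
    assert unquote(encoded) == name
    print(f"{name} -> {encoded}")
```

With these inputs, `¿qué.html` comes out as `%C2%BFqu%C3%A9.html`, because ¿ (U+00BF) is the UTF-8 byte pair C2 BF. A literal `?` would be escaped as `%3F`.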

The rename utility listed above handles the renaming, and its results seem to match what can be produced manually in a local terminal emulator, on a local console, or over a remote ssh+tmux connection. So it was my script that was the odd man out and therefore needed correcting.

The file names, minus the inverted question mark, are the result of using wget to scrape the output of some legacy PHP scripts which are not, and cannot be, maintained any more. Apart from the very long file names, the method works reasonably well for converting the whole mess into a static HTML archive. Unfortunately, it leaves question marks in the file names, which web servers do not tolerate because they use the question mark to mark the end of the path (the file name) and the start of the query string. So a replacement character is needed, and ¿ seems the least problematic semantically.
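The question-mark problem and the workaround can be sketched as follows, again in Python purely for illustration rather than as the script discussed in the thread. The URL and the helper name `replace_question_marks` are hypothetical; the sketch assumes a wget mirror on a POSIX file system, where `?` is a legal file-name character, and simply replaces every `?` in file and directory names with `¿`.

```python
import os
from urllib.parse import urlsplit

# Hypothetical URL of a page that wget saved under a name containing "?".
parts = urlsplit("http://example.org/archive/index.php?id=42.html")
print(parts.path)   # /archive/index.php  (all the server treats as the path)
print(parts.query)  # id=42.html          (treated as a query string)


def replace_question_marks(top: str, replacement: str = "\u00bf") -> list:
    """Rename files and directories under `top`, replacing '?' with `replacement`.

    The default replacement is U+00BF INVERTED QUESTION MARK ('¿').
    Returns a list of (old_path, new_path) pairs for the renames performed.
    """
    renamed = []
    # Bottom-up walk: a directory's entries are renamed before the directory itself,
    # so the directory currently being visited still has its original path.
    for root, dirs, files in os.walk(top, topdown=False):
        for old in dirs + files:
            if "?" not in old:
                continue
            src = os.path.join(root, old)
            dst = os.path.join(root, old.replace("?", replacement))
            if os.path.exists(dst):
                continue  # skip instead of clobbering an existing entry
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling `replace_question_marks("mirror")` on a hypothetical `mirror` directory would rename, for example, `index.php?id=42.html` to `index.php¿id=42.html`. The sketch only renames; it does not rewrite links inside the saved pages, and it skips any rename whose target already exists rather than overwriting it.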


In reply to Re^2: Matching non-ASCII file contents with file name. by mldvx4
in thread Matching non-ASCII file contents with file name. by mldvx4
