
Can you point to a definitive definition of "filename extension" that says that spaces aren't allowed? Are there any other characters that are not valid in the "extension"?

You suggest a method of verification that, I think, assumes a particular operating system: the OS that I typically use says "unknown" for the file type of both Foo.Bar and Foo. Bar

When someone talks about "filename extensions", are they implicitly referring to a particular operating system? I don't think so. The term originated on systems that were hobbled by their choice of filesystem implementation, but it is used today for filenames constructed on any operating system or file system.

Re^3: AWTDI: Renaming files using regexp
by ikegami (Patriarch) on Apr 11, 2006 at 17:52 UTC

    I admit, I was using the traditional definition, which doesn't apply to *ix. So let's discuss this modern definition, one where file extensions can be specific to any of file systems, operating systems and applications.

    Applications need a way of knowing whether an extension was supplied. For example, an application needs to know this to decide whether it should add the default extension. I can think of two ways:

    • Check a database of registered extensions.
    • Check if the filename looks like it has an extension.

    The first solution has two problems: 1) it's slow, and 2) it requires that all extensions be registered.

    The second solution might be wrong on rare occasions, but it doesn't have the problems of the first solution. The catch is that it must restrict the characters that can be present in extensions. Most file names with non-extension dots have spaces following their dots, so the simplest restriction is to forbid spaces in extensions. This prevents applications from thinking the file "This is private. Don't read" has an extension.

    Therefore, my recommendation is to disallow spaces in extensions, even if the underlying OS or file system doesn't impose such a restriction.
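
    A minimal sketch of the second approach, assuming the only restriction is "no dots, whitespace or path separators after the last dot". The helper name looks_like_it_has_extension and the exact character class are assumptions made for illustration, not something any OS or application is known to use:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name ends in a dot followed by one or
# more characters that are not dots, whitespace or path separators.
sub looks_like_it_has_extension {
    my ($name) = @_;
    return $name =~ m{\.[^.\s/\\]+\z} ? 1 : 0;
}

for my $name ('report.txt', 'archive.tar.gz', 'README', "This is private. Don't read") {
    printf "%-30s %s\n", $name,
        looks_like_it_has_extension($name) ? 'extension' : 'no extension';
}
```

    Under this rule the "This is private. Don't read" file is treated as having no extension, because the text after its only dot contains spaces.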

      Applications need a way of knowing whether an extension was supplied. For example, an application needs to know this to decide whether it should add the default extension.

      Not really. In the example you give the application has its own idea of "extension", so all it has to do is:

      $filename .= $ext unless $filename =~ m/\Q$ext\E\z/;

      So it's not "does this filename look like it has an extension", but rather, "does this filename look like it has this particular extension". This allows the application to use anything and in no way restricts the characters that can appear in the extension, contrary to what your second-to-last paragraph states.

      Still, what does it mean to be a filename "extension"? I keep using scare quotes on that word because I've always found it slightly moronic. "The N characters we've allowed you weren't enough? Now you get M more!" It was an efficiency hack that was exposed to the end user and, what's more, was given a name so that users would have a conceptual handle on the hack. Terrible, terrible, terrible. They should have just stuck with filenames with a larger maximum length.

      Anyway, if I were making recommendations as to what to consider an "extension" in general, I'd say it must match /\.[a-zA-Z0-9]+\z/ because that gives a nod to history and another nod to the modern day by not restricting it to 3 characters. (Is .jpeg a valid filename "extension"? :-) ) But, of course, this only covers 99% of the cases. There will still be strange suffixes used by some applications, and that's okay.
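
      To see what that pattern accepts, here is a small sketch. The helper name alnum_extension and the sample filenames are made up for illustration; only the regex itself comes from the paragraph above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the suffix (dot included) when the name
# matches /\.[a-zA-Z0-9]+\z/, or the empty string otherwise.
sub alnum_extension {
    my ($name) = @_;
    return $name =~ /(\.[a-zA-Z0-9]+)\z/ ? $1 : '';
}

for my $name ('photo.jpeg', 'Foo.Bar', 'Foo. Bar', 'backup.tar.gz', 'notes.c++', 'Makefile') {
    printf "%-14s '%s'\n", $name, alnum_extension($name);
}
```

      Foo. Bar and notes.c++ come back empty, which are the kind of "strange suffixes" the pattern deliberately leaves out. Note that a dotfile such as .bashrc also matches in its entirety, so a caller may want to special-case names that start with a dot.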

        So it's not "does this filename look like it has an extension", but rather, "does this filename look like it has this particular extension"

        No. Not always. Notepad Save As "Text Documents" is like that, but Notepad Save As "All Files" is not like that.

        Save as Text Doc  "boo.txt" -> saved as boo.txt      /\.txt\z/==1
        Save as Text Doc  "boo"     -> saved as boo.txt      /\.txt\z/==0
        Save as Text Doc  "boo."    -> saved as boo..txt     /\.txt\z/==0
        Save as Text Doc  "boo.pl"  -> saved as boo.pl.txt   /\.txt\z/==0
        Save as Text Doc  "boo. pl" -> saved as boo. pl.txt  /\.txt\z/==0
        Save as All Files "boo.txt" -> saved as boo.txt      '.txt' ne ''
        Save as All Files "boo"     -> saved as boo.txt      '' eq ''
        Save as All Files "boo."    -> saved as boo.         '.' ne ''
        Save as All Files "boo.pl"  -> saved as boo.pl       '.pl' ne ''
        Save as All Files "boo. pl" -> saved as boo. pl.txt  '' eq ''

        "Text Doc" does
        $path .= '.txt' unless $path =~ /\.txt\z/;
        which is what I presumed you had in mind.

        "All Files" does
        $path = $filename . ($ext eq '' ? '.txt' : $ext);
        which is what I had in mind. This is the traditional way. Any extension precludes .txt from being added. Well, any registered extension, since Notepad uses the first method I mentioned to identify extensions.
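
        The two behaviours can be put side by side in a runnable sketch. The sub names are invented, and the way $filename and $ext are split is an assumption: a "no spaces after the last dot" pattern stands in for Notepad's registered-extension lookup, which is not reproduced here. With that stand-in, the output agrees with the results listed above:

```perl
use strict;
use warnings;

# "Text Documents" mode: append .txt unless the name already ends in .txt.
sub save_as_text_doc {
    my ($path) = @_;
    $path .= '.txt' unless $path =~ /\.txt\z/;
    return $path;
}

# "All Files" mode: add .txt only when no extension is detected.
# ASSUMPTION: a no-spaces pattern replaces Notepad's registry lookup.
sub save_as_all_files {
    my ($name) = @_;
    my ($filename, $ext) = $name =~ m{\A(.*?)(\.[^.\s/\\]*)\z}s
        ? ($1, $2)
        : ($name, '');
    return $filename . ($ext eq '' ? '.txt' : $ext);
}

for my $name ('boo.txt', 'boo', 'boo.', 'boo.pl', 'boo. pl') {
    printf "%-10s text doc: %-12s all files: %s\n",
        "\"$name\"", save_as_text_doc($name), save_as_all_files($name);
}
```

        A registry-based check would differ from this stand-in for names such as boo.xyz, where .xyz contains no spaces but may not be registered.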

        The "Text Doc" behaviour is related to the Hide Extensions "feature" of Windows. Both of these features are real pains.