in reply to Re^2: AWTDI: Renaming files using regexp
in thread AWTDI: Renaming files using regexp

I admit, I was using the traditional definition, which doesn't apply to *ix. So let's discuss this modern definition, one where file extensions can be specific to any of file systems, operating systems, and applications.

Applications need a way of knowing if an extension was supplied. For example, an application needs to know this to decide whether it should add the default extension. I can think of two ways:

The first solution has two problems: 1) it's slow, and 2) it requires that all extensions be registered.

The second solution might be wrong on rare occasions, but it doesn't have the problems of the first solution. The catch is that it must restrict the characters that can be present in extensions. Most file names with non-extension dots have a space following the dot, so the simplest restriction is to forbid spaces in extensions. This prevents applications from thinking that the file "This is private. Don't read" has an extension.

Therefore, my recommendation is to disallow spaces in extensions, even if the underlying OS or file system doesn't impose such a restriction.
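
A minimal sketch of the second approach, assuming the rule is simply "the text after the last dot may not contain a space". The helper name split_ext is hypothetical, and the pattern expects a bare file name rather than a full path.

```perl
use strict;
use warnings;

# Hypothetical helper: split a bare file name into (base, extension).
# Assumption: an extension is the final dot followed by any run of
# characters that contains neither a dot nor a space.
# If nothing matches, the extension is the empty string.
sub split_ext {
    my ($name) = @_;
    if ($name =~ /\A(.*)(\.[^. ]*)\z/s) {
        return ($1, $2);
    }
    return ($name, '');
}

for my $name ('report.txt', 'report', "This is private. Don't read") {
    my ($base, $ext) = split_ext($name);
    printf "%-30s ext=[%s]\n", $name, $ext;
}
```

With this rule, 'report.txt' yields the extension '.txt', while 'report' and "This is private. Don't read" both yield an empty extension, so an application would append its default extension to them.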

Re^4: AWTDI: Renaming files using regexp
by duff (Parson) on Apr 11, 2006 at 18:50 UTC
    Applications need a way of knowing if an extension was supplied. For example, it needs to know that to decide whether it should add the default extension.

    Not really. In the example you give the application has its own idea of "extension", so all it has to do is:

    $filename .= $ext unless $filename =~ /\Q$ext\E\z/;

    So it's not "does this filename look like it has an extension", but rather, "does this filename look like it has this particular extension". This allows the application to use anything, and in no way restricts the characters that can appear in the extension, as your second-to-last paragraph states it must.

    Still, what does it mean to be a filename "extension"? I keep using scare-quotes on that word because I've always found it slightly moronic. "The N characters we've allowed you weren't enough? Now you get M more!" It was an efficiency hack that was exposed to the end user and, what's more, named so that users have a conceptual handle to hang on to the hack. Terrible, terrible, terrible. They should have just stuck with filenames with a larger maximum length.

    Anyway, if I were making recommendations as to what to consider an "extension" in general, I'd say it must match /\.[a-zA-Z0-9]+\z/ because that gives a nod to history and another nod to the modern day by not restricting it to 3 characters. (Is .jpeg a valid filename "extension"? :-) But, of course, this only covers 99% of the cases. There will still be strange suffixes used by some applications and that's okay.
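
    As a quick check of that pattern, here is a small sketch. The helper name has_ext and the sample names are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper built on the pattern suggested above:
    # a dot followed by one or more ASCII letters or digits at the very end.
    sub has_ext {
        my ($name) = @_;
        return $name =~ /\.[a-zA-Z0-9]+\z/ ? 1 : 0;
    }

    for my $name ('photo.jpeg', 'archive.tar.gz', 'README', 'boo.',
                  "This is private. Don't read") {
        printf "%-30s %s\n", $name, has_ext($name) ? 'extension' : 'no extension';
    }
    ```

    Only 'photo.jpeg' and 'archive.tar.gz' are reported as having an extension; a trailing dot or a dot followed by a space does not count.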

      So it's not "does this filename look like it has an extension", but rather, "does this filename look like it has this particular extension"

      No. Not always. Notepad Save As "Text Documents" is like that, but Notepad Save As "All Files" is not like that.

      Save as Text Doc  "boo.txt" -> saved as boo.txt       /\.txt\z/==1
      Save as Text Doc  "boo"     -> saved as boo.txt       /\.txt\z/==0
      Save as Text Doc  "boo."    -> saved as boo..txt      /\.txt\z/==0
      Save as Text Doc  "boo.pl"  -> saved as boo.pl.txt    /\.txt\z/==0
      Save as Text Doc  "boo. pl" -> saved as boo. pl.txt   /\.txt\z/==0
      Save as All Files "boo.txt" -> saved as boo.txt       '.txt' ne ''
      Save as All Files "boo"     -> saved as boo.txt       '' eq ''
      Save as All Files "boo."    -> saved as boo.          '.' ne ''
      Save as All Files "boo.pl"  -> saved as boo.pl        '.pl' ne ''
      Save as All Files "boo. pl" -> saved as boo. pl.txt   '' eq ''

      "Text Doc" does
      $path .= '.txt' unless $path =~ /\.txt\z/;
      which is what I presumed you had in mind.

      "All Files" does
      $path = $filename . ($ext eq '' ? '.txt' : $ext);
      which is what I had in mind. This is the traditional way. Any extension precludes .txt from being added. Well, any registered extension, since Notepad uses the first method I mentioned to identify extensions.

      The "Text Doc" behaviour is related to the Hide Extensions "feature" of Windows. Both of these features are real pains.
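
      The two behaviours listed above can be modelled roughly as follows. This is only a sketch that reproduces the observed results, not Notepad's actual code: the function names are hypothetical, and the "All Files" version detects an extension with a no-spaces pattern as a stand-in for a lookup of registered extensions.

      ```perl
      use strict;
      use warnings;

      # "Text Documents": append .txt unless the name already ends in .txt.
      sub save_as_text_doc {
          my ($path) = @_;
          $path .= '.txt' unless $path =~ /\.txt\z/;
          return $path;
      }

      # "All Files": append .txt only when no extension is detected.
      # Assumption: an extension is the last dot plus any characters other
      # than dots and spaces (a stand-in for a registered-extension lookup).
      sub save_as_all_files {
          my ($name) = @_;
          my ($ext) = $name =~ /(\.[^. ]*)\z/;
          $ext = '' unless defined $ext;
          return $ext eq '' ? "$name.txt" : $name;
      }

      for my $name ('boo.txt', 'boo', 'boo.', 'boo.pl', 'boo. pl') {
          printf "%-10s text doc: %-12s all files: %s\n",
              "\"$name\"", save_as_text_doc($name), save_as_all_files($name);
      }
      ```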