in reply to AWTDI: Renaming files using regexp

You forgot to escape your dot. It will match any character, not just a dot.
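
To illustrate the difference, here is a small sketch. The filenames are made up for the example and are not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical filenames, chosen only to show the difference.
my @names = ('report.txt', 'report_txt', 'reportxtxt');

# Unescaped dot: "." matches any character, so "_txt" and "xtxt" match too.
my @loose = grep { /.txt\z/ } @names;

# Escaped dot: only a literal "." in front of "txt" matches.
my @strict = grep { /\.txt\z/ } @names;

print "unescaped: @loose\n";
print "escaped:   @strict\n";
```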

Furthermore, some files have dots but no extensions. For example, consider "Foo. Bar". ". Bar" is not an extension because it has a space in it. You can verify this by checking the properties of "Foo. Bar" and "Foo.Bar". For the former, the file type is "File", while for the latter, the file type is "BAR File". You (and everyone in this thread) mistakenly identify ". Bar" as an extension and don't convert "Foo. Bar" to "Foo_ Bar".

The correct usage would be:
fileparse($_, qr/\.[^. ]*/)
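
As a rough check of how that pattern behaves, the sketch below runs it over the two names discussed above plus one multi-dot name added for illustration. The comparison pattern at the end, which allows spaces, is an assumed variant and is not quoted from anyone's post.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# The two names from the post above, plus one multi-dot name for illustration.
my @names = ('Foo.Bar', 'Foo. Bar', 'archive.tar.gz');

my %suffix_for;
for my $name (@names) {
    # fileparse matches the suffix pattern against the end of the name.
    my ($base, $dir, $suffix) = fileparse($name, qr/\.[^. ]*/);
    $suffix_for{$name} = $suffix;
    printf "%-18s base=%-14s suffix=%s\n", "'$name'", "'$base'", "'$suffix'";
}

# For comparison: a pattern that allows spaces (assumed variant).
my (undef, undef, $with_space) = fileparse('Foo. Bar', qr/\.[^.]*/);
print "allowing spaces: suffix='$with_space'\n";
```

With the space excluded, "Foo. Bar" comes back with an empty suffix, so the whole name is treated as the base.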

Replies are listed 'Best First'.
Re^2: AWTDI: Renaming files using regexp
by BrowserUk (Patriarch) on Apr 11, 2006 at 16:09 UTC

    Did you look (I mean actually look) at his sample data?

    If you are coding a generic tool for dealing with filenames of unknown formats, then such criticisms are valid.

    But if the guy knows what his data looks like; he has a working solution for his problem, as opposed to the arbitrarily extended problem you have imposed upon his question; and he states:

    The question is more academic than anything since I have an answer, just looking to expand my knowledge some :-)

    then your admonishment of the OP (for his working solution) and of everyone in this thread is ... ...!

    I'll let you fill in the blanks


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      One of the ideas of this site is to instruct not only the OP, but also the people who might be searching through the site at a later time. The next person to use this site may have different data.

      And yes, I did expand knowledge (and started a discussion which will expand it further).

Re^2: AWTDI: Renaming files using regexp
by duff (Parson) on Apr 11, 2006 at 17:07 UTC

    Can you point to a definitive definition of "filename extension" that says that spaces aren't allowed? Are there any other characters that are not valid in the "extension"?

    You suggest a method of verification that assumes a particular operating system, I think, as the OS that I typically use says "unknown" for the file type of both files named Foo.Bar and Foo. Bar.

    When someone talks about "filename extensions", are they implicitly referring to a particular operating system? I don't think so as the term, while it originated on systems that were hobbled by their choice of filesystem implementation, is used today when referring to filenames that were constructed on any operating system/file system.

      I admit, I was using the traditional definition, which doesn't apply to *ix. So let's discuss this modern definition, one where file extensions can be specific to any of file systems, operating systems and applications.

      Applications need a way of knowing if an extension was supplied. For example, an application needs to know that to decide whether it should add the default extension. I can think of two ways:

      • Check a database of registered extensions.
      • Check if the filename looks like it has an extension.

      The first solution has two problems: 1) It's slow, and 2) it requires that all extensions be registered.

      The second solution might be wrong on rare occasions, but it doesn't have the problems of the first solution. The catch is that it must restrict the characters that can be present in extensions. Most file names with non-extension dots have spaces following their dots, so the simplest restriction is to forbid spaces in extensions. This prevents applications from thinking the file "This is private. Don't read" has an extension.

      Therefore, my recommendation is to disallow spaces in extensions, even if the underlying OS or file system doesn't impose such a restriction.
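
      A minimal sketch of the second approach, assuming the rule "a dot followed by one or more characters that are neither dots nor spaces, at the end of the name". The helper name with_default_ext and the sample names are hypothetical, and requiring at least one character after the dot is an extra assumption of this sketch.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: append $default_ext unless the name already
      # looks like it carries an extension (no spaces or dots allowed in it).
      sub with_default_ext {
          my ($name, $default_ext) = @_;
          return $name if $name =~ /\.[^. ]+\z/;
          return $name . $default_ext;
      }

      print with_default_ext('notes', '.txt'), "\n";
      print with_default_ext('notes.doc', '.txt'), "\n";
      print with_default_ext("This is private. Don't read", '.txt'), "\n";
      ```

      Only the second name is left alone; the other two get the default extension appended.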

        Applications need a way of knowing if an extension was supplied. For example, it needs to know that to decide whether it should add the default extension.

        Not really. In the example you give the application has its own idea of "extension", so all it has to do is:

        $filename .= $ext unless $filename =~ m/\Q$ext\E\z/;

        So it's not "does this filename look like it has an extension", but rather, "does this filename look like it has this particular extension". This allows the application to use anything and in no way restricts the characters that can appear in the extension as your second to last paragraph states.
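
        A minimal sketch of that check wrapped in a helper. The name ensure_ext and the sample inputs are hypothetical, and \Q...\E is used on the assumption that the extension should be matched literally, so that the dot in ".txt" is not treated as "any character".

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: append the application's own extension unless
        # the name already ends with exactly that string.
        sub ensure_ext {
            my ($filename, $ext) = @_;
            # \Q...\E quotes regex metacharacters in $ext; \z anchors at the very end.
            $filename .= $ext unless $filename =~ /\Q$ext\E\z/;
            return $filename;
        }

        print ensure_ext('report', '.txt'), "\n";
        print ensure_ext('report.txt', '.txt'), "\n";
        print ensure_ext('report_txt', '.txt'), "\n";
        print ensure_ext('backup', '.old copy'), "\n";
        ```

        The last call shows that nothing here stops the application from choosing an extension that contains a space.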

        Still, what does it mean to be a filename "extension"? I keep using scare-quotes on that word because I've always found it slightly moronic. "The N characters we've allowed you weren't enough? Now you get M more!" It was an efficiency hack that was exposed to the end user and what's more named so that users have a conceptual handle to hang on to the hack. Terrible, terrible, terrible. They should have just stuck with filenames with a larger maximum length.

        Anyway, if I were making recommendations as to what to consider an "extension" in general, I'd say it must match /\.[a-zA-Z0-9]+\z/ because that gives a nod to history and another nod to the modern day by not restricting it to 3 characters. (Is .jpeg a valid filename "extension"? :-) But, of course, this only covers 99% of the cases. There will still be strange suffixes used by some applications and that's okay.
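
        To see what that pattern accepts and rejects, here is a short sketch. The sample names are invented for illustration; "data.c++" stands in for the kind of strange suffix that falls outside it.

        ```perl
        use strict;
        use warnings;

        # Sample names are invented for illustration.
        my @names = ('photo.jpeg', 'Foo.Bar', 'Foo. Bar', 'archive.tar.gz', 'README', 'data.c++');

        my %ext_of;
        for my $name (@names) {
            # Capture a trailing dot plus ASCII letters/digits, if the name ends that way.
            my ($ext) = $name =~ /(\.[a-zA-Z0-9]+)\z/;
            $ext_of{$name} = $ext;
            printf "%-16s => %s\n", $name, defined $ext ? $ext : '(none)';
        }
        ```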

Re^2: AWTDI: Renaming files using regexp
by nimdokk (Vicar) on Apr 11, 2006 at 17:26 UTC
    That is a good point and I'll take it into consideration. However, I will admit that the fileparse line was copied straight out of the Camel book. And yes, I know, I should not copy things without fully understanding them :-)