in reply to Re: Re^2: Removing File Extensions
in thread Removing File Extensions

The goal is to teach someone that the issues one generally runs into have been run into many times before. There is often a wheel that's round enough out there.

I guess our difference in opinion is where we draw the line between "simple enough to do 'by hand'" versus "ok, this is hard, let's see if someone else has already done it better".

I'm not against modules, but I am against unnecessary modules. In this case, as I said, I found the use of a module overkill: instead of having a single regex that did a fairly simple task (and I could comment that regex further, but I didn't think it needed it), I now need to read and understand the documentation for the module. (And, in some cases, install it, make sure it's up to date, etc; File::Basename is a standard module, so that isn't an issue here.)
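
To make "a single regex that did a fairly simple task" concrete, here is a minimal sketch. The pattern is an illustrative guess, not necessarily the one posted earlier in the thread, and split_ext is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split a bare filename at its last dot.
# Returns (name, extension); extension is undef when there is no dot.
sub split_ext {
    my ($filename) = @_;

    # Greedy (.*) runs to the last dot; ([^.]*) takes whatever follows it.
    if ( $filename =~ /^(.*)\.([^.]*)$/s ) {
        return ( $1, $2 );
    }
    return ( $filename, undef );
}

my ( $base, $ext ) = split_ext('report.final.txt');
print "base=$base ext=$ext\n";
```

Note that this sketch only looks at the name itself; it makes no attempt to separate a directory part.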

I once had a conversation with someone who thought a language (or, at least, a library / API) to generate regexes would be a handy thing to have. I disagreed, seeing as REs are already their own language; learning another one on top of the actual regex seemed unnecessary.

Re: Re^4: Removing File Extensions
by dragonchild (Archbishop) on May 02, 2004 at 15:54 UTC
    Frankly, using a regex for this is even a little overkill. The easiest solution, for many people, would be to do something like this:
    my ($base, $ext) = split '.', $filename, 2;

    Of course, that fails with filenames that have more than one '.' in them.

    my ($base, $ext) = map { reverse } split '.', reverse($filename), 2;

    Simple! :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Frankly, using a regex for this is even a little overkill.

      Ah, I fear I must disagree with you again. :) The simplest non-regex solution I can think of (where "simple" is some vague measure of how easy or difficult it is to comprehend what's going on) is this:

      my ( $just_name, $just_ext );
      my $last_dot_pos = rindex $filename, '.';
      if ( $last_dot_pos > -1 ) {
          $just_name = substr $filename, 0, $last_dot_pos;
          $just_ext  = substr $filename, $last_dot_pos + 1;
      }
      else {
          $just_name = $filename;
          $just_ext  = undef;
      }

      Which has its own potentially subtle problems (yay off-by-one errors!), but is as straightforward as it gets. (In particular, this is the sort of code that a perl novice would likely write - so if that is the level of user / maintenance programmer we're aiming for...)

      While the power and flexibility of regexes might not be used to their fullest in splitting a filename from its extension, the "template" of the operation is a very common and easily comprehended one:

      1. try a match;
      2. see if it worked;
      3. capture subexpressions if it did;
      4. complain if it didn't.

      That is the aspect of the regex solution that I find compelling: the regex is a bit hairy, but the structure it is embedded in is one of the core patterns in perl. And the difference between "Here's a core pattern, I can instantly see what's going on, now I just need to grok the regex" versus "Wait, what does that module do again? What does this parameter mean? What are the special cases? What happens if it doesn't match? What errors can it throw?"... that is the difference I was trying to highlight.
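
      The four steps above can be sketched roughly as follows. This is only one way to write the pattern; name_and_ext is a hypothetical helper, and the regex is an assumption rather than the one from earlier in the thread:

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper showing the match / check / capture / complain shape.
      sub name_and_ext {
          my ($filename) = @_;

          # Steps 1 and 2: try the match and test whether it worked.
          # A failed match returns an empty list, so the condition is false.
          if ( my ( $name, $ext ) = $filename =~ /^(.+)\.([^.]+)$/ ) {

              # Step 3: the captures are only used inside the success branch.
              return ( $name, $ext );
          }

          # Step 4: complain when the input does not have the expected shape.
          die "No extension found in '$filename'\n";
      }

      my ( $name, $ext ) = name_and_ext('notes.txt');
      print "$name has extension $ext\n";
      ```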

      Finally, I know this is all a tradeoff, and different people have different thresholds where they'd draw the line. For me, I don't think that regexes are overkill: in perl, they are a first-class citizen, and an essential part of the programmer's vocabulary. (Heck, even your solutions use one: the first argument to split is still a regex, so you had better escape that dot. :-)
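
      A small sketch of why the escape matters, using a made-up filename. It only demonstrates how split treats its first argument; it is not a recommendation to use split for this job:

      ```perl
      use strict;
      use warnings;

      my $filename = 'archive.tar.gz';

      # An unescaped '.' is the regex /./, so the very first character
      # is treated as the separator.
      my @unescaped = split '.', $filename, 2;    # ('', 'rchive.tar.gz')

      # Escaping the dot splits on a literal '.', but still at the first one.
      my @escaped = split /\./, $filename, 2;     # ('archive', 'tar.gz')

      print "unescaped: [$unescaped[0]] [$unescaped[1]]\n";
      print "escaped:   [$escaped[0]] [$escaped[1]]\n";
      ```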

        You made my point for me, which was "Where do you draw the line in overkill?". Using a standard module to the extent it can be used is, very often, the difference between ease of porting and a month's rewrite (q.v. nodes to that effect on this site).

        We are in agreement on the standard template for a given operation, regex or not. The point I was trying to make is that, just as regexen are first-class citizens, so are CPAN modules, especially those modules that are in the core. Although 99% of those using File::Basename will never need to port to VMS, the 30-40% of us who work in the Win/*nix world are very happy to not have to worry about '/' vs. '\'. That's why I promote solutions that use the modules that are both in the core and in the list of commonly-accepted modules. (DBI, CGI::Application, HTML::Template, Template, *::Utils, Text::xSV, etc.)
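
        As a rough illustration of the separator point, here is a sketch using File::Basename's fileparse and fileparse_set_fstype. The paths are made up, and the file system type is forced by hand only so that both cases can be shown on one machine; normally the module picks the rules for the platform it runs on:

        ```perl
        use strict;
        use warnings;
        use File::Basename qw(fileparse fileparse_set_fstype);

        # Suffix pattern: a dot followed by any run of non-dot characters.
        my $suffix_re = qr/\.[^.]*/;

        # Parse with Unix rules ('/' separates directories).
        fileparse_set_fstype('Unix');
        my ( $name, $dir, $suffix ) = fileparse( '/var/log/app.log', $suffix_re );
        print "unix:  $dir | $name | $suffix\n";

        # Parse with Windows rules ('\' is also a separator).
        fileparse_set_fstype('MSWin32');
        my ( $wname, $wdir, $wsuffix ) = fileparse( 'C:\\logs\\app.log', $suffix_re );
        print "win32: $wdir | $wname | $wsuffix\n";
        ```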
