in reply to Re^2: remove directory path
in thread remove directory path

Nothing drives me up the wall faster than wanting to use a program that has no reason not to work on my platform of choice except for an assumption that could have been avoided with a moment's extra thought.

You would be correct if the code in question were to be released to the general population. In the real world though, often scripts are written for a very specific task to be performed in a specific environment. If you only perform this task on *NIX systems and will only ever perform this task on *NIX systems, then I see no problem with making those types of assumptions. Published code, on the other hand, should be written with maximum portability in mind. However, I was wrong on one point: neither the split nor the regex solution is tied to *NIX platforms. Both can be adapted to any platform just by changing the path separator, as the basename is always the last item in the path.
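
As a rough sketch of that last point, both approaches can take the separator as a parameter. The function names basename_by_split and basename_by_regex are made up for illustration, and neither one tries to handle edge cases such as trailing separators or paths that mix both kinds of slash.

```perl
use strict;
use warnings;

# Hypothetical helper: split on a literal separator and keep the last piece.
sub basename_by_split {
    my ($path, $sep) = @_;
    $sep = '/' unless defined $sep;
    # \Q...\E quotes the separator so that '\' or '.' is taken literally.
    my @parts = split /\Q$sep\E/, $path;
    return $parts[-1];
}

# Hypothetical helper: strip everything up to and including the last separator.
sub basename_by_regex {
    my ($path, $sep) = @_;
    $sep = '/' unless defined $sep;
    (my $name = $path) =~ s/^.*\Q$sep\E//s;
    return $name;
}

print basename_by_split('/usr/local/lib/libfoo.so'), "\n";     # libfoo.so
print basename_by_regex('C:\\Temp\\report.txt', '\\'), "\n";   # report.txt
```

The two helpers disagree on a path that ends in a separator (the split version returns the last directory name, the regex version returns an empty string), which is the kind of detail a hand-rolled version has to settle for itself.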

Because it's harder to get a tyop when using File::Basename or File::Spec.

You pass in a string. Everything else is caught at compile time. Mistyping in a regular expression won't be noticed unless you pay close attention to the output.

I don't know where you get this notion. A typo is a typo is a typo, no matter how you spin it. If you write code for any purpose and don't test it thoroughly then you are already in a heap of trouble. Using a module does not excuse one from testing.

Because using the module makes your code clear, concise, correct, and say what you mean...

You're right basename($path) is more concise than non-module solutions. But that is what comments are for. There is indeed a reason why 99% of all programming languages provide some mechanism for placing comments in the code. Or, if you have to perform this all over the place, put it in an aptly named function, which addresses your self-commenting code argument. I'm not a fan of obfuscating or even using some of the more subtle shortcuts of Perl, for the very reason of readability and maintainability, but we are using some very basic facilities here.

And then it's also solving the problem in the domain of the problem ("how do I get the basename of a file?") rather than in the domain of the solution ("by getting rid of everything up to, and including, the last delimiter").

Call me crazy, but what does it matter so long as you get the basename of the path?

This means that when something changes, it should be much more obvious what needs to be changed. It definitely makes things not only more maintainable, in my experience, but also more likely to withstand changes in requirements without actual code changes, or with fewer code changes.

How do you figure that? Again, let's assume that the regex or split solutions provided are moved into an appropriately named function; it then becomes quite clear what needs to be changed. In this specific case, what expectations are going to change? If you want to talk generics, that's one thing, but this is a specific case. I can't think of what requirements will change in "give me the basename of the path". If a requirement for something other than the basename comes up, you are still going to have to change code to meet it. So I don't really see the merit of this argument.

Regex and split a sledgehammer? Let's see one or two lines of code as opposed to all the overhead of the previously mentioned modules. I'll take the one or two lines of code which offer far less overhead and operations than the module. If other features were needed from these modules than just basename, I could see the justification for using a module.

Here we have a screw.

More like a carriage bolt. When a screw would do.

Don't get me wrong, I am a great proponent of module use, where the situation warrants. But to use a module for the sake of using a module is just overkill.

The simple truth of the matter is we are both right.

Re^4: remove directory path
by Tanktalus (Canon) on Mar 04, 2006 at 15:59 UTC
    In the real world though, often scripts are written for a very specific task to be performed in a specific environment

    That's not my experience. My experience is that a script is often written for a very specific task, management finds out and thinks it can be used for a bunch of similar tasks, and then you end up bolting on a few extra functions. And then s/he wants to run it on his/her Windows box and complains that it doesn't work. Maybe I'm not in the real world yet, though, as I've only had one job since I graduated.

    I don't know where you get this notion. A typo is a typo is a typo, no matter how you spin it. If you write code for any purpose and don't test it thoroughly then you are already in a heap of trouble. Using a module does not excuse one from testing.

    No, a typo is not a typo in all circumstances. A typo in a literal string does not have the same gravity as a typo in code, but it also does not have the same testability. Typos in code most likely mean the compiler will complain. Typos in strings will only be fixed if someone pays close attention. And typos in regular expressions may not be noticed until the odd scenario shows up where they fail. If, for example, your regular expression fails to handle a case with spaces in it, you would never know until you actually handed the code a path with a space in it. Not something I normally think about. However, my bet is that the authors and many users of File::Basename have vetted out pretty much all possible issues (or at least more issues than I could think of), so using the tested code means your code will just simply work more often.
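
To make the distinction concrete, here is a small sketch, assuming the code runs under "use strict". The path and the deliberately misspelled $paht are made up for illustration. The first typo stops compilation; the second is still valid Perl and simply returns the wrong thing.

```perl
use strict;
use warnings;

# A typo in a variable name: under "use strict" this fails to compile,
# so the mistake is reported before the code does any work.
my $compiled = eval q{
    my $path = '/tmp/report.txt';
    my $copy = $paht;   # misspelled on purpose
    1;
};
print "compile error caught: ", ($compiled ? "no" : "yes"), "\n";   # yes

my $path = '/tmp/report.txt';

# Intended: everything after the last forward slash.
my ($good) = $path =~ m{([^/]+)$};

# Typo: a backslash where a forward slash was meant. Still valid Perl.
my ($bad) = $path =~ m{([^\\]+)$};

print "intended: $good\n";   # report.txt
print "typo:     $bad\n";    # /tmp/report.txt
```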

    You're right basename($path) is more concise than non-module solutions. But that is what comments are for.

    Clear code is always better than clear comments. It's precisely because the compiler ignores the comments that the code is better: it's what is actually being done.

        # get the base name of the path.
        $path =~ m~^[\w/]+/(\w+)$~;
        $basename = $1;

    Whoops - doesn't handle spaces at all. No error checking, either. The comment is wrong.

        $basename = basename($path);

    No need for comments because the code says exactly what it's doing. Now, the basename function may be doing something wonky, but that's what comments inside basename are for. The code that's calling it doesn't need comments.
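
A quick side-by-side, as a sketch only: the example paths and the wrapper name regex_basename are invented here, and the output comments assume a Unix-style platform. The wrapper adds the missing error check so that a failed match shows up as undef. Without that check, $1 would keep whatever an earlier successful match had left in it.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical wrapper around the regex from the post, with a match check added.
sub regex_basename {
    my ($path) = @_;
    return $path =~ m~^[\w/]+/(\w+)$~ ? $1 : undef;
}

for my $path ('/home/user/notes', '/home/user/my notes.txt') {
    my $by_regex = regex_basename($path);
    printf "%-25s regex: %-12s module: %s\n",
        $path,
        defined $by_regex ? $by_regex : '(no match)',
        basename($path);
}
# /home/user/notes          regex: notes        module: notes
# /home/user/my notes.txt   regex: (no match)   module: my notes.txt
```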

    Call me crazy, but what does it matter so long as you get the basename of the path?

    What matters is getting the basename of a path, not extracting a substring from some text. Say so. That will make it much easier on the next person in reading the code. Even if that's you six months from now. Code that reads in the domain of the problem rather than the solution will be much more natural to follow for whoever gets to maintain it later. It is more resilient to changing requirements.

    Yes, I know that you can't think of requirements changing. Perhaps you have a less creative management. Mine has thrown me enough curveballs so far past deadlines (sometimes even after we have shipped to manufacturing) that I need to be able to see around these corners to write code that can simply be tweaked to get whatever change is required. In this case, I could see a few changes. First one is that we actually want the directory the file is in along with it. In this scenario, the regular expression may win out on speed, but it starts getting just a wee bit more convoluted. Or you just do something like:

        $basename = File::Spec->catfile(basename(dirname($path)), basename($path));

    or, probably better yet:

        File::Spec->catfile( (File::Spec->splitdir($path))[-2..-1] )

    This latter one extends pretty easily to any depth.
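
The "any depth" remark can be sketched as a small helper. The name last_components and the sample path are assumptions made for this example; only splitdir, catfile, basename and dirname come from the modules themselves.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(basename dirname);

# Hypothetical helper: keep only the last $n components of $path.
sub last_components {
    my ($path, $n) = @_;
    my @parts = File::Spec->splitdir($path);
    $n = @parts if $n > @parts;    # never slice past the start of the list
    return File::Spec->catfile(@parts[-$n .. -1]);
}

# Build the sample path with catfile so it uses the local separator.
my $path = File::Spec->catfile(qw(srv data reports summary.txt));

print last_components($path, 2), "\n";   # reports/summary.txt on Unix
print last_components($path, 3), "\n";   # data/reports/summary.txt on Unix

# The two-level case written with basename and dirname gives the same result.
print File::Spec->catfile(basename(dirname($path)), basename($path)), "\n";
```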

    Second possibility is that we want the basename, but we're going to put it into a new directory - so we want some arbitrary changes. Here the regular expression can work, but it is probably a little messy. With the modules it is simply:

        $name = File::Spec->catfile( qw(some dir), basename($file) )

    Of course, that extends further into wanting to get that other directory from some completely different source that isn't hardcoded into the program at all. Regular expressions here would need the e (eval) modifier to do it in one step, so you'd want to do it in two: regular expression to get the basename, then concat it with whatever the function returned. Oh, and don't forget the path separator (that's an easy bug to have with this method). Of course, using proper functions as a matter of course, even in what you might think of as throw-away programs, means you don't even have to think of the separator:

        File::Spec->catfile( get_path(), basename($path) )
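
A minimal sketch of that last form. The post does not say where get_path() gets its directory from, so the version below is a stand-in that reads a made-up environment variable, TARGET_DIR, and falls back to the system temp directory. The relocate wrapper is likewise hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(basename);

# Hypothetical stand-in for get_path(): the target directory comes from
# the (assumed) TARGET_DIR environment variable, else the temp directory.
sub get_path {
    my $dir = $ENV{TARGET_DIR};
    return (defined $dir && length $dir) ? $dir : File::Spec->tmpdir;
}

# Hypothetical wrapper: same file name, new directory.
sub relocate {
    my ($path) = @_;
    return File::Spec->catfile(get_path(), basename($path));
}

# The concatenation route has to spell the separator out by hand:
#   my $name = get_path() . '/' . $basename;   # easy to forget the '/'
print relocate('/var/log/app/error.log'), "\n";
```

Because catfile supplies the separator, the calling code does not change if get_path() later reads from a config file or a database instead.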

    Regex and split a sledgehammer? Let's see one or two lines of code as opposed to all the overhead of the previously mentioned modules. I'll take the one or two lines of code which offer far less overhead and operations than the module.

    What I'll take is the good habit of bringing the person-years (or person-decades) of testing that File::Basename and File::Spec have received into my project for free, so that I never have to concern myself with file system details again. It's a habit that serves me well whether it's in a small program or a large one.