camelcom has asked for the wisdom of the Perl Monks concerning the following question:

Could a kind monk please put my mind at rest re:...

I have to clean up a filesystem on a box with an old version of perl (5.004) and no extra modules (& I don't want to use find -exec...)

I would normally use File::Find, File::Recurse, File::Basename, etc. but I can't, so I'm using opendir, readdir + recursion

Can someone please confirm that the following is ALWAYS reliable (i.e. no gotchas) on a SunOS 5.6 box?...

$file_path =~ /(.*)\/(.*)/; my ($dir, $file) = ($1, $2);

I think I'm being a bit paranoid as the first .* is greedy, but any comments, just in case, please?

Replies are listed 'Best First'.
Re: dir / file split
by svenXY (Deacon) on Sep 28, 2007 at 13:07 UTC
    Hi,
    should be OK...
    $file_path =~ /(.*)\/([^\/]+)$/;
    (update: Explanation: the second means: one or more of everything but a slash up to the end of the string.)

    if you are even more paranoid ;-)
    Regards,
    svenXY
      note: for the string "foo/bar/" the OP's RE will match ("foo/bar",""), but yours will not match at all.
      Also there's no need to worry about greediness in the OP's case: for "foo/bar/cuz" the first .* starts by trying to match "foo/bar/cuz", then backtracks to "foo/bar/cu", "foo/bar/c" and so on until "foo/bar", which allows the rest of the RE to match correctly.
      use re 'debug'; could help
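
A small sketch of the difference described above, assuming nothing beyond core Perl. The sub names split_op and split_svenxy are made up for this illustration; each returns an empty list when its pattern does not match.

```perl
use strict;

# OP's pattern: greedy (.*) backtracks to the last slash; $2 may be empty.
sub split_op {
    my ($path) = @_;
    return $path =~ m{(.*)/(.*)} ? ($1, $2) : ();
}

# svenXY's pattern: requires at least one non-slash character before the end.
sub split_svenxy {
    my ($path) = @_;
    return $path =~ m{(.*)/([^/]+)$} ? ($1, $2) : ();
}

foreach my $path ('foo/bar/cuz', 'foo/bar/') {
    my @op   = split_op($path);
    my @sven = split_svenxy($path);
    print "$path\n";
    print "  OP:     ", (@op   ? "('$op[0]', '$op[1]')"     : 'no match'), "\n";
    print "  svenXY: ", (@sven ? "('$sven[0]', '$sven[1]')" : 'no match'), "\n";
}
```

Both patterns give ('foo/bar', 'cuz') for the first path; for 'foo/bar/' the OP's pattern gives ('foo/bar', '') and svenXY's does not match.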

      Oha

      ...have updated my script - thanks Sven
Re: dir / file split
by perlfan (Parson) on Sep 28, 2007 at 13:16 UTC
    This may help find any Solaris gotchas (perlsolaris for 5.004 - I think).

    Otherwise, might you be able to use the system's find command? That looks like a scary pattern to be determining files to delete :).

    Portability of glob function in a modern perl may help, too. Can you be more specific about how you are selecting files to "clean"?
      I'm using MJD's dir_walk function from Higher Order Perl and the root dir for the clean is a very specific area of the filesystem: the regex shown is only being used to extract the filename from the full filepath returned by readdir... Thanks!
        Interesting. I've had that book for a while, and decided to pick it up last night...that was actually the application I was perusing.
Re: dir / file split
by grinder (Bishop) on Sep 28, 2007 at 14:46 UTC

    File::Basename and File::Find were released with perl 5.000. If your 5.004 installation is so trashed, you could always download the tarball, pull out the corresponding .pm files and stick them in a subdirectory named 'File'. They're pure-Perl.

    Then you just have to use lib '.' and you're home free.
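
For what it's worth, a minimal sketch of what that would look like once File/Basename.pm is reachable. The path below is a made-up example, and use lib '.' assumes the script is run from the directory that contains the 'File' subdirectory.

```perl
use strict;
use lib '.';          # also search ./File/Basename.pm (assumed location of the copied module)
use File::Basename;   # exports fileparse, basename and dirname by default

my $file_path = '/export/home/tmp/report.txt';   # hypothetical path, for illustration only

my $dir  = dirname($file_path);    # everything before the last path component
my $file = basename($file_path);   # the last path component

print "dir=$dir file=$file\n";
```

This prints dir=/export/home/tmp file=report.txt.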

    However, if you are brave and insist on the original approach...

    I think I'm being a bit paranoid as the first .* is greedy

    No: if the first .* were not greedy, the split could be incorrect, since it would stop at the first slash rather than the last. You could anchor the end of the match with '$' to help settle the issue for the engine though, and make the latter match non-greedy.

    Also be aware of symlinks. If you have a symlink that points back to an ancestor directory, your script will run until the heat death of the universe, or until your machine runs out of swap, whichever comes first.
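
One way to guard against that, sketched with only opendir/readdir and the -l file test. This is not MJD's dir_walk; walk_no_symlinks is a hypothetical name, and skipping every symlink (rather than resolving it) is an assumption that suits a cleanup job.

```perl
use strict;

# Hypothetical helper: call $callback->($path) for every plain file under $top.
# Symlinks are never followed, so a link back to an ancestor cannot cause a loop.
sub walk_no_symlinks {
    my ($top, $callback) = @_;

    return if -l $top;                      # skip symlinks entirely

    if (-d $top) {
        local *DH;                          # bareword handle, fine on old perls
        opendir(DH, $top) or do { warn "cannot open $top: $!\n"; return };
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir(DH);
        closedir(DH);                       # close before recursing

        foreach my $entry (@entries) {
            walk_no_symlinks("$top/$entry", $callback);
        }
    }
    elsif (-f $top) {
        $callback->($top);
    }
}

# Only runs when a directory is given on the command line.
walk_no_symlinks($ARGV[0], sub { print "$_[0]\n" }) if @ARGV;
```

Reading the whole directory into @entries and closing the handle before recursing keeps the number of open directory handles at one, whatever the depth of the tree.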

    • another intruder with the mooring in the heart of the Perl

      I have incorporated all suggestions, including the symlink check:

      $file_path =~ /(.*)\/([^\/]+?$)/; my ($dir, $file) = ($1, $2);

      MANY THANKS to all

        I have incorporated all suggestions

        Except the different delimiter :)

Re: dir / file split
by johngg (Canon) on Sep 28, 2007 at 14:29 UTC
    It can aid readability when matching *nix paths to choose a different regex delimiter.

    $file_path =~ m{(.*)/(.*)};

    Cheers,

    JohnGG
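
Building on the alternative delimiter, here is a hedged sketch that also checks whether the match succeeded before using $1 and $2. A failed match leaves $1 and $2 holding whatever the previous successful match set, so an unchecked split on a path with no slash can silently reuse stale values. The name split_path is hypothetical.

```perl
use strict;

# Hypothetical helper: returns (dir, file), or an empty list if the path has no slash.
sub split_path {
    my ($file_path) = @_;
    # /s lets "." match a newline, which is legal in a Unix file name.
    if ($file_path =~ m{^(.*)/([^/]*)$}s) {
        return ($1, $2);
    }
    return ();
}

foreach my $path ('/var/tmp/old.log', 'old.log', '/old.log') {
    my ($dir, $file) = split_path($path);
    if (defined $file) {
        print "'$path' -> dir='$dir' file='$file'\n";
    }
    else {
        print "'$path' -> no slash, nothing to split\n";
    }
}
```

Note that a file directly under the root, such as '/old.log', yields an empty $dir, so code that rebuilds the path should join with '/' rather than treat the empty string as the current directory.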

Re: dir / file split
by girarde (Hermit) on Sep 29, 2007 at 04:21 UTC
    This has a belt-and-suspenders quality, due to $2 explicitly excluding the directory delimiter:

    $file_path =~ /(.*)\/([^\/]*)/; my ($dir, $file) = ($1, $2);

    It should satisfy all reasonable paranoia.

Re: dir / file split
by sanPerl (Friar) on Sep 29, 2007 at 17:35 UTC
    Your script seems OK to me. Just a suggestion:
    $file_path =~ /(.*)\/(.+)/; my ($dir, $file) = ($1, $2);
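    This will ensure that $2 is never empty. Note that a string like /foo/bar/abc/ is not ignored, though: the regex backtracks to the previous slash and matches with $1 = "/foo/bar" and $2 = "abc/", so strip any trailing slash first if that matters.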