fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hi Wise Monks!

I'm trying to write a script that will loop through a bunch of files and redistribute them into a "directory tree". As you can see from my code, I match some useful things in the files' names, such as the variables $mol and $dir that I use later to create the directories where I will move my files.

Directory $mol is always the same and will in the end contain lots of different $dir directories. To make this a bit clearer: all these files follow the same pattern, let's say $mol = alcohol, but have multiple $dir values, such as primary, secondary, tertiary, and so on.

My code works, although I had to use option -p on mkdir otherwise I got the warning that directories existed. Naturally, when I loop through the files and start creating the directory tree, directory $mol exists after going through the loop for the first time and matching the first file, so the second time my script matches a file, this time by a different $dir value, mkdir complains that directory $mol exists. This was solved by using option -p, for which the man page says: -p, --parents: no error if existing, make parent directories as needed.

Now my concern is, is there a chance that mkdir may overwrite a directory, thus leading to loss of data? I've tried testing it by running the script multiple times and throwing in ~/folder1/$mol various test directories and test files. I've also put some test files into some of the ~/folder1/$mol/$dir directories and then ran the script again, after adding more and new $txt files to loop through - the test files are intact and more $dir directories are created, as intended.

However I'm getting stressed that I haven't tested enough or that there's something I'm missing - can you help? I've also looked at the mkdir man page and generally googled "can mkdir overwrite directories" and it seems that it can't. But another opinion would also be useful. Thank you all and happy holidays! :)

#!/bin/perl
use strict;
use warnings;

my $txt = "txt";
my @files;
@files = `ls *$txt`;
my $filename;
my $seq;
my $nr;
my $mol;
my $dir;
my $step;
my $type;

foreach (@files) {
    /(prefix)(_)(\d+)(_)(\d+)(_)(\w+)(_)(\w+)(_)(\w+)(\.)(\w+)/;
    $filename = "$1$2$3$4$5$6$7$8$9$10$11$12$13";
    print "name of file: $filename \n";
    print "\n";
    $seq  = $3;
    $nr   = $5;
    $mol  = $7;
    $dir  = $9;
    $step = $11;
    $type = $13;
    print `mkdir -p ~/folder1/$mol/$dir`;
}

Replies are listed 'Best First'.
Re: mkdir in loop - overwrite danger?
by GotToBTru (Prior) on Dec 22, 2015 at 20:44 UTC

    A bit more Perlish:

    #!/bin/perl
    use strict;
    use warnings;
    use File::Path qw(make_path);

    my $txt   = "txt";
    my @files = glob("*$txt");

    foreach my $filename (@files) {
        # Fields are separated by "_" or "."; index 0 is the fixed prefix.
        my ($seq, $nr, $mol, $dir, $step, $type) = (split /[._]/, $filename)[1..6];
        print "name of file: $filename \n\n";
        # Perl does not expand "~" the way the shell does, so use $ENV{HOME}.
        make_path("$ENV{HOME}/folder1/$mol/$dir");
    }

    Update: corrected typo (thanks poj!). Updated again to use glob instead of ls.

    Dum Spiro Spero
      A bit more Perlish:
      my $txt = "txt";
      my @files = `ls *$txt`;

      No need to spawn ls and hope that the shell does not ruin everything. glob can do that in pure perl:

      my $txt = 'txt';
      my @files = glob "*$txt";

      Another way for this simple case (no subdirectories) would be a combination of opendir, readdir, closedir, and grep:

      my $txt = 'txt';
      opendir my $d, '.' or die "Can't opendir .: $!";
      my @files = grep /\Q$txt\E$/, readdir $d;
      closedir $d;

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        I did consider that, but got lazy; replacing ls with glob meant also having to split off the basename of the file from the path. I liked the simplicity.

        Update: having looked at this again, I see that glob returns bare file names when the pattern does not include a path. That makes for a more portable solution.
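
The point about glob and paths can be checked with a small sketch. The file names a.txt and b.txt and the temporary directory are made up for illustration; bsd_glob from File::Glob is used for the path case only because it does not split its argument on whitespace.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory with two made-up files, so no real data is touched.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    close $fh;
}

# Pattern without a path: glob returns bare file names.
chdir $dir or die "Can't chdir to $dir: $!";
my @bare = sort glob '*txt';
chdir $start or die "Can't chdir back to $start: $!";

# Pattern with a path: every result carries that path, so strip it if needed.
my @with_path = sort(bsd_glob("$dir/*txt"));
my @stripped  = map { basename($_) } @with_path;

print "bare: @bare\n";
print "stripped: @stripped\n";
```

So the basename step is only needed when the pattern itself contains a directory part.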

        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
Re: mkdir in loop - overwrite danger?
by RichardK (Parson) on Dec 22, 2015 at 18:41 UTC

    Don't worry, the help for mkdir (on Linux anyway!) says :-

    Create the DIRECTORY(ies), if they do not already exist.

    You might want to use File::Path instead, as make_path returns the number of directories created.
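
A minimal sketch of that suggestion, run in a temporary directory so nothing real is touched. The alcohol/primary layout is borrowed from the question and keep.txt is a made-up test file. In list context make_path returns the directories it actually created, so calling it a second time on the same path should create nothing and leave the contents alone.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Scratch area (hypothetical layout mirroring ~/folder1/$mol/$dir).
my $root   = tempdir(CLEANUP => 1);
my $target = "$root/alcohol/primary";

# First call: both 'alcohol' and 'alcohol/primary' are missing.
my @first = make_path($target);

# Drop a file into the new directory, then request the same path again.
open my $fh, '>', "$target/keep.txt" or die "Can't write keep.txt: $!";
print {$fh} "data\n";
close $fh;

my @second = make_path($target);

my $created_first  = scalar @first;
my $created_second = scalar @second;
my $file_survived  = -e "$target/keep.txt" ? 1 : 0;

print "first call created $created_first, second call created $created_second\n";
print "keep.txt still present: $file_survived\n";
```

The return value also gives the script a cheap way to log which directories are new on each run.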

Re: mkdir in loop - overwrite danger?
by graff (Chancellor) on Dec 22, 2015 at 23:09 UTC
    Rest assured that all implementations of creating a directory share the property that if the directory already exists, nothing happens to it or its contents. (Different implementations might have different kinds of "failure" behaviors when this condition comes up, but they'll all leave an existing directory as-is.)
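
As an illustration of one such "failure" behavior, here is a sketch using Perl's built-in mkdir in a temporary directory (the directory and file names are made up). The second call is expected to fail with EEXIST rather than touch the directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);
my $dir  = "$root/alcohol";

# Create the directory and put a file into it.
mkdir $dir or die "Can't mkdir $dir: $!";
open my $fh, '>', "$dir/keep.txt" or die "Can't write keep.txt: $!";
close $fh;

# A second mkdir on the same path returns false and sets $! to EEXIST.
my $ok              = mkdir $dir;
my $already_existed = (!$ok && $!{EEXIST}) ? 1 : 0;
my $file_survived   = -e "$dir/keep.txt" ? 1 : 0;

print "second mkdir failed with EEXIST: $already_existed\n";
print "keep.txt still present: $file_survived\n";
```

Checking $!{EEXIST} lets a script tell "already there" apart from real errors such as a missing parent or lack of permissions.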

    Now, if a directory contains a data file with a particular name, and you mistakenly do something that involves overwriting or replacing the contents of the file with something you don't really want -- e.g. rename($badfile,$goodfile); -- the damage will "succeed", of course.
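
One way to guard against that when the files are eventually moved is to test the destination first. The helper name move_no_clobber and the file names below are hypothetical, and the existence test and the rename are two separate steps, so treat this as a sketch rather than a race-free guarantee.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): rename only when the
# destination name is still free.
sub move_no_clobber {
    my ($from, $to) = @_;
    if (-e $to) {
        warn "Skipping $from: $to already exists\n";
        return 0;
    }
    rename $from, $to or die "Can't rename $from to $to: $!";
    return 1;
}

# Two made-up source files whose content is their own name.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(first second)) {
    open my $fh, '>', "$root/$name.txt" or die "Can't write $name.txt: $!";
    print {$fh} $name;
    close $fh;
}

# The first move succeeds; the second is refused instead of overwriting.
my $moved_first  = move_no_clobber("$root/first.txt",  "$root/dest.txt");
my $moved_second = move_no_clobber("$root/second.txt", "$root/dest.txt");

open my $in, '<', "$root/dest.txt" or die "Can't read dest.txt: $!";
my $content = <$in>;
close $in;

print "dest.txt holds: $content\n";
```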

Re: mkdir in loop - overwrite danger?
by Anonymous Monk on Dec 22, 2015 at 18:37 UTC

    I have no comment on "mkdir -p".

    If the match fails inside the loop, you will have bugs wherever you use the captured variables, because you don't skip that iteration. Also, why do you rebuild the file name when you already have it as the loop variable?
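
A sketch of skipping non-matching names. The two sample file names are made up, and the anchored regex is only one guess at the intended naming scheme, based on the pattern in the question.

```perl
use strict;
use warnings;

# Made-up sample names: one follows the pattern, one does not.
my @names = ('prefix_1_2_alcohol_primary_step1.txt', 'notes.txt');
my @parsed;

for my $name (@names) {
    # In list context a failed match returns an empty list.
    my @fields = $name =~ /^prefix_(\d+)_(\d+)_(\w+)_(\w+)_(\w+)\.(\w+)$/;
    unless (@fields) {
        warn "Skipping unexpected file name: $name\n";
        next;
    }
    my ($seq, $nr, $mol, $dir, $step, $type) = @fields;
    push @parsed, "$mol/$dir";
}

print "would create: $_\n" for @parsed;
```

Because the captures are assigned only after a successful match, stale values from a previous iteration can never leak into the directory name.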

Re: mkdir in loop - overwrite danger?
by Marshall (Canon) on Dec 23, 2015 at 08:59 UTC
    I am curious as to why you want to move the desired files? What purpose does that serve? Perhaps that is because there are a lot of "junk files"? I don't know.

    A simple database of these relevant file names could be just what you need? I would need to know more about what the usage is.

    A serious thing to consider is what happens when some file name is deleted as *text and how that affects your directory structure? Your program should have the property that it produces a valid result again and again even if some "line" is missing from the next run...

    As an extra note...there is no need at all to use these $1, $2,$3 variables in almost all cases in Perl. There are of course exceptions...

    my ($seq, $nr, $mol, $dir, $step, $type) = (/prefix_(\d+)_(\d+)_(\w+)_(\w+)_(\w+)\.(\w+)/)[0..5];
    This is called a "list slice" and is powerful.

    I figure that your regex is flawed. Do not use () to capture anything that is not needed. These $1, $2, $3 variables are expensive. A \w+ followed by "_" can also be "expensive": the underscore is itself part of the \w character class, [a-zA-Z0-9_], so the regex has to backtrack to find the separator.
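
The \w point is easy to check. In this sketch the sample string sec_butyl_primary is made up, and [^_]+ is offered only as one possible stricter alternative to \w+.

```perl
use strict;
use warnings;

# \w matches the underscore but not the hyphen.
my $underscore_is_word = ('_' =~ /^\w$/) ? 1 : 0;
my $hyphen_is_word     = ('-' =~ /^\w$/) ? 1 : 0;

# Greedy \w+ swallows underscores and gives back only what it must.
my ($greedy_first) = 'sec_butyl_primary' =~ /^(\w+)_(\w+)$/;

# [^_]+ stops at the first underscore, so the field boundary is unambiguous.
my ($strict_first) = 'sec_butyl_primary' =~ /^([^_]+)_/;

print "underscore in \\w: $underscore_is_word, hyphen in \\w: $hyphen_is_word\n";
print "greedy first field: $greedy_first\n";
print "strict first field: $strict_first\n";
```

With \w+ the first capture ends up as sec_butyl, which may or may not be what the naming scheme intends; the negated class makes the split explicit.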