fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hi Wise Monks!

Happy 2016!

I have a question regarding the "touch" function in Perl as I think that it may be suitable for my case. EDIT: Sorry, I forgot to add that I haven't succeeded in using the File::Touch function as I get an error message when I try to run the script (can't locate in @inc), so I will use Unix's touch command.

I am writing a script for data handling that will loop through a variety of files and do a variety of things. What I'm stuck with now is the following: I want to be able to check what step of my calculations is completed; the step is written in the filename. So let's say that the latest file I have is step S00005. This means I'll want to check if the previous step, S00004, exists (I think by using if -e), and then, if it exists, remove S00004 and prepare a new input file that has the number S00006. The idea is that the "previous step" will always exist as a backup.

I have read this page about touch http://metacpan.org/pod/distribution/ppt/bin/touch

#/bin/perl/
use strict;
use warnings;

my $results = "~/results";
my $unpacked = "~/results/unpacked";
my $submit = "~/submit";
my @files;
@files = `ls *$txt`;
my $filename;
my $seq;
my $nr;
my $mol;
my $dir;
my $step;
my $type;

open (my $finished, '>>', "finished.txt");
printf $finished ("%-5s%12s%14s","Molecule","Structure","Step");
printf $finished "\n";
close $finished;

foreach (@files) {
    /(prefix)(_)(\d+)(_)(\d+)(_)(\w+)(_)(\w+)(_)(\w+)(\.)(\w+)/;
    $filename = "$1$2$3$4$5$6$7$8$9$10$11$12$13";
    print "name of file: $filename \n";
    print "\n";
    $seq= $3;
    $nr = $5;
    $mol = $7;
    $dir = $9;
    $step = $11;
    $type = $13;

    open (my $finished, '>>', "finished.txt");
    printf $finished ("%-5s%15s%15s",$mol,$dir,$step);
    printf $finished "\n";
    close $finished;

    my $check = 0;
    $check = `ls | grep -c $mol\_$dir\_step\.$out`;
    print "$mol\_$dir\_$step\.$out: $check \n";
    if ($check == 0) {
        open my $error, '>>', 'error.txt';
        print $error "WARNING $mol\_$dir\_$step\.$out is missing \n";
        close $error;
    }
    else {
        open my $found, '>>', 'found.txt';
        printf $found "$mol\_$dir\_$step\.$out is found \n";
        close $found;
    }
    $check = 0;

    print `rsync -ravn \*$mol\_$dir\_$step\.$out ~/results/unpacked/$mol/$dir`;

    ###this is where my problems start
    my @step = ( step( 'S', 6, 1, 5 ), );
    for $step (@step) {
        print `touch $step`;
    }

    sub step {
        my ( $name, $total_length, $from, $to ) = @_;#
        my $length = $total_length - length($name);
        my $format = "${name}%0${length}d";
        return map sprintf( $format, $_ ), $from .. $to;
    }

    if (`ls -c *$step*`) {
        print "$step is found \n"
    }
    else {
        print "$mdstep not found \n"
    }
}

So what I've done so far is that I check for the relevant files that exist, print error/found messages and copy the files over to the directory where I will process them.

Then, at the "this is where my problems start" bit is where I have mixed everything up. What I want to do is tell Perl to look for the latest (hence the, hopefully correct, ls -c) file number in the S00001 or S00002 or S00003 and so on files. So if the latest one is S00003, I want to check if S00002 exists. If it exists, I want to remove it and touch S00004.

In my code above, I tried to implement what a Monk here taught me (the subroutine and the format), but I haven't implemented it correctly or usefully. I only wanted to test how the touch function is used, and that's why I tried it out, but at this point it's not useful. What would be useful is managing to use ls -c properly.

So what mostly confuses me is what to use instead of the variable $step in `ls -c *$step*` as this just lists all of them. How do I do "ls -c S00010" for example, if I knew that the latest file I had was file 10???

I would honestly appreciate any help and hints and resources as I've been stuck for hours now.

Thank you all in advance and I'm sorry if my message is confusing (especially the subroutine bit that I only tried to see how touch works), if people feel that my message is too confusing I'll try to edit it.

Replies are listed 'Best First'.
Re: check for latest file, remove and touch file?
by 1nickt (Canon) on Jan 05, 2016 at 15:14 UTC

    I'll want to check if the previous step, S00004, exists (I think by using if -e)

    Many people use -f in preference to -e as the former tests not only that the file exists but that it is a plain file.

    See the documentation for Perl file test operators, which may help you solve some of your other issues too.
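
    As a rough illustration of the difference, here is a minimal sketch. The file names and the throwaway directory are assumptions made up for the example; only the -e and -f operators themselves come from Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so nothing real is touched.
my $tmp = tempdir( CLEANUP => 1 );

# Hypothetical names: one plain file, one directory, one missing entry.
my $file    = "$tmp/S00004";
my $sub     = "$tmp/S00003";
my $missing = "$tmp/S00002";

open my $fh, '>', $file or die "Can't create $file: $!\n";
close $fh;
mkdir $sub or die "Can't create directory $sub: $!\n";

for my $path ( $file, $sub, $missing ) {
    my $exists = -e $path ? 'yes' : 'no';    # anything at this path?
    my $plain  = -f $path ? 'yes' : 'no';    # specifically a plain file?
    print "$path: exists=$exists plain_file=$plain\n";
}
```

    The directory passes -e but fails -f, which is why -f is the safer test before treating something as a step file.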

    Hope this helps!


    The way forward always starts with a minimal test.

      Thank you for the recommendation and the link!

Re: check for latest file, remove and touch file?
by mr_mischief (Monsignor) on Jan 05, 2016 at 15:31 UTC

    This seems overcomplicated.

    Do you want the "latest file" as per the filesystem or the highest-sequence file as per your naming convention? Is the inode change time important? The modification time? Is it just the filename? What if the files get out of sync and you end up having S0007 but not S0006? Do you want to still delete S0005 if it exists? Why not sort your files, pop the two newest off the array, then delete the rest if making sure you have the two newest available is what you're actually trying to do?
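
    A sketch of the "sort, pop the two newest, delete the rest" idea might look like this. It assumes "newest" means the highest number in a name such as S00007 (not the filesystem timestamp), and the helper names step_number and prune_steps are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Step number embedded in a path ending in e.g. "S00007", or undef.
sub step_number {
    my ($path) = @_;
    return $path =~ m{(?:^|/)S(\d+)$} ? $1 + 0 : undef;
}

# Keep the $keep highest-numbered step files in $dir, delete the rest.
# Returns the kept paths, lowest step first.
sub prune_steps {
    my ( $dir, $keep ) = @_;
    my @steps = sort { step_number($a) <=> step_number($b) }
                grep { defined step_number($_) } bsd_glob("$dir/S*");

    # Pop the newest entries off the end of the sorted list.
    my @kept;
    unshift @kept, pop @steps while @steps && @kept < $keep;

    # Whatever is left in @steps is older than the ones we kept.
    for my $old (@steps) {
        unlink $old or warn "Can't remove $old: $!\n";
    }
    return @kept;
}

# Demonstration in a throwaway directory.
my $tmp = tempdir( CLEANUP => 1 );
for my $n ( 5, 6, 7 ) {
    my $name = sprintf '%s/S%05d', $tmp, $n;
    open my $fh, '>', $name or die "Can't create $name: $!\n";
    close $fh;
}
my @kept = prune_steps( $tmp, 2 );
print "kept: @kept\n";
```

    Because the list is rebuilt from the directory each time, a gap in the sequence (S00007 present, S00006 missing) does not need special handling.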

    You should learn to use Perl modules, but if you're in a hurry what does the Unix touch command buy you here that open and close don't?

    sub my_touch {
        my $f = shift;
        open my $fh, '>>', $f or die "Can't write to $f: $!\n";
        close $fh;
    }

    Why are you shelling out to ls when you have opendir and readdir or glob?
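
    For comparison, a directory listing without ls could be sketched as below. The helper name list_matching and the sample file names are assumptions for the example; for the current directory, glob('*.txt') would give a similar list in one call.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the sorted names in $dir that match $pattern, without calling ls.
sub list_matching {
    my ( $dir, $pattern ) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";
    my @names = grep { $_ =~ $pattern } readdir $dh;
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

# Demonstration in a throwaway directory.
my $tmp = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt c.out)) {
    open my $fh, '>', "$tmp/$name" or die "Can't create $tmp/$name: $!\n";
    close $fh;
}

my @txt = list_matching( $tmp, qr/\.txt\z/ );
print "$_\n" for @txt;
```

    The names come back without trailing newlines, so there is nothing to chomp, which is one of the small traps when parsing ls output from backticks.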

    Also, I'm not going to recommend a formatting standard because that's kind of your own choice, but I do recommend you choose one and stick to it. It will be easier to pick one that is fairly common.

    Update: changed the mode of the open in the example code to append per the suggestion by AnomalousMonk.

      open my $fh, '>', $f or die "Can't write to $f: $!\n";

      Won't opening the file named $f in '>' write mode clobber the current contents of the file at the same time it changes time/date? Wouldn't '>>' append mode be better?


      Give a man a fish:  <%-{-{-{-<

        Assuming the file already exists, yes append mode would be better. If it doesn't already exist, then the two are equivalent. To make it more similar to touch it should be appending. You're right about that.

        At least in my environment (AIX), opening the file in append mode without adding any content does not update the modification date of the file.

        $: date
        Thu Jan 7 09:37:01 CST 2016
        $: touch step.file
        $: ls -l step.file
        -rw-r--r-- 1 wlsedi wlsedi 0 Jan 07 09:37 step.file
        $: date
        Thu Jan 7 09:38:01 CST 2016
        $: touch step.file
        $: ls -l step.file
        -rw-r--r-- 1 wlsedi wlsedi 0 Jan 07 09:38 step.file
        $: date
        Thu Jan 7 09:39:38 CST 2016
        $: perl -de 1

        Loading DB routines from perl5db.pl version 1.28
        Editor support available.

        Enter h or `h h' for help, or `man perldebug' for more help.

        main::(-e:1): 1
          DB<1> open $fh,'>>','step.file'
          DB<2> close $fh
          DB<3> q
        $: ls -l step.file
        -rw-r--r-- 1 wlsedi wlsedi 0 Jan 07 09:38 step.file

        Opening in write does have the un-touchlike side effect of wiping out any content. In this particular instance, that is of no consequence, but using the code as a generic replacement for touch is problematic.
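
        One way around both problems is to open in append mode and then set the timestamps explicitly with Perl's built-in utime. This is only a sketch: touch_file is a made-up name, and it has not been tried on AIX.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create $file if missing (append mode never
# truncates), then explicitly set its access and modification times.
sub touch_file {
    my ($file) = @_;
    open my $fh, '>>', $file or die "Can't open $file: $!\n";
    close $fh or die "Can't close $file: $!\n";
    my $now = time;
    utime $now, $now, $file or die "Can't set times on $file: $!\n";
    return 1;
}

# Demonstration in a throwaway directory.
my $tmp  = tempdir( CLEANUP => 1 );
my $file = "$tmp/step.file";
open my $out, '>', $file or die "Can't create $file: $!\n";
print {$out} "data\n";
close $out or die "Can't close $file: $!\n";

# Pretend the file was last modified an hour ago.
my $old = time - 3600;
utime $old, $old, $file or die "Can't backdate $file: $!\n";

my $before = time;
touch_file($file);
my $mtime = ( stat $file )[9];
my $size  = -s $file;
print "mtime refreshed: ", ( $mtime >= $before ? 'yes' : 'no' ), "\n";
print "size still $size bytes\n";
```

        The existing content survives because the file is never opened with '>', and the modification time no longer depends on whether an empty append counts as a write.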

        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      "Do you want the "latest file" as per the filesystem or the highest-sequence file as per your naming convention? Is the inode change time important? The modification time? Is it just the filename? What if the files get out of sync and you end up having S0007 but not S0006? Do you want to still delete S0005 if it exists? Why not sort your files, pop the two newest off the array, then delete the rest if making sure you have the two newest available is what you're actually trying to do?"

      Sorting the files and popping the two newest off the array does sound like a good idea but I'll have to think about it and see if it works for my case. Thank you for the recommendation.

      I thought about this and I think the best way to proceed in my case will be to look for a file with the biggest number (S0007 over S0005 for example), then look if the previous one is there, remove the previous one and proceed to create the next one. So if 7 is there, check if 6 is there, delete 6 and proceed to create 8. I'll have to think about the rest of my script and read up a tutorial to figure out my other issues, but if I can get past the thing that I just described it would be great. Can someone help me regarding how I can do that within my $step variable? How can I say "if we are at step x, check step x-1 exists and create step x+1"?
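
      The "if we are at step x, check step x-1 and create step x+1" logic could be sketched roughly as follows. It assumes names of the form S00007 (an S plus five digits) sitting in one directory, and the helper names step_name and advance_step are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Build a step file name such as "S00007" from a number.
sub step_name { return sprintf 'S%05d', $_[0] }

# Find the highest step x in $dir, remove step x-1 if it exists,
# and create an empty file for step x+1. Returns x+1.
sub advance_step {
    my ($dir) = @_;

    my @numbers;
    for my $path ( bsd_glob("$dir/S*") ) {
        push @numbers, $1 + 0 if $path =~ m{/S(\d{5})\z};
    }
    die "No step files found in $dir\n" unless @numbers;
    @numbers = sort { $a <=> $b } @numbers;
    my $x = $numbers[-1];

    my $previous = "$dir/" . step_name( $x - 1 );
    if ( -f $previous ) {
        unlink $previous or die "Can't remove $previous: $!\n";
    }

    my $next = "$dir/" . step_name( $x + 1 );
    open my $fh, '>>', $next or die "Can't create $next: $!\n";
    close $fh;
    return $x + 1;
}

# Demonstration in a throwaway directory: steps 6 and 7 exist.
my $tmp = tempdir( CLEANUP => 1 );
for my $n ( 6, 7 ) {
    my $path = "$tmp/" . step_name($n);
    open my $fh, '>', $path or die "Can't create $path: $!\n";
    close $fh;
}
my $new = advance_step($tmp);
print "created step $new\n";
```

      After the call, step 7 is left in place as the backup and step 8 is the new file, so the rule "the previous step always exists" keeps holding.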

        Can someone help me regarding how I can do that within my $step variable? How can I say "if we are at step x, check step x-1 exists and create step x+1"?

        At this point you really should step back from your thinking on how to solve your "problem" and reexamine the overall task. Please say what you're really trying to do (beyond manipulating filenames), and especially what data you want to have available as your "backup."

        It's possible that your whole premise "a script for data handling that will loop through a variety of files and do a variety of things" could be misguided.

        The way forward always starts with a minimal test.

      Thank you very much for your comment. I'll have to study it a bit, but let me quickly ask what you mean by sticking to a formatting standard. What formatting are you referring to? Sorry, I'm really a beginner, in case that's not obvious.

Re: check for latest file, remove and touch file?
by ww (Archbishop) on Jan 05, 2016 at 15:33 UTC

    "I haven't succeeded in using the File::Touch function as I get an error message when I try to run the script (can't locate in @inc)...."

    A module must be installed before you can use it. Did you install it? The error message suggests that you did not. Super Searching for install instructions (for various OSen) will show you the simple steps to install using cpan or a build process.

    Here's a sample of Super Search results.
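
    To check from inside a script whether a module can be found in @INC before relying on it, something like the sketch below may help. The helper name have_module is made up for the example. If a module has been installed under a home directory instead of system-wide, a "use lib" line pointing at that directory adds it to @INC.

```perl
use strict;
use warnings;

# Try to load a module by name at run time; returns 1 on success, 0 otherwise.
sub have_module {
    my ($module) = @_;
    ( my $path = "$module.pm" ) =~ s{::}{/}g;    # File::Touch -> File/Touch.pm
    return eval { require $path; 1 } ? 1 : 0;
}

# File::Temp ships with Perl; File::Touch has to be installed from CPAN.
for my $module (qw(File::Temp File::Touch)) {
    printf "%-12s %s\n", $module,
        have_module($module) ? 'available' : 'not found in @INC';
}
```

    A script can use this to fall back to another approach, such as open and close, when the module is missing.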

      I know it looks like it's not installed or something else is wrong, but this is a shared computer and before I modify stuff it's better to ask the admin, also whenever I tried installing things I needed to be given permissions or sudo rights or something, which isn't doable right now. But I'll ask in a couple of days when the admins are back.

Re: check for latest file, remove and touch file?
by GotToBTru (Prior) on Jan 05, 2016 at 16:43 UTC
    foreach (@files) {
        /(prefix)(_)(\d+)(_)(\d+)(_)(\w+)(_)(\w+)(_)(\w+)(\.)(\w+)/;
        $filename = "$1$2$3$4$5$6$7$8$9$10$11$12$13";
        print "name of file: $filename \n";
        print "\n";

    That's ugly. You've been previously shown a much more compact and clear way to do this. Okay, we'll move on to the current question, but your failure to incorporate previous lessons does not bode well for future enlightenment.

    use strict;
    use warnings;

    my $counter = 0;

    # get last step completed, if any
    my $file = glob("S*.txt");
    if (defined $file) {
        ($counter) = $file =~ m/S(\d+)\.txt/;
    }

    printf "Ready to start step %d\n",++$counter;

    if ($counter == 1) {
        print "Just getting started.\n"
    } elsif ($counter > 1 && $counter <= 4) {
        print "All kinds of interesting intermediate processing goes on here.\n"
    } elsif ($counter == 5) {
        print "Finishing up.\n"
    }

    print "Completed step $counter.\n";
    stepfile($counter);

    # forget previous step, if any, now that this one is done
    unlink $file if (defined $file);

    sub stepfile {
        my $step = shift;
        my $filename = sprintf "S%06d.txt",$step;
        open my $tfh,'>',$filename;
    }

    This differs somewhat from the approach in your code, which creates all 5 step files first. This code looks for a single file containing the last completed step number to determine what step to do next, does it, and when finished creates a file showing how far it got, and removes the last file.

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      Hi and thanks for your time and effort, I really appreciate it. Your answer seems to do what I need in this case and is straightforward - very good for a newbie. I should actually delete the bit where I use the subroutine to create 5 step files, as it's confusing - it doesn't serve any purpose in my script, it's just my attempt to use "touch" and not forget how I did it. I will try your code now and thanks again so much :)

      I had tried implementing a file counter but I can see now what I was doing wrong, I didn't think about using elsif and I got lost in continuous elses and ifs and failed to do it properly.

      Regarding your comment about the ugly bit in my code, I completely agree. The problem is that I'm a coding newbie and the project I'm working on needs enormous scripts that I try to write myself, although I've never studied anything related to computers. I'm extremely pressured for time (and I won't mention my stress levels, lol) and although I did prefer your solution, it would mean that I would have to go and read on what glob does and I simply don't have the time to do it, and in this case it's ok since my ugly solution worked. However I had already printed and included your recommendation in my perl folder, where I keep corrections and solutions for future use, maybe for a future project. My supervisors just wouldn't be happy if I went back to make stuff prettier - as long as they work it's ok and I have to move forward, as I have deadlines to keep all the time :(

      Again thank you very much for your time and answer, it looks really helpful.

      I have tried out your recommendation and it works perfectly. The only thing I edited was the "my $filename = sprintf "S%05d.gro",$mdstep;" at the end, so that the step filename is always made up from 6 characters in total, including the S in the beginning.
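
      For reference, this is how the %05d part of the format behaves. The sketch below is not from the thread, and the helper name step_filename is made up.

```perl
use strict;
use warnings;

# "S" followed by the step number zero-padded to at least five digits.
sub step_filename {
    my ($step) = @_;
    return sprintf 'S%05d', $step;
}

for my $step ( 7, 42, 12345 ) {
    my $name = step_filename($step);
    printf "%-6d -> %s (%d characters)\n", $step, $name, length $name;
}
```

      The width in %05d is a minimum, so the name is six characters only while the step number stays below 100000.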

      I have a few questions though, the biggest of which comes from the fact that I started this thread with a panicky confusing message. My total steps aren't 5, in fact we don't know how many we will be needing to do, so it can be any number. So the numbers you have used, 4 and 5, are great for me to check the example but I have to now ask if you can explain how to write this bit having in mind that we don't know how many steps we'll do.

      In other scripts I've written, I had done something like this when I needed to use a counter:

      my $start;
      my $total;
      for ($start=0; $start<=$total; $start++) {
          #do stuff in the loop
      }

      Does this make sense if $total is undefined? Of course if $total is undefined and the total number of steps is unknown, this bit...

      elsif ($counter == 5) { print "Finishing up.\n" }

      ...is redundant, right?

      Again thank you very much for your help, it's greatly appreciated :)

        The for loop is used when the number of repetitions is known at the very beginning of the loop. If that number isn't known, you will want to use a while or until loop, which will repeat until a condition is met.
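
        A small sketch of a while loop with an unknown number of repetitions follows. The stopping condition (a value dropping below 1) and the division by 3 are placeholders invented for the example; the real condition would be whatever tells you the calculation is done.

```perl
use strict;
use warnings;

my $value = 100;    # hypothetical quantity that the work reduces
my $step  = 0;      # counts how many steps were actually needed

# Keep going until the condition is met, however many steps that takes.
while ( $value >= 1 ) {
    $step++;
    $value /= 3;    # placeholder for the real work of one step
    printf "step %d: value is now %.3f\n", $step, $value;
}

print "Stopped after $step steps\n";
```

        The counter is still available for building file names, but it no longer decides when the loop ends.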

        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: check for latest file, remove and touch file?
by u65 (Chaplain) on Jan 13, 2016 at 15:30 UTC

    As an aside to the other comments, this line in your example code (the "shebang" line):

    #/bin/perl/

    should read:

    #!/bin/perl

    Note the exclamation mark after the hash mark, and that the trailing forward slash is removed. That assumes your perl executable (or a link to it) is actually at '/bin/perl'.