in reply to Correct code for deleting old file?

Welcome markuswinter,

I have reformatted your code to my style; the one substantive difference is that I have replaced '$_' with '$file'. Using '$_' in a one-liner is fine, but when you use it in several places you can end up clobbering its value without noticing. I did not test your code, so it may work fine as it is.

Since you are new to Perl, I'm just suggesting an alternative way to lay this out. I have changed my brace style several times over the past 15 years, mostly because once you have a multi-thousand-line script and the compiler tells you 'Missing right curly...', you start looking for ways to improve your layout. YMMV.

foreach my $file (@taxonomy_file_url) {
    if (-e $file) {       # does the file exist?
        unlink($file);    # delete it
    }
    &getFile($file, "taxonomy", $local_taxonomy_directory);
}

You don't show 'getFile', but it must already be testing for the existence of the file, so you might just remove that code, since an 'open' with '>' will overwrite the original file anyway.
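
For example, this minimal sketch (the filename is made up) truncates any existing file the moment it is opened for writing:

my $file = 'speclist.txt';    # placeholder name
open( my $fh, '>', $file ) or die "Could not open $file: $!";
print $fh "new contents\n";   # any previous contents are already gone
close $fh;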

Good Luck

"Well done is better than well said." - Benjamin Franklin

Re^2: Correct code for deleting old file?
by markuswinter (Novice) on Jun 15, 2012 at 10:49 UTC
    Thanks for your help. I know there is a bug in the overall code, as the taxonomy file never gets replaced (that is acknowledged by the software company), but as it is an older version they won't fix the bug. The getFile code is
    sub getFile {
        my $url                = shift;
        my $type               = shift;
        my $download_directory = shift;
        my ($remoteName, $remotePath, $sysCall, $target, $rootName, $current_directory,
            $matchLineSave, $thisFileName, @listing, $template,
            $matchName, $matchLine, @history, $i, $key, $value);

        &checkPath($download_directory);

        # split download URL into directory and name
        if ($url =~ /(.*\/)(.+?)$/) {
            $remotePath = $1;
            $remoteName = $2;
        } else {
            &lastWords("Cannot parse filename from $url");
        }

        if ($url =~ /^http:/i) {
            if ($remoteName =~ /[\*\?]/) {
                &lastWords("Wild card filenames not allowed for http downloads");
            }
            $matchName = $remoteName;
        } else {
            # get a directory listing from FTP sites
            unlink("$download_directory/.listing");
            unlink("$download_directory/index.html");
            $sysCall = $wget;
            if ($ENV{'WINDIR'}) {
                $sysCall =~ s#/#\\#g;
            }
            while (($key, $value) = each(%wget_options)) {
                if (length($value)) {
                    $sysCall .= " $key=$value";
                } else {
                    $sysCall .= " $key";
                }
            }
            $sysCall .= " --dont-remove-listing";
            $sysCall .= " --directory-prefix=$download_directory";
            $sysCall .= " \"$remotePath\"";
            if (system($sysCall)) {
                &lastWords("Could not retrieve directory listing with $sysCall");
            }
            unless (-e "$download_directory/.listing" && -s "$download_directory/.listing") {
                &lastWords("Failed to retrieve directory listing with $sysCall");
            }

            # got a directory listing from remote site, slurp local download .history into array
            if (open(HISTORY, "<$download_directory/.history")) {
                @history = <HISTORY>;
                close HISTORY;
            }

            # make filename into regular expression
            $template = $remoteName;
            $template =~ s/(\W)/\\$1/g;
            $template =~ s/\\\?/\./g;
            $template =~ s/\\\*/\.\*/g;

            # work through directory .listing
            # exit with first match that isn't already in .history
            $matchName = "no match";
            if (open(LISTING, "$download_directory/.listing")) {
                @listing = <LISTING>;
                close LISTING;
                foreach (@listing) {
                    if (/\s+($template)\s*$/ || /\s+($template)\s+->/) {
                        # got a match from .listing
                        $matchName = $1;
                        $matchLine = $_;
                        for ($i = 0; $i <= $#history; $i++) {
                            if ($history[$i] =~ /\s+($matchName)\s*$/ || $history[$i] =~ /\s+($matchName)\s+->/) {
                                if ($history[$i] eq $matchLine) {
                                    # this file already downloaded
                                    $matchName = "";
                                } else {
                                    # new version of file available
                                    $history[$i]   = $matchLine;
                                    $matchLineSave = $matchLine;
                                    $matchLine     = "";
                                }
                                last;
                            }
                        }
                        if ($matchName && $matchName ne "no match") {
                            last;
                        }
                    }
                }
            }

            # if $matchName is empty, there is no new update
            # if $matchName is "no match", $remoteName was not found in the directory listing
            # if $matchLine is empty, no need to append to .history
            if ($matchName eq "no match") {
                &lastWords("No match to $url on remote server");
            }
            if ($matchName && $matchLine) {
                push @history, $matchLine;
                $matchLineSave = $matchLine;
            }
            if ($matchName) {
                # replace (potentially) wildcard $url with actual download path
                $url = "$remotePath$matchName";
            } else {
                # nothing to retrieve
                &message("No update available for $url\n");
                return 0;
            }
        }

        # download the file
        $sysCall = $wget;
        if ($ENV{'WINDIR'}) {
            $sysCall =~ s#/#\\#g;
        }
        while (($key, $value) = each(%wget_options)) {
            if (length($value)) {
                $sysCall .= " $key=$value";
            } else {
                $sysCall .= " $key";
            }
        }
        # seems like wget cannot cope with --retr-symlinks and --timestamping
        if ($matchLineSave =~ /\s+($matchName)\s*$/) {
            $sysCall .= " --timestamping";
        } else {
            $sysCall .= " --retr-symlinks";
        }
        $sysCall .= " --directory-prefix=$download_directory";
        # Only use --ignore-length option if desperate
        # if ($url =~ /^http:/i) {
        #     $sysCall .= " --ignore-length";
        # }
        $sysCall .= " \"$url\"";

        # Try each download twice.
        # The second attempt will normally not result in a download because timestamping is on.
        # The exception is when a file is updated on the remote server during downloading:
        # wget resumes the download, and ends up with a mixed file of the correct size,
        # but the file date is that of the original file.
        # Don't try to do this if the download file is a link
        if ($matchLineSave =~ /\s+($matchName)\s*$/) {
            if (system($sysCall)) {
                &lastWords("Error return from: $sysCall");
            }
        }
        if (system($sysCall)) {
            &lastWords("Error return from: $sysCall");
        } else {
            $logEntry .= "Downloaded $url to $download_directory\n";
            print "Resting for 10 seconds\n";
            sleep 10;
        }

        $thisFileName = "$download_directory/$matchName";
        # ensure file is writeable
        chmod 0644, $thisFileName;

        if ($thisFileName =~ /(.*)\.gz$/i || $thisFileName =~ /(.*)\.Z$/i) {
            $rootName = $1;
            # decompress with forced overwrite
            $sysCall = "$gzip -df $thisFileName";
            if ($ENV{'WINDIR'}) {
                $sysCall =~ s#/#\\#g;
            }
            # try up to 3 times before giving up
            if (system($sysCall) && system($sysCall) && system($sysCall)) {
                &lastWords("Error return from: $sysCall");
            } else {
                &message("Expanded $thisFileName\n");
                $thisFileName = $rootName;
            }
        }

        if ($type eq "taxonomy") {
            # taxonomy file may be a tar archive
            if ($thisFileName =~ /\.tar$/i) {
                # unpack if tar archive
                # have to cd to $local_taxonomy_directory because -C option of tar unreliable in DOS
                if ($ENV{'WINDIR'}) {
                    $current_directory = `cd`;
                    $current_directory =~ s#\\#/#g;
                } else {
                    $current_directory = `pwd`;
                }
                chomp($current_directory);
                if ($local_taxonomy_directory =~ /^(\w:)(.*)/) {
                    chdir($1);
                    chdir($2);
                } else {
                    chdir($local_taxonomy_directory);
                }
                $thisFileName =~ /.*\/(.+?)$/;
                $sysCall = "$tar -xf $1";
                if ($ENV{'WINDIR'}) {
                    $sysCall =~ s#/#\\#g;
                }
                # try up to 3 times before giving up
                if (system($sysCall) && system($sysCall) && system($sysCall)) {
                    &lastWords("Error return from: $sysCall");
                } else {
                    &message("Unpacked $thisFileName\n");
                }
                unlink($thisFileName);
                # cd back to original directory (just in case)
                if ($current_directory =~ /^(\w:)(.*)/) {
                    chdir($1);
                    chdir($2);
                } else {
                    chdir($current_directory);
                }
            }
        } elsif ($type eq "unigene") {
            # nothing more to do for UniGene
        } else {
            # sequence database files need to be renamed
            if ($type eq "fasta") {
                $target = "$local_incoming_directory/$db_name\_xyzzy.fasta";
            } elsif ($type eq "name") {
                $target = "$local_incoming_directory/$db_name\_xyzzy.nam";
            } elsif ($type eq "reference") {
                # use the existing extension, if any, otherwise use .ref
                # lose ".complete" from MSDB
                $thisFileName =~ s/msdb\.ref\.complete\./msdb\.ref\./;
                if ($thisFileName =~ /.*\.(.+?)$/) {
                    $target = "$local_incoming_directory/$db_name\_xyzzy." . lc($1);
                } else {
                    $target = "$local_incoming_directory/$db_name\_xyzzy.ref";
                }
            }
            if (move($thisFileName, $target)) {
                &message("Renamed $thisFileName to $target\n");
            } else {
                &lastWords("Error return from renaming $thisFileName to $target");
            }
        }

        # if we get here, download must have succeeded, so can now write out
        # the updated .history, to avoid getting the same file again
        if (@history) {
            open(HISTORY, ">$download_directory/.history");
            foreach (@history) {
                print HISTORY $_;
            }
            close HISTORY;
        }

        # all done
        return 1;
    }

      Your code contains a lot of bad practices:

      • &function(...). This was correct in Perl 4; it still works, but with really unwanted side effects in Perl 5. Get rid of all ampersands in front of function names, except when you want to take a reference to a function.
      • Two-argument open. Use three-argument open.
      • Lack of error checks, especially after open. Unless you use autodie, add or die "Could not open $filename: $!" after open.
      • Single-argument system() with hand-crafted quoting. This always begs for trouble, because shells tend to have very different quoting rules. At least pass the arguments for the invoked program as a list. See also Re^3: Perl Rename for a little background, and http://www.in-ulm.de/~mascheck/various/ for more. A small sketch illustrating these points follows this list.
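
      Untested, and the directory and remote path are only placeholders, but this is roughly what the points above look like in practice:

      use strict;
      use warnings;

      my $download_directory = '/tmp/downloads';                # placeholder
      my $remotePath         = 'ftp://ftp.example.org/pub/';    # placeholder
      my $listing            = "$download_directory/.listing";

      # three-argument open with an error check
      open( my $fh, '<', $listing ) or die "Could not open $listing: $!";
      my @listing = <$fh>;
      close $fh;

      # no ampersand when calling a function (checkPath() comes from the original script)
      checkPath($download_directory);

      # system() with a list of arguments: no shell, no quoting headaches
      my @cmd = ( 'wget', '--dont-remove-listing',
                  "--directory-prefix=$download_directory", $remotePath );
      system(@cmd) == 0 or die "wget failed: $?";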

      Also, you seem to go through great pains to handle URLs and to issue HTTP and FTP requests using wget. Perl has LWP, so you do not have to mess with the shell and external programs at all. Look first at LWP::Simple, that should nearly be sufficient to replace wget. If you need more control, look at LWP::UserAgent. As you seem to write a spider, also have a look at LWP::RobotUA.
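
      For instance, LWP::Simple's mirror() already does the "only fetch if newer" work that the wget --timestamping logic is trying to do. A rough sketch, with a made-up URL and target path:

      use strict;
      use warnings;
      use LWP::Simple qw(mirror);
      use HTTP::Status qw(is_success);

      my $url  = 'ftp://ftp.example.org/pub/taxonomy/taxdump.tar.gz';   # placeholder
      my $file = '/usr/local/mascot/taxonomy/taxdump.tar.gz';           # placeholder

      # mirror() only downloads when the remote file is newer than the local copy
      my $status = mirror( $url, $file );

      unless ( is_success($status) or $status == 304 ) {   # 304 = local copy already up to date
          die "Download of $url failed with status $status";
      }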

      Perl also can decompress gzip files and unpack tar files, if you like, even both in one package, Archive::Tar, and of course without having to mess with the shell.
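
      Something along these lines (an untested sketch, with placeholder paths) would replace both the gzip and the tar calls:

      use strict;
      use warnings;
      use Archive::Tar;

      my $archive = '/usr/local/mascot/taxonomy/taxdump.tar.gz';   # placeholder
      my $taxdir  = '/usr/local/mascot/taxonomy';                  # placeholder

      # Archive::Tar reads .tar.gz directly, so no separate gzip step is needed
      my $tar = Archive::Tar->new( $archive, 1 )
          or die "Could not read $archive: " . Archive::Tar->error;

      # extract into the taxonomy directory instead of shelling out and chdir-ing around
      $tar->setcwd($taxdir);
      $tar->extract or die "Could not extract $archive: " . $tar->error;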

      For URL handling, look at URI and its helpers.
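
      For example, the regex that splits $url into $remotePath and $remoteName could become something like this (only a sketch, with a placeholder URL):

      use strict;
      use warnings;
      use URI;
      use File::Basename qw(basename dirname);

      my $uri        = URI->new('ftp://ftp.example.org/pub/taxonomy/taxdump.tar.gz');
      my $remoteName = basename( $uri->path );    # taxdump.tar.gz
      my $remotePath = $uri->scheme . '://' . $uri->host . dirname( $uri->path ) . '/';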

      And for my personal taste, the getFile() function is about 200 lines longer than it should be. Too much code for one function, too deeply nested, too many variables. As a rule of thumb, a function should not exceed one screen, because otherwise, you will get lost in the code. At my university, a screen was defined as a VT420, i.e. 25 lines of no more than 80 characters, but in the real world, my editor shows about 40 lines of about 110 characters. Consider splitting the function into smaller parts.
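
      For instance, the wget command line is assembled twice inside getFile(); that alone could move into a small helper (only a sketch, the name is made up):

      # sketch: build the wget argument list in one place
      sub build_wget_command {
          my ( $wget, $options, @extra ) = @_;
          my @cmd = ($wget);
          while ( my ( $key, $value ) = each %$options ) {
              push @cmd, length($value) ? "$key=$value" : $key;
          }
          return ( @cmd, @extra );
      }

      # hypothetical caller:
      # my @cmd = build_wget_command( $wget, \%wget_options,
      #     '--dont-remove-listing', "--directory-prefix=$download_directory" );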

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Thanks a million. However, it sounds like a complete rewrite would be ideal.

        However, I should point out that the scripts aren't mine (my programming expertise is in REALbasic); they are commercial (which might explain why they are so poorly documented), and I have basically no knowledge of Perl. I'm just the poor sucker who got stuck with setting up our new Mascot PC (the guy who did it last time has long since left).

        Our Mascot version is a few years old, so the database update script no longer works correctly with the new databases (there have been some changes), and it needed updating.

        I managed to make most of the required changes, but there is one acknowledged bug left: a certain file (speclist.txt) needs to be replaced, yet the download does not replace it, it just adds new copies to the folder (as speclist1, speclist2, etc.). Their solution is that we should delete the file manually before running the script - yeah, right, who is going to remember that when (a) the script only runs once a month, and (b) it is called automatically.

        So I tried to add a file deletion routine to the script:

        #
        # delete old speclist
        #
        if ( $matchName eq "speclist.txt" ) {
            if (-e "$local_taxonomy_directory/speclist.txt") {      # does the speclist file exist
                unlink("$local_taxonomy_directory/speclist.txt")    # delete it
            } else {
                &lastWords("Failed to delete original speclist");
            }
        }
        #

        but that doesn't seem to work and I don't know why ... yet

        Thanks

        Markus

      markuswinter,

      I took a quick look at the code, and I think your original solution of deleting the file may be the better choice. There is still a lot of code that isn't defined here and is probably pulled in by a 'use' statement.

      Can you test the script in a non-destructive way?

      Usually on PM we try to give hints rather than working code. In your case, it sounds like production code that you want to modify. I strongly suggest you set up a 'sandbox' (testing environment) before you modify production code. If I'm misunderstanding your problem, please correct me.

      Thank you

      "Well done is better than well said." - Benjamin Franklin

Re^2: Correct code for deleting old file?
by GrandFather (Saint) on Jun 18, 2012 at 00:46 UTC

    Anything that relies on the Mk I eyeball to ensure braces match is doomed to fail, or at least to consume a great deal of time. It is much better to use a syntax-highlighting editor with smart indentation. However, any such assertion will raise someone's hackles.

    At the end of the day it generally comes down to what works for you. Note though that often "what works for you" is really "what works for the team" or "what works for the community". In the case of Perl, "what works for the community" is K&R indentation with a four-space indent.

    Especially for Perl (due to the excellent Perl::Tidy) a good answer can be to use a pretty printer to normalise code for "public" consumption, and even to renormalise code for personal use.
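
    For example (assuming perltidy is installed from CPAN), a single command reformats a script in place and keeps a backup:

    # reformat in place with 4-space indentation; the original is kept as script.pl.bak
    perltidy -b -i=4 script.pl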

    True laziness is hard work