filmo has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to automate the archiving of documents that are sent to our server. Each day a directory is created to catch that day's documents, e.g. /home/mysite/2001-01-01, /home/mysite/2001-01-02, etc.

I've set up the following code, but the system call doesn't seem to execute in the right directory:

while (args here) {
    $new_dir = code to figure out correct path...
    $file = code to determine file to work on...
    $ENV{'PATH'} .= ":/home/mysite/documents/daily/$new_dir";
    system("gzip $file") == 0 or die "can't gzip file";
    $ENV{'PATH'} = gets reset to original path here.
}
I've run this and it sets $ENV{'PATH'} correctly (I send a system("date") call and that runs fine). What I don't understand is why the gzip command isn't executing inside of the correctly set PATH. Currently, it appears to be executing in the directory I'm calling the script from, because if I put a file of the same name in the script's directory, it will get gzipped (i.e. ../cgi-bin/$file ends up as ../cgi-bin/$file.gz).
--
Filmo the Klown

Re: gziping files on server
by Zaxo (Archbishop) on Jul 17, 2001 at 09:49 UTC

    I believe you have misunderstood the use of $ENV{'PATH'}. It is the path searched to find executables by name when no full path is given. When found, the executable runs in $ENV{'PWD'}, the current working directory.

    gzip will happily do what you want if you just tell it where to find the file. Adjusting your pseudocode:

    while (args here) {
        $new_dir = code to figure out correct path...
        $file = code to determine file to work on...
        $fullpath = "/home/mysite/documents/daily/$new_dir/$file";
        system("gzip $fullpath") == 0 or die "can't gzip file";
    }
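
    If the shell isn't needed for anything else, the list form of system sidesteps quoting problems with odd filenames. A minimal sketch along the same lines (the directory layout is taken from the original post; next_document() is a hypothetical stand-in for however the day's directory and file are actually determined):

    # Hypothetical loop: next_document() stands in for the OP's own code
    # that works out which directory and file to handle next.
    while ( my ($new_dir, $file) = next_document() ) {
        my $fullpath = "/home/mysite/documents/daily/$new_dir/$file";
        # List form of system: no shell, so spaces and metacharacters
        # in $fullpath are passed straight to gzip.
        system('gzip', $fullpath) == 0
            or die "can't gzip $fullpath: exit status $?";
    }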

    After Compline,
    Zaxo

Re: gziping files on server
by MZSanford (Curate) on Jul 17, 2001 at 13:12 UTC
    As Zaxo said, there seems to be some confusion between $ENV{PATH} and $ENV{PWD}. The full path solution will work for gzip, but in the interest of over-answering questions so we all better understand Perl, I would also like to add a comment on actually changing directory in Perl:

    Getting Current Directory: The Cwd module gives you a portable way of getting the current directory. It works like this:
    use Cwd;
    $dir1 = cwd();
    $dir2 = getcwd();   # C version of cwd()
    $dir3 = fastcwd();  # less stable, faster C version


    Setting Current Directory: Setting the current directory is not usually needed, as complete paths are preferred. In the event that a program being started does need a specific current directory, or if chroot is being used, it can be done as follows:
    my $dir = '/tmp';
    chdir($dir) || die "Failed to cd : $!\n";
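
    Applied to the original question, a minimal sketch (the path is taken from the post, and $new_dir and $file are assumed to be set as in the OP's loop) would be to chdir into the day's directory, gzip by bare filename, and chdir back:

    use Cwd;
    my $old_dir = cwd();                                   # remember where we started
    my $daily   = "/home/mysite/documents/daily/$new_dir"; # $new_dir as in the post
    chdir($daily) or die "Failed to cd to $daily: $!\n";
    system("gzip $file") == 0 or die "can't gzip $file";   # $file as in the post
    chdir($old_dir) or die "Failed to cd back: $!\n";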

    Hope that helps someone.
    OH, a sarcasm detector, that’s really useful
Re: gziping files on server
by grinder (Bishop) on Jul 17, 2001 at 14:01 UTC

    I've had very good results in terms of performance letting Perl handle the compression via the Compress::Zlib library (it does the crunching in C). Basically, the speed penalty of handling it in perl is offset by avoiding the cost of spawning children (although YMMV). If you use the cpan shell, you no doubt have the module installed already.

    #! /usr/bin/perl -w
    use strict;
    use Compress::Zlib;

    my $file = shift or die "no file on command line.\n";

    # set up a deflation stream at maximum compression
    my( $d, $status ) = deflateInit( {-Level => Z_BEST_COMPRESSION} );
    die "deflator construction failed: $status\n" unless $status == Z_OK;

    my $deflated;
    open IN, $file or die "Cannot open $file for input: $!\n";
    # compress the input a line at a time, writing to STDOUT
    while( <IN> ) {
        ($deflated, $status) = $d->deflate( $_ );
        die "deflator deflate failed: $status\n" unless $status == Z_OK;
        print $deflated;
    }
    # flush whatever is still buffered in the deflation stream
    ($deflated, $status) = $d->flush();
    die "deflator final flush failed: $status\n" unless $status == Z_OK;
    print $deflated;
    close IN;

    Note that this script does not produce a gzip/zip file header, so you can't use gunzip/unzip on it directly; it is just the raw deflated stream. You would decompress the file using an analogous inflate script (examples of how to do this are included in the pod).
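
    For completeness, a minimal inflate sketch (not from the pod, just the mirror image of the script above, reading the raw stream from a file named on the command line):

    #! /usr/bin/perl -w
    use strict;
    use Compress::Zlib;

    my $file = shift or die "no file on command line.\n";

    # set up an inflation stream
    my( $i, $status ) = inflateInit();
    die "inflator construction failed: $status\n" unless $status == Z_OK;

    my $inflated;
    open IN, $file or die "Cannot open $file for input: $!\n";
    binmode IN;
    my $buffer;
    while ( read( IN, $buffer, 4096 ) ) {
        # inflate() consumes $buffer and returns the decompressed data
        ($inflated, $status) = $i->inflate( $buffer );
        die "inflator inflate failed: $status\n"
            unless $status == Z_OK or $status == Z_STREAM_END;
        print $inflated;
        last if $status == Z_STREAM_END;
    }
    close IN;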

    This code is sub-optimal in that it reads the file line by line, instead of in blocks of 4096 bytes. This was a proof-of-concept demo I hacked up a while ago. I must say in passing, though, that the Compress::Zlib interface is truly awful.
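
    A sketch of what the block-oriented read loop might look like (only the loop changes; the deflateInit and flush code above stays the same):

    binmode IN;
    my $buffer;
    # read 4096-byte blocks instead of lines
    while ( read( IN, $buffer, 4096 ) ) {
        ($deflated, $status) = $d->deflate( $buffer );
        die "deflator deflate failed: $status\n" unless $status == Z_OK;
        print $deflated;
    }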

    If space is at a premium on the server, and you have the CPU cycles to spare, you should really be looking at bzip2 instead.
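
    Shelling out to bzip2 works the same way as shelling out to gzip; a minimal sketch, reusing $fullpath from Zaxo's reply:

    # bzip2 -9 replaces the file with $fullpath.bz2, just as gzip leaves a .gz
    system('bzip2', '-9', $fullpath) == 0
        or die "can't bzip2 $fullpath";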


    --
    g r i n d e r
      grinder,

      Thanks for bringing bzip2 to my attention. I work with largish files (400-1000 MB) which I keep compressed to save space. I don't exactly have cycles to spare, but I wanted to see how much better compression I got with bzip2 over gzip. Surprisingly, bzip2 gave better compression in about half the time. The main caveat here is that the files contain DNA sequence data, so they are similar to, but not the same as, regular text files.

      Here's what I got for my test case of 1 file:

      Original file size: 316212340
      gzip compressed:    96294342   (30.5% of original)
      gzip CPU seconds:   476
      bzip2 compressed:   88270646   (27.9% of original)
      bzip2 CPU seconds:  269
      Thanks again,
      Scott