filmo has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to automate the archiving of documents that are sent to our server. Each day a directory is created to catch that day's documents, e.g. /home/mysite/2001-01-01, /home/mysite/2001-01-02, etc.

I've set up the following code, but the system call doesn't seem to execute in the right directory:

while (args here) {
    $new_dir = code to figure out correct path...
    $file = code to determine file to work on...
    $ENV{'PATH'} .= ":/home/mysite/documents/daily/$new_dir";
    system("gzip $file") == 0 or die "can't gzip file";
    $ENV{'PATH'} = gets reset to original path here.
}
I've run this and it sets $ENV{'PATH'} correctly (I send a system("date") call and that runs fine). What I don't understand is why the gzip command isn't executing inside of the correctly set PATH. Currently, it appears to be executing in the directory I'm calling the script from, because if I put a file of the same name in the script's directory, it will get gzipped (i.e. ../cgi-bin/$file ends up as ../cgi-bin/$file.gz).
--
Filmo the Klown

Re: gziping files on server
by Zaxo (Archbishop) on Jul 17, 2001 at 09:49 UTC

    I believe you have misunderstood the use of $ENV{'PATH'}. It is the path searched to find executables by name when no full path is given. When found, the executable runs in $ENV{'PWD'}, the current working directory.

    gzip will happily do what you want if you just tell it where to find the file. Adjusting your pseudocode:

    while (args here) {
        $new_dir = code to figure out correct path...
        $file = code to determine file to work on...
        $fullpath = "/home/mysite/documents/daily/$new_dir/$file";
        system("gzip $fullpath") == 0 or die "can't gzip file";
    }
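
    If the shell isn't needed for anything else, the list form of system sidesteps quoting problems with odd filenames. A minimal sketch along the same lines (the directory layout is taken from the original post; next_document() is a hypothetical stand-in for however the day's directory and file are actually determined):

    # Hypothetical loop: next_document() stands in for the OP's own code
    # that works out which directory and file to handle next.
    while ( my ($new_dir, $file) = next_document() ) {
        my $fullpath = "/home/mysite/documents/daily/$new_dir/$file";
        # List form of system: no shell, so spaces and metacharacters
        # in $fullpath are passed straight to gzip.
        system('gzip', $fullpath) == 0
            or die "can't gzip $fullpath: exit status $?";
    }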

    After Compline,
    Zaxo

Re: gziping files on server
by MZSanford (Curate) on Jul 17, 2001 at 13:12 UTC
    As Zaxo said, there seems to be some confusion between $ENV{PATH} and $ENV{PWD}. The full path solution will work for gzip, but in the interest of over-answering questions so we all better understand Perl, I would also like to add a comment on actually changing directory in Perl:

    Getting Current Directory: The Cwd module gives you a portable way of getting the current directory. It works like this:
    use Cwd;
    $dir1 = cwd();
    $dir2 = getcwd();   # C version of cwd()
    $dir3 = fastcwd();  # less stable, faster C version


    Setting Current Directory: Setting the current directory is not usually needed, as complete paths are preferred. In the event that a program being started does need a specific current directory, or if chroot is being used, it can be done as follows:
    my $dir = '/tmp';
    chdir($dir) || die "Failed to cd : $!\n";
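
    Applied to the original question, a minimal sketch (the path is taken from the post, and $new_dir and $file are assumed to be set as in the OP's loop) would be to chdir into the day's directory, gzip by bare filename, and chdir back:

    use Cwd;
    my $old_dir = cwd();                                   # remember where we started
    my $daily   = "/home/mysite/documents/daily/$new_dir"; # $new_dir as in the post
    chdir($daily) or die "Failed to cd to $daily: $!\n";
    system("gzip $file") == 0 or die "can't gzip $file";   # $file as in the post
    chdir($old_dir) or die "Failed to cd back: $!\n";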

    Hope that helps someone.
    OH, a sarcasm detector, that’s really useful
Re: gziping files on server
by grinder (Bishop) on Jul 17, 2001 at 14:01 UTC

    I've had very good results in terms of performance letting Perl handle the compression via the Compress::Zlib library (it does the crunching in C). Basically, the speed penalty of handling it in perl is offset by avoiding the cost of spawning children (although YMMV). If you use the cpan shell, you no doubt have the module installed already.

    #! /usr/bin/perl -w
    use strict;
    use Compress::Zlib;

    my $file = shift or die "no file on command line.\n";

    # set up a deflation stream at maximum compression
    my( $d, $status ) = deflateInit( {-Level => Z_BEST_COMPRESSION} );
    die "deflator construction failed: $status\n" unless $status == Z_OK;

    my $deflated;
    open IN, $file or die "Cannot open $file for input: $!\n";
    # compress the input a line at a time, writing to STDOUT
    while( <IN> ) {
        ($deflated, $status) = $d->deflate( $_ );
        die "deflator deflate failed: $status\n" unless $status == Z_OK;
        print $deflated;
    }
    # flush whatever is still buffered in the deflation stream
    ($deflated, $status) = $d->flush();
    die "deflator final flush failed: $status\n" unless $status == Z_OK;
    print $deflated;
    close IN;

    Note that this script does not produce a gzip/zip file header, so you can't use gunzip/unzip on it directly; it is just the raw deflated stream. You would decompress the file using an analogous inflate script (examples of how to do this are included in the pod).
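
    For completeness, a minimal inflate sketch (not from the pod, just the mirror image of the script above, reading the raw stream from a file named on the command line):

    #! /usr/bin/perl -w
    use strict;
    use Compress::Zlib;

    my $file = shift or die "no file on command line.\n";

    # set up an inflation stream
    my( $i, $status ) = inflateInit();
    die "inflator construction failed: $status\n" unless $status == Z_OK;

    my $inflated;
    open IN, $file or die "Cannot open $file for input: $!\n";
    binmode IN;
    my $buffer;
    while ( read( IN, $buffer, 4096 ) ) {
        # inflate() consumes $buffer and returns the decompressed data
        ($inflated, $status) = $i->inflate( $buffer );
        die "inflator inflate failed: $status\n"
            unless $status == Z_OK or $status == Z_STREAM_END;
        print $inflated;
        last if $status == Z_STREAM_END;
    }
    close IN;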

    This code is sub-optimal in that it reads the file line by line, instead of in blocks of 4096 bytes. This was a proof-of-concept demo I hacked up a while ago. I must say in passing, though, that the Compress::Zlib interface is truly awful.
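
    A sketch of what the block-oriented read loop might look like (only the loop changes; the deflateInit and flush code above stays the same):

    binmode IN;
    my $buffer;
    # read 4096-byte blocks instead of lines
    while ( read( IN, $buffer, 4096 ) ) {
        ($deflated, $status) = $d->deflate( $buffer );
        die "deflator deflate failed: $status\n" unless $status == Z_OK;
        print $deflated;
    }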

    If space is at a premium on the server, and you have the CPU cycles to spare, you should really be looking at bzip2 instead.
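
    Shelling out to bzip2 works the same way as shelling out to gzip; a minimal sketch, reusing $fullpath from Zaxo's reply:

    # bzip2 -9 replaces the file with $fullpath.bz2, just as gzip leaves a .gz
    system('bzip2', '-9', $fullpath) == 0
        or die "can't bzip2 $fullpath";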


    --
    g r i n d e r
      grinder,

      Thanks for bringing bzip2 to my attention. I work with largish files (400-1000 MB) which I keep compressed to save space. I don't exactly have cycles to spare, but I wanted to see how much better compression I got with bzip2 over gzip. Surprisingly, bzip2 gave better compression in about half the time. The main caveat here is that the files contain DNA sequence data, so they are similar to, but not the same as, regular text files.

      Here's what I got for my test case of 1 file:

      Original file size: 316212340
      gzip compressed:    96294342   (30.5% of original)
      gzip CPU seconds:   476
      bzip2 compressed:   88270646   (27.9% of original)
      bzip2 CPU seconds:  269
      Thanks again,
      Scott