in reply to Best Practices for Uncompressing/Recompressing Files?
Here's what I would do.
First, write your program to take a list of filenames on the command-line, process all of them, then exit. This gives you maximum flexibility in calling the script.
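As a rough sketch of that calling convention (shown in shell for illustration; the same shape applies to a Perl script looping over @ARGV), the program just walks its argument list. Here process_one is a hypothetical placeholder for the real per-file work, not something from the original question:

```shell
#!/bin/sh
# process_one is a hypothetical stand-in for the real per-file work
# (decompress, convert, recompress).
process_one() {
    printf 'processing %s\n' "$1"
}

# Handle every filename given as an argument, then return.
# Returns non-zero if any single file failed, so a caller can notice.
process_files() {
    status=0
    for f in "$@"; do
        process_one "$f" || status=1
    done
    return "$status"
}

process_files "$@"
```

Quoting "$@" and "$f" keeps filenames with spaces intact, which matters once the list comes from find -print0.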
Second, install GNU xargs and find if you don't already have them. Linux/BSD will come with these; commercial Unices won't.
Now you have everything you need to parallelize this process. Simply use the -P flag to GNU xargs:
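    find . -type f -print0 |xargs -0 -P 4
with your program's name appended as the command (with no command given, xargs just runs echo) will start up to 4 copies of your program in parallel, feeding each of them as long a list of files as will fit on a command line. When one batch finishes, another copy will be started with the next batch.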
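To get a feel for how xargs groups a null-terminated list into batches, a small experiment along these lines may help. The show_batches helper is made up for this illustration, and it uses -n to force small batches and leaves out -P so the output order stays predictable:

```shell
# show_batches N FILE...: feed the names to xargs as a null-terminated
# list and print what each invocation of the command receives.
# Hypothetical helper for experimentation only.
show_batches() {
    n=$1
    shift
    printf '%s\0' "$@" |
        xargs -0 -n "$n" sh -c 'echo "$# file(s): $*"' sh
}

# Example: three names, one containing a space, in batches of two.
show_batches 2 a "b c" d
```

Each output line corresponds to one invocation of the command, so the name containing a space still counts as a single file.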
    find . -type f -print0 |xargs -0 -n 1 -P 6
will start up to 6 copies of your program in parallel, each processing one file. When one copy finishes, the next will be started.
You can vary this process and experiment by writing the file list to another file, then processing chunks of it. If your filenames don't have spaces in them, you can use simple tools like split, head, and tail to do this; otherwise you'll have to write short Perl scripts to deal with a null-terminated list of files.
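For the simple case where filenames contain no whitespace, the chunking idea could look something like the sketch below. The helper names, the chunk prefix, and the chunk size are all arbitrary choices for illustration:

```shell
# Assumes filenames contain no spaces, tabs, or newlines.

# make_chunks LISTFILE LINES PREFIX: split a newline-separated file list
# into chunk files named PREFIXaa, PREFIXab, ...
make_chunks() {
    split -l "$2" "$1" "$3"
}

# run_chunk CHUNKFILE COMMAND [ARGS...]: run COMMAND on the files
# listed in one chunk.
run_chunk() {
    chunk=$1
    shift
    xargs "$@" < "$chunk"
}
```

A run might then be: find . -type f >filelist.txt, followed by make_chunks filelist.txt 1000 chunk., and then run_chunk chunk.aa yourprogram for each chunk, where yourprogram stands for your own script.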
I would also consider using pipes and/or Compress::Zlib to minimize disk I/O. If you're decompressing to a temp file, then converting this and writing to another file, then compressing the written file, you're effectively writing the file to disk twice uncompressed and once compressed. Further, although most of those blocks should still be in your buffer cache and so won't actually be read back from disk, the cache ends up holding several copies of the same data, which wastes memory. If you could turn this into something like:
    gunzip -c <file.gz |converter |gzip -c >newfile.gz
    mv newfile.gz file.gz
you would only write the file to disk once compressed, and never uncompressed. This should save you tons of I/O and buffer cache memory (although, as always, YMMV and you should benchmark to see for sure).
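One way to wrap that pipeline so the original is only replaced when the pipeline reports success is sketched below. The recompress name and the temp-file naming are assumptions made for this example, and the converter is whatever command you pass in:

```shell
# recompress FILE CONVERTER [ARGS...]
# Stream FILE through CONVERTER and replace it only if the pipeline's
# exit status is zero. Hypothetical helper; the temp name is arbitrary.
# Note: in plain sh the pipeline status is that of the last command
# (gzip), so a failing converter is not detected here. In bash,
# "set -o pipefail" would also catch failures earlier in the pipeline.
recompress() {
    file=$1
    shift
    tmp="$file.tmp.$$"
    if gunzip -c < "$file" | "$@" | gzip -c > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, recompress file.gz converter would do the same work as the two commands above while cleaning up the temp file on failure.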