in reply to Best Practices for Uncompressing/Recompressing Files?

...Right now, I'm doing this all sequentially, one step after another and man is it SLOW...

If it's slow because the disks can't go any faster (indicated by CPU usage staying well below 100%), then parallelizing may not be a good idea. It will force the disk heads to do even more (very slow) seeks than they already do.

The only way to speed things up in this case is to cause fewer head movements. If the proprietary program can read from STDIN, or write to STDOUT, then a (de-)compressing pipe seems like a good idea. If you have more than one physical disk, try storing the uncompressed file on a different disk from the one holding the compressed file.
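
For example, here's a minimal sketch in Perl (with a hypothetical file name, and assuming a gzip-compressed file) of reading gunzip's output through a pipe, so the uncompressed data never has to be written to disk and read back:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical file name; gunzip writes into a pipe instead of onto
    # the disk, so there is no extra file for the heads to seek over.
    open my $in, "gunzip -c file.gz |" or die "Can't start gunzip: $!";
    while (my $line = <$in>) {
        # ... process each uncompressed line here ...
    }
    close $in or die "gunzip exited with an error: $?";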

If possible, add more RAM to the machine (if you can't add another disk). Disk blocks that are still cached in memory can be read back much faster than blocks that have to be re-read from disk (right after having been extracted).

If it's slow because the CPU is at 100%, then you have a problem that can't be solved other than by throwing more CPU at it.

Liz

Update
Fixed typo. Indeed, we don't want paralyzed disks ;-)


Re: Re: Best Practices for Uncompressing/Recompressing Files?
by Thelonius (Priest) on Aug 11, 2003 at 00:53 UTC
    I want to second this:
    The only way to speed things up in this case is to cause fewer head movements. If the proprietary program can read from STDIN, or write to STDOUT, then a (de-)compressing pipe seems like a good idea. If you have more than one physical disk, try storing the uncompressed file on a different disk from the one holding the compressed file.
    Note that you should not decompress the files in place, because you then have to recompress them. You can, e.g., run "gunzip <file.gz >/tmp/file".
      Ack, bad idea. He's running on Solaris, and /tmp is a virtual filesystem that dumps to RAM/swap; switching one evil (disk I/O) for another (memory starvation -- swap-out) is not a great idea. As noted in one of his replies above, his CPU is near 100% during the run, so it looks like CPU contention.

      -Waswas
        Ack, bad idea. He's running on Solaris, and /tmp is a virtual filesystem that dumps to RAM/swap; switching one evil (disk I/O) for another (memory starvation -- swap-out) is not a great idea. As noted in one of his replies above, his CPU is near 100% during the run, so it looks like CPU contention.
        If he can, it's better to use pipes to decompress the file right into the program that will read it. That way you get the best of both worlds. But I doubt that putting the decompressed file on /tmp is a bad idea. That's exactly the kind of file you do want to put on /tmp. It will be used once right after it's created, then destroyed. What else is /tmp for? The fact that Solaris caches it aggressively is good, not bad.

        The key point, however, is:

        VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
                DO NOT DECOMPRESS THE FILE IN PLACE.
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        If you decompress the file in place, you have to recompress it. That wastes over half of your CPU time.
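
        A rough Perl sketch of the alternative, with hypothetical file names and assuming the processing step can read a stream: read the compressed input through a gunzip pipe and write the result straight back out through a gzip pipe, so nothing uncompressed ever lands on disk and there is no separate recompression pass:

            #!/usr/bin/perl
            use strict;
            use warnings;

            # Hypothetical names; the input stays compressed on disk and
            # the output is compressed as it is written.
            open my $in,  "gunzip -c input.gz |"  or die "Can't start gunzip: $!";
            open my $out, "| gzip -c > output.gz" or die "Can't start gzip: $!";
            while (my $line = <$in>) {
                # ... modify $line here as needed ...
                print $out $line;
            }
            close $in  or die "gunzip failed: $?";
            close $out or die "gzip failed: $?";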
Re: Re: Best Practices for Uncompressing/Recompressing Files?
by mildside (Friar) on Aug 11, 2003 at 00:37 UTC
    ... parallizing may not be a good idea

    I have to agree; if there's one thing you don't want to do, it's to paralyse your disks! :)

    Cheers!