in reply to Re: Best Practices for Uncompressing/Recompressing Files?
in thread Best Practices for Uncompressing/Recompressing Files?

I want to second this:
The only way to speed things up in this case is to cause fewer head movements. If the proprietary program can read from STDIN or write to STDOUT, then a (de-)compressing pipe seems like a good idea. If you have more than one physical disk, try storing the uncompressed file on a different disk from the one holding the compressed file.
Note that you should not decompress the files in place, because you then have to recompress them. You can, e.g., run "gunzip <file.gz >/tmp/file", which leaves file.gz untouched.
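
To make the "don't decompress in place" advice concrete, here is a rough sketch. SCRATCH_DIR and the function name are my own inventions, not anything from the original question; point SCRATCH_DIR at a directory on a different physical disk if you have one.

```shell
#!/bin/sh
# Decompress a .gz file into a scratch directory, leaving the compressed
# original untouched. SCRATCH_DIR is an assumed (hypothetical) variable;
# it falls back to TMPDIR and then /tmp when unset.
decompress_to_scratch() {
    src=$1
    scratch=${SCRATCH_DIR:-${TMPDIR:-/tmp}}
    out="$scratch/$(basename "$src" .gz)"
    # gzip -dc writes the uncompressed data to stdout and keeps $src as is.
    gzip -dc "$src" > "$out" || { rm -f "$out"; return 1; }
    printf '%s\n' "$out"   # print the path of the uncompressed copy
}
```

Call it as `plain=$(decompress_to_scratch file.gz)`, feed "$plain" to whatever needs the uncompressed data, and then `rm -f "$plain"`. If the program only reads the file, there is nothing to recompress.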

Re: Re: Re: Best Practices for Uncompressing/Recompressing Files?
by waswas-fng (Curate) on Aug 11, 2003 at 03:33 UTC
    Ack, bad idea. He's running on Solaris, and /tmp is a virtual filesystem that dumps to RAM/swap. Switching one evil (disk I/O) for another (memory starvation -- swap out) is not a great idea. As noted in one of his replies above, his CPU is near 100% during the run, so it looks like CPU contention.

    -Waswas
      Ack, bad idea. He's running on Solaris, and /tmp is a virtual filesystem that dumps to RAM/swap. Switching one evil (disk I/O) for another (memory starvation -- swap out) is not a great idea. As noted in one of his replies above, his CPU is near 100% during the run, so it looks like CPU contention.
      If he can, it's better to use pipes to decompress the file right into the program that will read it. That way you get the best of both worlds. But I doubt that putting the decompressed file on /tmp is a bad idea. That's exactly the kind of file you do want to put on /tmp. It will be used once right after it's created, then destroyed. What else is /tmp for? The fact that Solaris caches it aggressively is good, not bad.
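
For the pipe route, something like the following might do. It is only a sketch: `reformat` is a hypothetical stand-in for the proprietary program (here it just upcases its input), and it assumes that program can read STDIN and write STDOUT, which the question does not confirm.

```shell
#!/bin/sh
# Hypothetical stand-in for the proprietary reformatter; assumed to read
# STDIN and write STDOUT. Replace the body with the real program.
reformat() { tr '[:lower:]' '[:upper:]'; }

# Decompress, reformat, and recompress in one pass, so no uncompressed
# copy ever lands on disk.
recompress_via_pipe() {
    src=$1
    tmp="$src.new.$$"
    # Plain sh reports only the exit status of the last pipeline stage,
    # so gzip -t is used as an extra check that the new archive is valid.
    if gzip -dc "$src" | reformat | gzip -c > "$tmp" && gzip -t "$tmp"; then
        mv "$tmp" "$src"    # replace the original with the reformatted archive
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Run it as `recompress_via_pipe file.gz`. The compressed output is written next to the original and renamed over it only at the end, so a failed run leaves file.gz as it was.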

      The key point, however, is:

      VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
              DO NOT DECOMPRESS THE FILE IN PLACE.
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      If you decompress the file in place, you have to recompress it. That wastes over half of your CPU time.
        Two points:

      • Reread the question; I think you missed the last two requirements:
          • reformat each uncompressed file using a proprietary program over which I have no control
          • recompress each file so that our disk doesn't run out of space

      • /tmp is not a caching filesystem on Solaris; it is a direct drop to RAM/swap. Fill it and RAM is starved and swap starts thrashing (read *bad*). Solaris aggressively caches other partitions, and that caching will not starve memory.

        -Waswas
        Filling up /tmp on Solaris is bad. The system starts thrashing. I've seen a server or two come to a grinding halt by someone vi'ing a large file.

        thor