Bandwidth and disk space are normally metered and charged for, while CPU usage normally is not. We can "exploit" this to lower hosting costs. Warning: in some cases, hosting companies will limit CPU cycles or flat out ask you to leave if you go above their usage expectations. In shared hosting setups, bear in mind that while there is only a small chance you can affect another user on the server by using too much disk space or bandwidth, increased CPU usage will slow down the server for everyone. Be a good neighbor and client: keep an eye on server load while attempting to trade your byte overage for CPU cycles. Code on this page is meant as examples only, but if you detect a "bug" please let me know.

Compression tests

You probably know that browsers determine what type of document they are receiving from the Content-Type header, but the header that interests us here is Content-Encoding. Most browsers accept gzip encoding; some accept compress and deflate as well. With the help of Compress::Zlib we can make a few tests:

Lab test 1.1 - run from browser, test for gzip encoding compatibility
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

print "Content-Type: text/plain\n";
$ENV{HTTP_ACCEPT_ENCODING} ||= '';
if ( $ENV{HTTP_ACCEPT_ENCODING} =~ /gzip/ ) {
    print "Content-Encoding: gzip\n\n";
    my $gz = gzopen( \*STDOUT, "wb" );
    $gz->gzwrite( "I'm compressed" );
    $gz->gzclose();    # flush and close the gzip stream
} else {
    print "\n";
    print "I'm not compressed\n";
}
__END__
Nice, but is it worth it?

Lab test 1.2 - run from shell, test savings
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use Compress::Zlib;

my $page    = get( 'http://www.perlmonks.com' );
my $gzipped = Compress::Zlib::memGzip( $page );
print "uncompressed: ", length( $page ), "\n";
print "compressed:   ", length( $gzipped ), "\n";
__END__
Change the URL to check other sites/pages. It is REALLY worth it if you are being charged for bandwidth, but it is also worth it in the sense that your visitors get the page faster and your httpd processes (threads on NT) are tied up for less time. Note that images are normally already compressed (PNG, JPEG, GIF), and the overhead of (un)compressing them is not worth the savings.

For static pages, you have 3 options to serve compressed HTML documents.

1. keep pages compressed and uncompress them on the fly if the browser does not accept gzip encoding:

This will save not only bandwidth but also disk space. The question you need to ask yourself if you go this route is "should links on pages point to page.html.gz or page.html?". Although .html sounds better even if the content is encoded, some browsers (e.g., older Internet Explorer versions) might expect plain encoding because of the file extension, no matter what your web server says.

In both cases, you will need a CGI that uncompresses content; the shell script below should do the trick with minimal fuss.
#!/bin/sh
echo "Content-Type: text/html"
echo ""
exec /usr/bin/zcat -f "${DOCUMENT_ROOT}${REDIRECT_URL}"
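
If you would rather stay in Perl, a rough equivalent of gunzip.cgi could use Compress::Zlib directly. This is a sketch only; like the shell version, it assumes mod_rewrite has set REDIRECT_URL:
#!/usr/bin/perl -w
# Sketch of a Perl gunzip.cgi -- assumes mod_rewrite set REDIRECT_URL,
# just like the shell version above.
use strict;
use Compress::Zlib;

my $file = $ENV{DOCUMENT_ROOT} . $ENV{REDIRECT_URL};
print "Content-Type: text/html\n\n";

# gzopen reads plain files transparently, much like zcat -f
my $gz = gzopen( $file, "rb" ) or die "cannot open $file: $gzerrno";
my $buffer;
print $buffer while $gz->gzread( $buffer ) > 0;
$gz->gzclose();
__END__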

And here is a half-cooked recipe for mod_rewrite that will call your CGI on the fly if the client does not accept gzip encoding. The rules below assume you keep your compressed HTML files named file.html instead of file.html.gz:
RewriteEngine On
RewriteOptions Inherit

RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond %{REQUEST_FILENAME} \.html?$
RewriteRule . /usr/www/users/MYNAME/cgi-bin/gunzip.cgi
AddEncoding x-gzip htm html

2. keep 2 copies of the document, one compressed (file.html.gz) and one uncompressed (file.html):

You will need Apache to determine which files to send to which user agents; this recipe in your .htaccess will work:
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.+)$ $1.gz [L]
That's it, no CGI required. The only thing you need now is to make a compressed copy of the files you want to serve compressed. A small Perl script using File::Find and Compress::Zlib will probably do the job nicely (see the sketch below); just remember to keep the compressed and uncompressed files identical.
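Something along these lines should work. It is a sketch only; the directory reuses the MYNAME placeholder from above, so point it at your own document root:
#!/usr/bin/perl -w
# Sketch: make a .gz twin of every HTML file under the docroot.
# The path below is a placeholder -- adjust it to your own tree.
use strict;
use File::Find;
use Compress::Zlib;

find( sub {
    return unless /\.html?$/;            # only html/htm files
    my $gz = "$_.gz";
    return if -e $gz && -M $gz < -M $_;  # .gz copy already up to date
    local $/;                            # slurp mode
    open my $in, '<', $_ or die "$File::Find::name: $!";
    my $html = <$in>;
    close $in;
    open my $out, '>', $gz or die "$gz: $!";
    binmode $out;
    print $out Compress::Zlib::memGzip( $html );
    close $out;
}, '/usr/www/users/MYNAME/htdocs' );
__END__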

3. keep pages uncompressed and compress them on the fly:

This is almost the reverse of option 1, except that since most browsers accept gzip encoding, most of your HTTP requests will carry the compression overhead, which can be significant. You will need an Apache handler (see mod_gzip) or a CGI to do the dirty work of compressing the document, but you will not have to worry about keeping two copies of the same document lying around.
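For the CGI route, a sketch in the spirit of lab test 1.1 could look like this. The REDIRECT_URL is an assumption; it would come from a rewrite rule like the one in option 1:
#!/usr/bin/perl -w
# Sketch of option 3: compress a static file on the fly when the
# client accepts gzip. Assumes a rewrite rule set REDIRECT_URL.
use strict;
use Compress::Zlib;

my $file = $ENV{DOCUMENT_ROOT} . $ENV{REDIRECT_URL};
open my $in, '<', $file or die "cannot open $file: $!";
local $/;                                # slurp mode
my $html = <$in>;
close $in;

print "Content-Type: text/html\n";
if ( ( $ENV{HTTP_ACCEPT_ENCODING} || '' ) =~ /gzip/ ) {
    print "Content-Encoding: gzip\n\n";
    binmode STDOUT;
    print Compress::Zlib::memGzip( $html );
} else {
    print "\n", $html;
}
__END__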

Update
aceshardware.com on this topic:
http://www.aceshardware.com/read.jsp?id=45000244 (second half of page)

Re: Lower hosting costs with compression
by ajt (Prior) on Oct 23, 2001 at 12:32 UTC
    Very interesting points here. I've used mod_gzip on Apache for a while, and if you have lots of text documents, it does do a nice job of compressing them. Once your site gets graphical, or you have lots of already compressed files, then it does get less efficient.

    You can also have mod_gzip compress output from your CGI scripts too, which is very clever. I've not tested it extensively, but it seems to be reliable; I never had any problems with it.

    Interestingly enough, if Apache detects that a file is already present in both gzip and raw form, then it aborts any inline compression and sends the existing gzip file automatically. I don't think the rewrite rule is required; it just happens automatically if you have Multiviews on (I need to check that).

    As ever, merlyn has a column on inline compression at his site.

      I read just a few documents on Multiviews and the content negotiation provided by mod_negotiation, but couldn't get it to "prefer" serving the .gz encoding if an uncompressed file was present. My mistake, maybe; I found a "LanguagePriority" directive but no "EncodingPriority"?

      From other documents around the web I got the impression it was mainly for languages, where you can have a file.html.en and a file.html.fr for English and French. A quote from the "Apache performance notes" on the apache.org site (http://httpd.apache.org/docs-2.0/misc/perf-tuning.html) says "If at all possible, avoid content-negotiation", and with these two problems I turned to the rewrite rule.

      merlyn's column uses gzip for the content type.

      Tiago
Re: Lower hosting costs with compression
by cLive ;-) (Prior) on Oct 23, 2001 at 09:15 UTC
    Hmmm.... fun, but generally irrelevant :)

    E.g., we give clients 50MB, and have yet to see one get anywhere near that on text alone.

    Media tends to be the space hog, and since JPEGs etc. are already compressed, there's not much you can do about it once the images are online.

    I think the best way to compress sites is to whack all GIFs through an optimiser, and save JPEGs at an acceptable compression rate.

    Doing that will save you more space.

    But a fun project, I'm sure :)

    .02

    cLive ;-)

      fun, but generally irrelevant

      I disagree. In my case, serving gz-encoded HTML saved me about $100 a month. For smaller or less active sites it could be somewhat irrelevant as far as costs go, but still, the user gets the HTML page faster, which is not bad, and the httpd process finishes faster, which is good for the server.

      we give clients 50Mb ...

      That would be enough for about 10 of my Apache logfiles (compressed).

      Tiago