Bandwidth and disk space are normally metered and charged for, while CPU usage normally is not. We can "exploit" this to lower hosting costs. Warning: some hosting companies will limit CPU cycles or flat out ask you to leave if you exceed their usage expectations. On shared hosting, bear in mind that while there is only a small chance that using too much disk space or bandwidth will affect another user on the server, increased CPU usage slows down the server for everyone. Be a good neighbor and client: keep an eye on server load while trading your byte overage for CPU cycles. Code on this page is meant as examples only, but if you detect a "bug" please let me know.

Compression tests

You probably know that browsers determine what type of document they are receiving by the Content-Type header, but the header that interests us here is the Content-Encoding one. Most browsers accept gzip encoding, some accept compress and deflate as well. With the help of Compress::Zlib we can make a few tests:

Lab test 1.1 - run from browser, test for gzip encoding compatibility
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

print "Content-Type: text/plain\n";
$ENV{HTTP_ACCEPT_ENCODING} ||= '';
if ( $ENV{HTTP_ACCEPT_ENCODING} =~ /gzip/ ) {
    print "Content-Encoding: gzip\n\n";
    my $gz = gzopen( \*STDOUT, "wb" );
    $gz->gzwrite("I'm compressed");
    $gz->gzclose();    # flush and finish the gzip stream
} else {
    print "\n";
    print "I'm not compressed\n";
}
__END__
Nice, but is it worth it?

Lab test 1.2 - run from shell, test savings
#!/usr/bin/perl
use strict;
use LWP::Simple;
use Compress::Zlib;

my $page = get('http://www.perlmonks.com')
    or die "could not fetch page\n";
my $gzipped = Compress::Zlib::memGzip($page);
print "uncompressed: ", length($page),    "\n";
print "compressed:   ", length($gzipped), "\n";
__END__
Change the URL and check for other sites/pages. It is REALLY worth it if you are being charged for bandwidth, but it is also worth it in the sense that your visitors get the page faster and your httpd processes (threads in NT) are tied up for less time. Note that images are normally already compressed (PNG, JPEG, GIF), and the overhead of (un)compressing them is not worth the savings.
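The claim about images can be checked without fetching anything: compress some text, then compress the result again. The second pass stands in for what happens when you gzip a PNG or JPEG, whose data is already compressed. This is a self-contained sketch; the sample string is made up.

```perl
#!/usr/bin/perl
# Demonstrate that gzip pays off for text but not for data that is
# already compressed (the situation with PNG/JPEG/GIF images).
use strict;
use warnings;
use Compress::Zlib;

my $text  = "The quick brown fox jumps over the lazy dog. " x 200;
my $once  = Compress::Zlib::memGzip($text);    # text compresses well
my $twice = Compress::Zlib::memGzip($once);    # compressed data does not

printf "text:          %d bytes\n", length $text;
printf "gzipped once:  %d bytes\n", length $once;
printf "gzipped twice: %d bytes\n", length $twice;
print  "second pass saved nothing\n" if length $twice >= length $once;
```

Running it shows the first pass shrinking the text dramatically while the second pass actually grows the data slightly, since gzip adds header and trailer bytes it cannot win back.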

For static pages, you have 3 options to serve compressed HTML documents.

1. keep pages compressed and uncompress them on the fly if browser does not accept gzip encoding:

This will save not only bandwidth but also disk space. The question to ask yourself if you go this route is: should links on pages point to page.html.gz or page.html? Although .html reads better even when the content is encoded, some browsers (e.g. older versions of Internet Explorer) may expect plain content because of the file extension, no matter what your web server says.

In both cases, you will need a CGI that uncompresses content; the shell script below should do the trick with minimum fuss.
#!/bin/sh
echo "Content-Type: text/html"
echo ""
exec /usr/bin/zcat -f "${DOCUMENT_ROOT}${REDIRECT_URL}"

And here is a half-cooked mod_rewrite recipe that calls the CGI on the fly when the client does not accept gzip encoding. The rules below assume your compressed HTML files are named file.html instead of file.html.gz:
RewriteEngine On
RewriteOptions Inherit

RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond %{REQUEST_FILENAME} \.html?$
RewriteRule . /usr/www/users/MYNAME/cgi-bin/gunzip.cgi
AddEncoding x-gzip htm html

2. keep 2 copies of the document, one compressed (file.html.gz) and one uncompressed (file.html):

You will need Apache to determine which files to send to which user agents; this recipe in your .htaccess will work (the AddEncoding line tells Apache to mark the .gz files as gzip-encoded rather than offering them as downloads):
AddEncoding x-gzip .gz
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.+)$ $1.gz [L]
That's it, no CGI required. The only thing you need now is to make a compressed copy of each file you want to serve compressed. A small Perl script using File::Find and Compress::Zlib will do the job nicely; just remember to keep the compressed and uncompressed files identical.
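Such a script could look something like the sketch below. It walks a directory tree and writes a .gz twin next to every .htm/.html file, skipping files whose compressed copy is already up to date; the demo directory at the end is only there so the sketch runs standalone, and the `compress_tree` name is my own invention.

```perl
#!/usr/bin/perl
# Sketch: create file.html.gz next to every file.html under a root
# directory, for use with the .htaccess recipe of option 2.
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use Compress::Zlib;

sub compress_tree {
    my ($root) = @_;
    find( sub {
        return unless -f $_ && /\.html?$/;
        my $gzfile = "$_.gz";
        # Skip when the compressed copy is already newer than the source
        return if -e $gzfile && -M $gzfile <= -M $_;
        open my $in, '<', $_
            or do { warn "$File::Find::name: $!"; return };
        my $html = do { local $/; <$in> };    # slurp the whole file
        close $in;
        my $gz = gzopen( $gzfile, 'wb9' )     # 9 = best compression
            or do { warn "cannot write $gzfile"; return };
        $gz->gzwrite($html);
        $gz->gzclose();
        print "created $gzfile\n";
    }, $root );
}

# Demo on a throwaway directory so the sketch is self-contained.
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$dir/index.html" or die $!;
print $fh "<html><body>", "hello perlmonks " x 100, "</body></html>";
close $fh;

compress_tree($dir);
```

Run it from cron, or by hand after you update your pages, pointing it at your document root instead of the demo directory.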

3. keep pages uncompressed and compress them on the fly:

This is almost the reverse of option 1, except that since most browsers accept gzip encoding, most of your HTTP requests will carry the compression overhead, which can be significant. You will need an Apache handler (see mod_gzip) or a CGI to do the dirty work of compressing the document, but you will not have to worry about keeping two copies of the same document lying around.
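A CGI doing that dirty work could be sketched as below. It mirrors the gunzip.cgi above but in the other direction, using DOCUMENT_ROOT and REDIRECT_URL to locate the file and memGzip to compress it per request; the demo fallback (so the script runs from the shell) and the file names are assumptions, and mod_gzip or mod_deflate will do this far more efficiently.

```perl
#!/usr/bin/perl
# Sketch of a gzip-on-the-fly CGI for option 3: pages stay uncompressed
# on disk and are compressed per request when the browser allows it.
use strict;
use warnings;
use File::Temp qw(tempdir);
use Compress::Zlib;

# Outside a web server (no REDIRECT_URL set), serve a demo page instead
# so the sketch can be run directly from the shell.
unless ( $ENV{REDIRECT_URL} ) {
    my $dir = tempdir( CLEANUP => 1 );
    open my $demo, '>', "$dir/index.html" or die $!;
    print $demo "<html><body>demo page</body></html>";
    close $demo;
    @ENV{qw(DOCUMENT_ROOT REDIRECT_URL HTTP_ACCEPT_ENCODING)}
        = ( $dir, '/index.html', 'gzip' );
}

open my $fh, '<', ( $ENV{DOCUMENT_ROOT} || '' ) . $ENV{REDIRECT_URL}
    or die "cannot open document: $!";
my $html = do { local $/; <$fh> };
close $fh;

print "Content-Type: text/html\n";
if ( ( $ENV{HTTP_ACCEPT_ENCODING} || '' ) =~ /gzip/ ) {
    # Browser accepts gzip: trade CPU cycles for bytes on the wire
    print "Content-Encoding: gzip\n\n";
    print Compress::Zlib::memGzip($html);
} else {
    print "\n", $html;    # fall back to the plain document
}
```

Every hit pays the compression cost here, which is exactly the CPU-for-bandwidth trade this page is about; watch your server load if you deploy something like it.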

Update
aceshardware.com on this topic:
http://www.aceshardware.com/read.jsp?id=45000244 (second half of page)

In reply to Lower hosting costs with compression by tstock
