Bandwidth and disk space are normally metered and charged for, and CPU usage is normally not. We can "exploit" this to lower hosting costs.
Warning: In some cases, hosting companies will limit CPU cycles or flat-out ask you to leave if you go above their usage
expectations. On shared hosting, bear in mind that while there is only a small chance you can affect another user on the server
by using too much disk space or bandwidth, increased CPU usage will slow down the server for everyone. Be a good neighbor and client:
keep an eye on server load while attempting to trade your byte overage for CPU cycles.
The code snippets on this page are meant as examples only, but if you detect a "bug" please let me know.
Compression tests
You probably know that browsers determine what type of document they are receiving from the Content-Type header, but the header that
interests us here is Content-Encoding. A browser advertises what it can handle in its Accept-Encoding request header; most browsers
accept gzip encoding, and some accept compress and deflate as well.
With the help of Compress::Zlib we can make a few tests:
Lab test 1.1 - run from browser, test for gzip encoding compatibility
#! /usr/bin/perl -w
use strict;
use Compress::Zlib;
print "Content-Type: text/plain\n";
$ENV{HTTP_ACCEPT_ENCODING} ||= '';
if ( $ENV{HTTP_ACCEPT_ENCODING} =~ /gz/ )
{
print "Content-encoding: gzip\n\n";
my $gz = gzopen ( \*STDOUT, "wb" );
$gz->gzwrite ( "I'm compressed" );
}
else
{
print "\n";
print "I'm not compressed\n";
}
__END__
Nice, but is it worth it?
Lab test 1.2 - run from shell, test savings
#! /usr/bin/perl -w
use strict;
use LWP::Simple;
use Compress::Zlib;
my $page = get( 'http://www.perlmonks.com' ) or die "couldn't fetch the page";
my $gzipped = Compress::Zlib::memGzip( $page );
print "uncompressed: ", length( $page ), "\n";
print "compressed: ", length( $gzipped ), "\n";
__END__
Change the URL to check other sites and pages. It is REALLY worth it if you are being charged for bandwidth, but it is also worth it
in the sense that your visitors get the page faster and your httpd processes (threads on NT) are tied up for less time.
Note that images are normally already compressed (PNG, JPEG, GIF), and the overhead of (un)compressing them is not worth the savings.
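To see this for yourself, the same memGzip comparison can be run against a local image file; the path below is only a placeholder, so point it at any JPEG, PNG or GIF you have on disk and the savings should come out negligible (or even negative, since gzip adds its own header).
#! /usr/bin/perl -w
use strict;
use Compress::Zlib;
# placeholder path - substitute any image you have lying around
my $file = '/path/to/some/image.jpg';
open my $fh, '<', $file or die "can't open $file: $!";
binmode $fh;
my $image = do { local $/; <$fh> };   # slurp the whole file
close $fh;
my $gzipped = Compress::Zlib::memGzip( $image );
print "uncompressed: ", length( $image ), "\n";
print "compressed: ", length( $gzipped ), "\n";
__END__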
For static pages, you have 3 options to serve compressed HTML documents.
1.
keep pages compressed and uncompress them on the fly if the browser does not accept gzip encoding:
This will not only save bandwidth but also disk space.
The question you need to ask yourself if you go this route is "should links on pages point to page.html.gz or page.html?".
Although .html sounds better even if the content is encoded, some browsers (e.g. older IE versions) might expect plain,
unencoded content because of the file extension, no matter what your web server says.
In both cases, you will need a CGI that uncompresses content; the shell script below should do the trick with minimum fuss.
#!/bin/sh
# print the header plus the blank line that ends the headers,
# then hand the (possibly compressed) file over to zcat
echo "Content-type: text/html
"
exec /usr/bin/zcat -f "${DOCUMENT_ROOT}${REDIRECT_URL}"
And here is a half-cooked recipe to use with mod_rewrite that will call your CGI on the fly if the
client doesn't accept gzip encoding. The rules below will work if you have your compressed HTML
files named file.html instead of file.html.gz:
RewriteEngine On
RewriteOptions Inherit
RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond %{REQUEST_FILENAME} \.html?$
RewriteRule . /usr/www/users/MYNAME/cgi-bin/gunzip.cgi
AddEncoding x-gzip .htm .html
2.
keep 2 copies of the document, one compressed (file.html.gz), one uncompressed (file.html):
You will need Apache to determine which file to send to which user agent; this recipe in your .htaccess will work:
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.+)$ $1.gz [L]
# make sure Apache marks the .gz copies as gzip-encoded
AddEncoding x-gzip .gz
That's it, no CGI required. The only thing you need now is to make a compressed copy of the files you want to serve compressed.
A small Perl script using File::Find and Compress::Zlib, like the sketch below, will do the job nicely; just remember to keep the
compressed and uncompressed copies in sync.
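Something along these lines should work; the document root path is only an example (it mirrors the MYNAME path used above), and you will want to re-run it, or hook it into your publishing step, whenever the .html files change.
#! /usr/bin/perl -w
# create file.html.gz next to every file.html under the document root (example only)
use strict;
use File::Find;
use Compress::Zlib;
my $docroot = '/usr/www/users/MYNAME/htdocs';   # adjust to your document root
find( sub {
    return unless /\.html?$/;                   # only (x)html documents
    my $src = $_;
    open my $in, '<', $src or do { warn "can't read $src: $!"; return };
    binmode $in;
    my $html = do { local $/; <$in> };
    close $in;
    my $gz = gzopen( "$src.gz", "wb9" )         # 9 = best compression
        or do { warn "can't write $src.gz: $gzerrno"; return };
    $gz->gzwrite( $html );
    $gz->gzclose;
}, $docroot );
__END__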
3.
keep pages uncompressed and compress them on the fly:
This is almost the reverse of option 1, except that since most browsers accept gzip encoding,
most of your HTTP requests will carry the compression overhead, which can be significant.
You will need an Apache handler
(see mod_gzip) or a CGI to do the dirty work of compressing the document, but you will not
have to worry about keeping two copies of the same document lying around.
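If you go the CGI route for this option, a bare-bones sketch of the dirty work might look like the script below. It reuses the DOCUMENT_ROOT/REDIRECT_URL environment variables the same way gunzip.cgi does above; treat the script name and the error handling as placeholders, not production code.
#! /usr/bin/perl -w
# gzip.cgi - compress a static html file on the fly (sketch only)
use strict;
use Compress::Zlib;
$ENV{HTTP_ACCEPT_ENCODING} ||= '';
my $file = ( $ENV{DOCUMENT_ROOT} || '' ) . ( $ENV{REDIRECT_URL} || '' );
open my $fh, '<', $file or die "can't open $file: $!";
my $html = do { local $/; <$fh> };    # slurp the document
close $fh;
print "Content-Type: text/html\n";
if ( $ENV{HTTP_ACCEPT_ENCODING} =~ /gzip/ )
{
    print "Content-Encoding: gzip\n\n";
    binmode STDOUT;                   # compressed output is binary
    print Compress::Zlib::memGzip( $html );
}
else
{
    print "\n", $html;                # client can't handle gzip, send it as is
}
__END__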
Update
aceshardware.com on this topic:
http://www.aceshardware.com/read.jsp?id=45000244 (second half of page)