I help a friend run a web site with a large number of files that take up quite a bit of disk space. Sadly, I don't have infinite disk space, and with the other things we want to do with the site, doubling the space usage isn't really an option.

That said, I was looking into which mod_perl modules could help, and Apache::Dynagzip or Apache::Compress seemed to fit the bill, but only from a bandwidth viewpoint. My idea is to archive all my HTML pages (i.e. create index.html.gz) and then decompress a page on the fly only when the client does NOT support gzip-encoded pages. The links in the pages would still be just 'index.html' etc.
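
Just to make the idea concrete, here's the sort of rough, untested pre-compression pass I have in mind; the docroot path and the decision to delete the originals are only assumptions for illustration:

    #!/usr/bin/perl
    # Untested sketch: walk the docroot, gzip every *.html into *.html.gz
    # and remove the original to claw the disk space back.
    use strict;
    use File::Find;
    use Compress::Zlib;

    my $docroot = '/home/friend/public_html';   # made-up path

    find(sub {
        return unless /\.html$/ && -f $_;
        my ($plain, $gzfile) = ($_, "$_.gz");

        open my $in, '<', $plain or die "read $plain: $!";
        binmode $in;
        my $content = do { local $/; <$in> };
        close $in;

        my $gz = gzopen($gzfile, 'wb9') or die "write $gzfile failed";
        $gz->gzwrite($content);
        $gz->gzclose;

        unlink $plain or warn "could not remove $plain: $!";
    }, $docroot);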

So that leads me to this RFC. I think this has a practical application on my server as I can apply the same treatment to areas of high traffic.

My initial thought was to somehow extend Apache::Dynagzip, or at least act as a proxy to it. I could then add this extra level of compression at the Apache config file level, and so control what I do by location, file extension and directory, as you would with other modules and directives.
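
In httpd.conf terms I'm picturing something along these lines (mod_perl 1.x style; 'My::GzipServe' is just a placeholder name for whatever the handler ends up being called):

    # Hypothetical configuration - control by extension or by location.
    <FilesMatch "\.html$">
        SetHandler  perl-script
        PerlHandler My::GzipServe
    </FilesMatch>

    <Location /archive>
        SetHandler  perl-script
        PerlHandler My::GzipServe
    </Location>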

This seems to be the way to go, as its documentation suggests it handles 99% of the things I would want. However, in all honesty this would be my first mod_perl module, so I am a bit of a newbie in this area. I've got the mod_perl Cookbook and the O'Reilly book, so I am at least able to RTFM if booted in the right direction :). I have also read the content compression FAQ, as well as the other tutorials on perl.apache.org :)

So what do people think? Is this feasible and the right approach, and does anyone have any suggestions or things for me to think about?

I'm running a Red Hat Linux box with Apache 1.3.27, mod_perl 1.26 and Perl v5.6.1. I've already got Compress::Zlib v1.14 and Apache::Compress v1.3 (they were pre-installed on this host). I don't, however, have Apache::Dynagzip installed, and I am a little concerned by its low version number, so I would appreciate any comments on that too.

Re: RFC: Mod Perl compressed content
by diotalevi (Canon) on Nov 30, 2002 at 20:49 UTC

    This sounds like a job for a larger hard disc. Seriously - while I can imagine some potential solutions, it all sounds like a lot of work that doesn't actually have to happen. It isn't as if disc space is expensive these days. If I were handed this task I'd much prefer to just solve it the right way, save some time and go for a walk instead. Priorities.

    As for actual ideas ... you could do a mod_perl or CGI program that does all your page serving for you. You'd then do the obvious thing - return compressed content for clients that support it and uncompressed content for those that don't. It doesn't sound like you need rocket science or anything. Or a POE application that can keep some of the processed data cached.
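
    Off the top of my head, and completely untested, the CGI flavour of that would look something like this (the path is made up; a mod_perl version would do the same dance with $r):

        #!/usr/bin/perl
        # Untested sketch: pick the representation based on Accept-Encoding.
        use strict;
        use Compress::Zlib ();

        my $gzfile = '/home/friend/public_html/index.html.gz';   # made-up path

        open my $fh, '<', $gzfile
            or do { print "Status: 404 Not Found\r\n\r\n"; exit };
        binmode $fh;
        my $compressed = do { local $/; <$fh> };
        close $fh;

        if (($ENV{HTTP_ACCEPT_ENCODING} || '') =~ /\bgzip\b/i) {
            print "Content-Type: text/html\r\n",
                  "Content-Encoding: gzip\r\n\r\n";
            binmode STDOUT;
            print $compressed;
        }
        else {
            print "Content-Type: text/html\r\n\r\n";
            print Compress::Zlib::memGunzip($compressed);
        }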

    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;
      Disk space is a diminishing issue, but bandwidth is a growing one - why shovel more data across the network than necessary? For a heavily content-oriented site, as opposed to one where the bulk of bandwidth is consumed by binary downloads, the gains to be had from compressed delivery are impressive. Wouldn't you like to be able to serve five times the number of monthly visitors on the same bandwidth bill?

      Makeshifts last the longest.

        Eh. Different problem. My initial reaction was to think that there are better ways to solve the problem. My second reaction is that actually this might be rather nice. I've used mod_gzip for just the sort of thing you mention, but now that I'm thinking of it... it'd be nice to have the data pre-compressed and, instead of spending CPU time compressing data for clients that support it, spend CPU time decompressing data for clients that don't. So then perhaps it'd be really nice if there were a CGI::Gzip which would do the same thing as CGI except handle compressed content nicely. I'm not going to spend the time on it, but if someone else were, then maybe PerlIO::gzip could just swap in the right filtering as needed.
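
        If PerlIO::gzip is around (it wants a PerlIO-enabled perl, i.e. 5.8, so this wouldn't apply to the 5.6.1 box above), the decompression side collapses to something like this untested fragment:

            # Untested: the :gzip layer inflates transparently on read.
            use PerlIO::gzip;

            open my $fh, '<:gzip', 'index.html.gz'
                or die "can't open index.html.gz: $!";
            print while <$fh>;   # lines come out already decompressed
            close $fh;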

        Update: I searched using the wrong keywords. Searching for Apache::Gzip turns up some other finds like Apache::Compress, Apache::Dynagzip and IO::Filter::gzip.

        __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;
      The point here is that the host costs me a certain amount per year. Upgrading the account would cost an additional £450 a year, which is money I don't have. I have shopped around and these guys are very good, and seeing as I have been with them for around 4 years I don't really want to move.

      So my priorities are disk space and bandwidth.

      Thanks anyway.
      It's actually a very reasonable task. Inflation is much cheaper than deflation, so this should be less CPU-intensive than mod_gzip. And while YMMV, based upon the stats given by the maintainers of mod_gzip (and in my own experience), the vast majority of user agents can handle gzipped content encoding just fine.

      UPDATE: This should be rather easy as a handler: assuming the content is stored zipped, check to see if Accept-Encoding includes gzip (or whatever the exact token is); if not, use Archive::Zip or some such to inflate it over the pipe on the fly.
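
      Something like this untested sketch is what I mean; I've reached for Compress::Zlib rather than Archive::Zip since the files would be gzip streams, and the package name and on-disk layout (index.html.gz sitting next to the requested index.html) are just assumptions:

          # Untested mod_perl 1.x handler sketch.
          package My::GzipServe;

          use strict;
          use Apache::Constants qw(OK DECLINED NOT_FOUND SERVER_ERROR);
          use Compress::Zlib ();

          sub handler {
              my $r  = shift;
              my $gz = $r->filename . '.gz';     # e.g. .../index.html.gz
              return DECLINED unless -f $gz;     # fall back to normal serving

              open my $fh, '<', $gz or return NOT_FOUND;
              binmode $fh;
              my $compressed = do { local $/; <$fh> };
              close $fh;

              $r->content_type('text/html');

              my $ae = $r->header_in('Accept-Encoding') || '';
              if ($ae =~ /\bgzip\b/i) {
                  # Client copes: ship the pre-compressed bytes untouched.
                  $r->header_out('Content-Encoding' => 'gzip');
                  $r->send_http_header;
                  $r->print($compressed);
              }
              else {
                  # Client doesn't: inflate on the fly (cheap next to deflating).
                  my $plain = Compress::Zlib::memGunzip($compressed);
                  return SERVER_ERROR unless defined $plain;
                  $r->send_http_header;
                  $r->print($plain);
              }
              return OK;
          }

          1;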

      --
      I'm not belgian but I play one on TV.

Re: RFC: Mod Perl compressed content
by BazB (Priest) on Nov 30, 2002 at 21:38 UTC

    mod_gzip (Sourceforge page here) will serve up gzip-compressed HTML, either compressing on the fly or, if foo.html.gz already exists on the server, serving up that file instead.
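
    From memory - so please double-check the directive names against the mod_gzip 1.3.x docs before trusting them - the relevant configuration looks roughly like this, with mod_gzip_can_negotiate being the bit that makes it look for a pre-built foo.html.gz first:

        mod_gzip_on             Yes
        mod_gzip_can_negotiate  Yes        # prefer foo.html.gz if it exists
        mod_gzip_static_suffix  .gz
        mod_gzip_item_include   file "\.html$"
        mod_gzip_item_include   mime "^text/html"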

    If mod_gzip isn't your thing, I guess implementing something similar in Perl, as you suggest, shouldn't be too taxing.

    Cheers.

    BazB

      That doesn't solve the matter of disk space, and keeping two copies of everything around certainly doesn't.

      --
      I'm not belgian but I play one on TV.