in reply to Re: Wrapper to Gzip CGI output
in thread Wrapper to Gzip CGI output

It's a compiled ANSI C CGI built on our own library, so there are no off-the-shelf tools I can use. Compressing inside the CGI itself would require a lot of rewriting, because you have to know the content length of the document. We put out about 1.5 TB of output a month, so my ideal solution is a wrapper that receives the data from the CGI, compresses it, and sends it to the visitor. That should save us about 50% or more in bandwidth usage.

Re^3: Wrapper to Gzip CGI output
by afoken (Chancellor) on Sep 07, 2014 at 08:44 UTC

    So, you need a wrapper. Do you know the CGI protocol? If not, read the specification now (RFC 3875). Also read http://en.wikipedia.org/wiki/HTTP_compression. From there, it is quite obvious how to implement such a wrapper:

    The wrapper needs to pass the entire environment and standard input unmodified to the original CGI. It has to write all CGI response headers unmodified to standard output, plus it has to write a Content-Encoding: gzip header. (One exception: if the CGI emits a Content-Length header, that value describes the uncompressed size, so it must not be passed through as-is.) Then, it has to read all of the CGI output, compress it, and write the compressed data to standard output.
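
    As a rough illustration of the header side of that job, here is a minimal Python sketch (Python only to keep it short; the same logic carries over to C). The function name rewrite_headers is made up for this example. It also makes one assumption the description above does not spell out: a Content-Length header from the CGI is dropped, because it would describe the uncompressed body.

```python
def rewrite_headers(header_lines):
    """Return the CGI's header lines plus a Content-Encoding: gzip line.

    header_lines: list of bytes, one header per item, without line endings.
    Assumption of this sketch: Content-Length is dropped, since it would no
    longer match the compressed body.
    """
    out = []
    for line in header_lines:
        # Header names are case-insensitive, so compare in lower case.
        name = line.split(b":", 1)[0].strip().lower()
        if name == b"content-length":
            continue
        out.append(line)
    out.append(b"Content-Encoding: gzip")
    return out
```

    Everything else (Content-Type, Status, cookies, and so on) passes through untouched.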

    There is only one trap left. Not all HTTP clients accept gzip compressed data. Luckily, this special case is simple to detect, and simple to handle: Clients that can handle gzip compressed data send an Accept-Encoding header that contains the word gzip. The webserver places the value of that header in the environment variable HTTP_ACCEPT_ENCODING. If that variable does not exist at all or does not contain gzip, simply replace the wrapper with the actual CGI (exec("/path/to/real.cgi")). Do this test as early as possible.
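
    A minimal sketch of that early test, again in Python for brevity. The names client_accepts_gzip and exec_real_cgi_unless_gzip are hypothetical, and /path/to/real.cgi is the placeholder from above. The check splits the header value into tokens rather than searching for the bare substring, and it deliberately ignores q-values (a client sending gzip;q=0 would still be treated as accepting gzip), so treat it as a starting point only.

```python
import os

REAL_CGI = "/path/to/real.cgi"  # placeholder path, adjust to the real CGI


def client_accepts_gzip(environ):
    """True if HTTP_ACCEPT_ENCODING exists and lists gzip (q-values ignored)."""
    value = environ.get("HTTP_ACCEPT_ENCODING", "")
    tokens = [part.split(";")[0].strip().lower() for part in value.split(",")]
    return "gzip" in tokens


def exec_real_cgi_unless_gzip(environ):
    """Replace the wrapper process with the real CGI if gzip is not accepted."""
    if not client_accepts_gzip(environ):
        # execv never returns on success; stdin, stdout, stderr and the
        # environment are all inherited by the real CGI.
        os.execv(REAL_CGI, [REAL_CGI])
```

    Calling exec_real_cgi_unless_gzip(os.environ) as the very first step keeps the uncompressed path as cheap as possible.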

    Back to the wrapper implementation (after the Accept-Encoding test): Unless you actively prevent it, standard input is automatically inherited by child processes, and so is the environment, so you don't have to take care of either. Standard error is inherited as well, which is also a good thing. All that's left to do is to fork() a child process, exec("/path/to/real.cgi") there, and read whatever the child process writes to its standard output. (You may need pipe() in C.) Read the child's output line by line and write each line to standard output until you read an empty line, which marks the end of the headers. Don't write out that empty line; instead, write the Content-Encoding header line plus an empty line to standard output. After that, read large chunks (not lines!) from the child process running the real CGI, compress them, and write the compressed data to standard output.
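
    The whole loop could look roughly like the following Python sketch. It uses subprocess.Popen, which does the pipe/fork/exec dance internally, and zlib with wbits set to 16 + MAX_WBITS so the output is a gzip stream. The function name gzip_cgi_output and the 64 KiB chunk size are choices made for this example. One deviation from the description: the header lines are buffered and written together once the empty line has been seen, so nothing reaches the client if the CGI dies before finishing its headers.

```python
import subprocess
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary "large chunk" size for this sketch


def gzip_cgi_output(argv, out):
    """Run the real CGI (argv) and write its response, gzip-compressed, to out.

    out must be a binary stream, e.g. sys.stdout.buffer.
    Returns the child's exit status.
    """
    # stdin, stderr and the environment are inherited by the child.
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)

    # Header phase: collect lines up to the empty line ending the headers.
    headers = []
    while True:
        line = child.stdout.readline()
        if line == b"":
            child.wait()
            raise RuntimeError("CGI output ended before the end of the headers")
        if line in (b"\n", b"\r\n"):
            eol = line  # reuse the CGI's own line ending
            break
        headers.append(line)
    out.write(b"".join(headers))
    out.write(b"Content-Encoding: gzip" + eol + eol)

    # Body phase: read big chunks, compress them, pass them on.
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    while True:
        chunk = child.stdout.read(CHUNK_SIZE)
        if not chunk:
            break
        out.write(compressor.compress(chunk))
    out.write(compressor.flush())  # emits remaining data and the gzip trailer
    out.flush()
    return child.wait()
```

    In a real wrapper this would be called as gzip_cgi_output(["/path/to/real.cgi"], sys.stdout.buffer). Because the compressor is fed chunk by chunk, the wrapper never needs the whole document in memory.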

    Add some error handling. Write a minimalistic 500 Internal Server Error page whenever something goes wrong, and write the details to standard error so you can see the actual error messages in the webserver's error log.
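
    A minimal sketch of such an error path, with made-up helper names build_error_page and report_failure. It relies on the CGI Status header to signal the 500 to the webserver, and it can only work if the wrapper has not written any response headers yet.

```python
import sys


def build_error_page():
    """Return a minimal CGI response signalling 500 Internal Server Error."""
    return (b"Status: 500 Internal Server Error\r\n"
            b"Content-Type: text/plain\r\n"
            b"\r\n"
            b"Internal Server Error\n")


def report_failure(detail, out=None, err=None):
    """Log detail to standard error and send the 500 page to standard output."""
    out = out if out is not None else sys.stdout.buffer
    err = err if err is not None else sys.stderr
    # The webserver normally copies the CGI's stderr into its error log.
    err.write("gzip wrapper: %s\n" % detail)
    out.write(build_error_page())
    out.flush()
```

    A typical call would be report_failure("cannot start real CGI: %s" % exc) inside an except block around the code that launches the child.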

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)