wsloand has asked for the wisdom of the Perl Monks concerning the following question:

So, I've got some web space that isn't big enough to hold my picture collection. As such, I'd like it to act as a caching proxy for my pictures, which are hosted at my home over a relatively slow DSL line. I've got some code that works, but if the file is too big (roughly anything over 1 MB), it causes a web timeout while the file is going from my computer to the cache. I tried to make the script use threading so that it could send the data as it was received, but I couldn't get that part to work. My code is below:
#!/usr/bin/perl
#!/usr/bin/perl -I/kunden/homepages/23/d94990689/htdocs/perl

$version = 0.1;

use CGI;
use CGI::Carp qw(fatalsToBrowser);
use Cache::SizeAwareFileCache;
use LWP::UserAgent;
use threads;
use Thread::Semaphore;
use Thread::Queue;

$cachesize = 1000000;
$cachedir  = '/usr/lib/cgi-bin/cache/';
@goodhosts = ('denney.homeip.net');

my $q    = CGI->new;
my %parm = $q->Vars;

$req = HTTP::Request->new(HEAD => $parm{'url'});
unless (grep {$_ eq $req->uri->host()} @goodhosts) {
    print $q->header(-status => 406).
          $q->h1("Invalid Host Name").
          $q->p("You requested an invalid host");
    exit;
}

my $ua = LWP::UserAgent->new;
$ua->agent("Bill's CGICache $version");
$res = $ua->request($req);

# check the current file to see if we need to get a new version of it
if ($res->is_error()) {
    print $res->error_as_HTML();
    die;
}
else {
    # setup the cache
    my $cache = new Cache::SizeAwareFileCache({'namespace'          => 'gallerycache',
                                               'default_expires_in' => 'never',
                                               'cache_root'         => $cachedir});

    # grab the cache object based on URL
    my $file = $cache->get($res->base);
    my $hit  = 1;

    if (! defined $file) {
        $reqget = HTTP::Request->new(GET => $parm{'url'});
        $queue  = Thread::Queue->new;
        print $res->headers->as_string."\n";
        $thr = threads->new(\&writer, $queue);
        #$queue->enqueue($res->headers->as_string."\n");
        $resget = $ua->request($reqget, {$queue->enqueue(@_)});
        $queue->enqueue(undef);
        $thr->join;

        if ($res->is_error()) {
            print $res->error_as_HTML();
            exit;
        }
        $file = $resget;

        # get the file from your server
        $cache->set($res->base, $resget, $resget->freshness_lifetime);
        $hit = 0;
        $file->push_header(Bill_cache => 'Miss');
    }

    $file->init_header(Bill_cache => 'Hit');
    if ($hit) {
        print $file->headers->as_string."\n".
              $file->content;
    }

    # clean up the cache if we wrote to it.
    unless ($hit) {
        $cache->limit_size($cachesize);
    }
}

sub writer {
    my ($queue) = @_;
    while (my $mesg = $queue->dequeue) {
        print "test\n";
        print $mesg;
    }
}

Replies are listed 'Best First'.
Re: Creating a CGI Caching Proxy
by blokhead (Monsignor) on Jun 29, 2004 at 00:59 UTC
    Looks like you're forgetting the sub keyword in this line:
    $resget = $ua->request( $reqget, sub { $queue->enqueue(@_) } );
    As it's written now, you're just passing an anonymous hashref instead of an anonymous sub.
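
    For reference, here's a minimal, self-contained sketch of that content-callback form feeding a Thread::Queue, along the lines of your writer thread. The URL is just a placeholder; LWP calls the callback with ($chunk, $response, $protocol) for each chunk it receives.

    use threads;
    use Thread::Queue;
    use LWP::UserAgent;

    my $queue = Thread::Queue->new;

    # Worker thread: print each chunk as it arrives; undef marks end-of-stream.
    my $thr = threads->create(sub {
        while (defined(my $chunk = $queue->dequeue)) {
            print $chunk;
        }
    });

    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => 'http://example.com/photo.jpg');

    # The anonymous sub is invoked for every chunk LWP receives.
    my $res = $ua->request($req, sub {
        my ($chunk) = @_;
        $queue->enqueue($chunk);
    });

    $queue->enqueue(undef);   # tell the worker we're done
    $thr->join;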

    blokhead (300th post)

Re: Creating a CGI Caching Proxy
by acomjean (Sexton) on Jun 29, 2004 at 13:57 UTC
    Not to give the question short shrift, but making a large file take two hops is going to take some time no matter what. (1 MB is also a very large image for web use; consider compressing more or serving smaller images.)

    A more effective solution might be to link the image on your web pages straight to your home server, if it's not on the main site (cross post). I think there are ways of doing this that make the image URL hard to guess, if that is what you want.
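
    Along those lines, one possible sketch of making the URL hard to guess: derive the published name from a secret with an HMAC before generating the link. The secret, the gallery path, and the /pics/ layout here are just placeholders, not anything from the original setup.

    use Digest::SHA qw(hmac_sha1_hex);

    my $secret = 'some-long-private-string';    # placeholder secret
    my $name   = 'vacation/beach.jpg';          # placeholder gallery path
    my $token  = hmac_sha1_hex($name, $secret);

    # Publish the image under the unguessable token instead of its real name.
    print qq{<img src="http://denney.homeip.net/pics/$token.jpg" alt="beach">\n};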

Re: Creating a CGI Caching Proxy
by DrHyde (Prior) on Jun 29, 2004 at 15:27 UTC
    Just use rsync to upload changes to the images directory of your off-site server whenever you add new ones. Works for me.
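
    If it helps, a minimal sketch of that approach driven from Perl; the local path and the remote login are placeholders to adapt to your own layout.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $local  = '/home/wsloand/pictures/';           # placeholder local path
    my $remote = 'user@example.com:htdocs/pictures/'; # placeholder remote target

    # -a preserves times/permissions, -z compresses over the slow DSL link,
    # --delete removes files on the server that were removed locally.
    system('rsync', '-az', '--delete', $local, $remote) == 0
        or die "rsync failed: exit status $?\n";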
Re: Creating a CGI Caching Proxy
by sgifford (Prior) on Jun 30, 2004 at 21:04 UTC
    I wrote an app similar (actually nearly identical) to this a while back. The approach I took was to have my script check whether the item was in the cache and, if not, fork off a process to download the file. The main script then reads the cache file and feeds it to the web browser as it comes in. I used a marker to tell whether the file was fully or only partially downloaded; I think I used the executable bit and file locks for this.

    This solves several problems. First, there's very little latency. Second, it handles two browsers trying to access the file at once. Third, it handles the user pressing stop partway through.

    Here's some pseudocode, which is probably clearer:

    my $url  = $cgi->param('url');
    my $file = url2file($url);
    my $fh   = FileHandle->new("< $file");
    if ($fh) {
        # Executable means it's a partial download
        if (-x $file) {
            # If it's not locked, the download process has died
            if (flock($fh, LOCK_EX|LOCK_NB)) {
                $fh = get($file, $url) or die "Couldn't get URL!\n";
            }
            stream($fh);
        }
        stream($fh);
    }
    else {
        $fh = get($file, $url) or die "Couldn't get URL!\n";
        stream($fh);
    }

    sub get {
        # Open the filehandle for read and write,
        # lock the filehandle for write,
        # fork off a process to start the download.
        #   Child process: download the URL,
        #                  set the +x bit when done.
        # Return a dup of the filehandle.
    }

    sub stream {
        # Keep streaming data from the filehandle until
        # the executable bit is set.  Works pretty much like
        # tail -f.
    }
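
    To make the tail -f part concrete, here's a minimal sketch of the stream() routine under the assumptions in the pseudocode (the download process sets the +x bit on the cache file only after it has written everything). I pass the filename in as well, which the pseudocode doesn't, so the bit can be re-checked each pass.

    use Fcntl qw(:seek);

    sub stream {
        my ($fh, $file) = @_;
        local $/ = \8192;               # read in 8K chunks
        while (1) {
            my $done = -x $file;        # check the marker *before* draining
            while (defined(my $chunk = <$fh>)) {
                print $chunk;           # pass each chunk on to the browser
            }
            last if $done;              # marker was set and we've drained the file
            sleep 1;                    # wait for more data...
            seek($fh, 0, SEEK_CUR);     # ...and clear the EOF flag, tail -f style
        }
    }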