Serving tarball contents as part of your webspace (impractical but fun)

by Aristotle (Chancellor)
on Dec 28, 2001 at 20:50 UTC (#134907, CUFP)

Someone else has probably thought of this before me, but I'm posting this since I dreamt it up and produced working code in about 5 minutes. I was yet again amazed at Perl and CPAN!

    #!/usr/bin/perl -w
    use strict;
    use CGI;
    use CGI::Carp qw(fatalsToBrowser);
    use Archive::Tar;

    my $cgi  = CGI->new();
    my $tar  = Archive::Tar->new('foobar.tar.gz');
    my $file = $cgi->path_info();
    $file =~ s|^/||;
    print $cgi->header(), $tar->get_content($file);

Say you have called this script foobar.cgi, and have a tarball called foobar.tar.gz - then you'll get the archived file foo/bar.html out of the archive if you request http://servername/foobar.cgi/foo/bar.html. In fact if your webserver is properly configured you could call it just, say, documents, and the URL would look like http://servername/documents/foo/bar.html, giving your visitors not the least hint that anything unusual is happening, besides the rather long load time.

There are some quick-and-dirty bits in there:

  1. I didn't feel like handling MIME types correctly, but if you actually wanted to use this for real, you'd probably want to read the server's mime.types.
  2. I didn't give any extra thought to rewriting the path_info(); I just unconditionally strip any leading slash off of it. This may or may not produce unintended results.
  3. You'll probably want to trap failure to find $file and try $file."index.html" and/or $file."/index.html" in that case. Substitute as many variations on index.html as you like.
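
For what it's worth, the three points above could be handled roughly like this. It is only a sketch: the helper names candidate_paths and guess_mime and the small %MIME table are made up for illustration, and a real setup would read the server's mime.types rather than hard-code a table. It also tries only the "/index.html" variant.

```perl
use strict;
use warnings;

# Hypothetical lookup table; a real script would parse the server's mime.types.
my %MIME = (
    html => 'text/html',
    txt  => 'text/plain',
    css  => 'text/css',
    png  => 'image/png',
    gif  => 'image/gif',
    jpg  => 'image/jpeg',
);

# Turn PATH_INFO into a list of archive member names to try, in order.
sub candidate_paths {
    my ($path_info) = @_;
    $path_info = '' unless defined $path_info;
    # Drop empty, "." and ".." segments so the name stays inside the tree.
    my @parts = grep { $_ ne '' && $_ ne '.' && $_ ne '..' }
                split m{/}, $path_info;
    my $file = join '/', @parts;
    return ('index.html') if $file eq '';
    return ($file, "$file/index.html");
}

# Guess a Content-Type from the file extension, defaulting to a binary type.
sub guess_mime {
    my ($file) = @_;
    my ($ext) = $file =~ /\.([^.\/]+)\z/;
    return 'application/octet-stream' unless defined $ext;
    return $MIME{ lc $ext } || 'application/octet-stream';
}
```

In the script you would loop over candidate_paths($cgi->path_info()), take the first name for which $tar->contains_file($name) is true, and then print $cgi->header(-type => guess_mime($name)) ahead of the content.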

Unfortunately, it easily spikes your CPU load to 100% and eats lots of memory for long stretches if an HTML page refers to images stored within the tarball, because that spawns multiple concurrent CGI processes that each ungzip and untar the whole archive. I tried Archive::Zip as well, but didn't get any better results.

So there. Utterly useless for any practical purposes but just dead cool. :-) What do you think?

Replies are listed 'Best First'.
Re (tilly) 1: Serving tarball contents as part of your webspace (impractical but fun)
by tilly (Archbishop) on Jan 05, 2002 at 19:26 UTC
    One of your main problems is that the gzip format means that you have to decompress the whole tar file in order to get any particular file out of it.

    I would expect far better performance if you gzipped the individual files and then tarred the bundle rather than doing it the other way. This isn't usually done because it results in less overall compression. But it means that any particular file can be extracted relatively easily.
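
A minimal sketch of that layout, assuming Archive::Tar together with IO::Compress::Gzip and IO::Uncompress::Gunzip. The function names build_bundle and fetch_member and the ".gz" member suffix are illustrative choices, not anything from the original script.

```perl
use strict;
use warnings;
use Archive::Tar;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build an uncompressed tar whose members are gzipped one by one.
# %$files maps member names to their uncompressed contents.
sub build_bundle {
    my ($tarfile, $files) = @_;
    my $tar = Archive::Tar->new;
    for my $name (sort keys %$files) {
        my $plain = $files->{$name};
        my $packed;
        gzip \$plain => \$packed
            or die "gzip failed for $name: $GzipError";
        $tar->add_data("$name.gz", $packed);
    }
    $tar->write($tarfile)
        or die "cannot write $tarfile: " . $tar->error;
    return;
}

# Fetch one member and decompress only that member.
# Returns undef if the member is not in the bundle.
sub fetch_member {
    my ($tarfile, $name) = @_;
    my $tar = Archive::Tar->new($tarfile)
        or die "cannot read $tarfile: " . Archive::Tar->error;
    return unless $tar->contains_file("$name.gz");
    my $packed = $tar->get_content("$name.gz");
    my $plain;
    gunzip \$packed => \$plain
        or die "gunzip failed for $name: $GunzipError";
    return $plain;
}
```

Archive::Tar still has to read the tar file itself, so this only saves the decompression work for the members you don't ask for.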

    Also note that you could improve performance even more by just losing the idea of having a tar. But if you do that then you need to be very, very careful about security else someone will be able to access any gzipped file on your webserver.

      That's why I tried Archive::Zip - zipfiles contain individually compressed files. However, as I said, performance improved only marginally at best. The problem that you get a multitude of simultaneously running scripts all doing the same, very CPU intensive thing remains, after all. Losing the tarball is not helpful since the main idea was to keep a document tree that consists of oodles of tiny snippet files from eating an ungodly amount of inodes.

      You're giving me an idea though - I'll check to see how it performs with an uncompressed zipfile. I know uncompressed tarballs won't make a huge difference since Archive::Tar always slurps the whole tarball into memory no matter what. However, zipfiles are indexed, and maybe Archive::Zip is smart enough to exploit that, in which case this thing may actually be useful.

      I'll update as soon as I've found the time to run a quick check.
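
The uncompressed-zipfile experiment could be set up along these lines. This is a sketch assuming Archive::Zip is installed; build_stored_zip and fetch_from_zip are hypothetical helper names, and whether this is actually faster would still have to be measured.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES :CONSTANTS);

# Write a zip whose members are stored as-is rather than deflated.
# %$files maps member names to their contents.
sub build_stored_zip {
    my ($zipfile, $files) = @_;
    my $zip = Archive::Zip->new;
    for my $name (sort keys %$files) {
        my $member = $zip->addString($files->{$name}, $name);
        $member->desiredCompressionMethod(COMPRESSION_STORED);
    }
    $zip->writeToFileNamed($zipfile) == AZ_OK
        or die "cannot write $zipfile";
    return;
}

# Read a single member by name; returns undef if it is not in the archive.
sub fetch_from_zip {
    my ($zipfile, $name) = @_;
    my $zip = Archive::Zip->new;
    $zip->read($zipfile) == AZ_OK
        or die "cannot read $zipfile";
    my $member = $zip->memberNamed($name) or return;
    return scalar $zip->contents($member);
}
```

In the CGI, fetch_from_zip('foobar.zip', $file) would take the place of the $tar->get_content($file) call.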

Re: Serving tarball contents as part of your webspace (impractical but fun)
by merlyn (Sage) on Jan 11, 2002 at 08:52 UTC
Re: Serving tarball contents as part of your webspace (impractical but fun)
by pokemonk (Scribe) on Jan 03, 2002 at 22:34 UTC
    I do like your code, and yes, it is useful for me. I don't have much space, so this kind of works. I also have people upload stuff to be downloaded later. They upload a bunch of compressed files, so this might be a cool feature! Good job.

