Petras has asked for the wisdom of the Perl Monks concerning the following question:

To those more Monkier than I,

Here's a self-imposed project that's already stretched me a lot, but has left me seeking more wisdom. I have a friend in CA who wants me to send her several hi-resolution (600dpi) .jpg files. But she doesn't know what pictures I have available, and I don't know what she wants. So I was just going to make an HTML page of thumbnails (72dpi) that link to the hi-res images. This could be done in plain, simple, straightforward HTML. But instead I smelled a learning opportunity ;)

I want to make a simple HTML page of the low-res thumbnails, but each thumbnail will have its own checkbox. The user can scroll down the page, select the images she wants to have high-res copies of, then hit submit. The CGI script will add the hi-res files to a .zip file and generate a new HTML doc with a directory list of the new .zip and a link to download the file. Looking at CPAN I found Archive::Zip and came up with this code:
#!/usr/bin/perl -wT
# PSEUDO CODE
# 1. Plain HTML page with check-boxed thumbnail images
# 2. On submit() hi-res copies of the selected images are
#    added to a .zip file
# 3. A new page is displayed listing the content and size
#    of the file with a link to the newly created file
# 4. Once the file is downloaded it will be erased from the
#    server. (better to prompt the user to erase?)

use strict;
use CGI;
use Archive::Zip;
# Archive::Zip POD is at
# http://theoryx5.uwinnipeg.ca/CPAN/data/Archive-Zip/Archive/Zip.html

# Make a new CGI object
my $q = CGI->new();

# Create an array of the file names (drawn from the name
# tag in the HTML) to be added to the zip file
my @selectedFiles = $q->param;

# How to name the .zip file without risk of duplication?
my $fileName = ####;

my $zip = Archive::Zip->new();

# Add the files to the zip file
foreach my $key (@selectedFiles) {
    $zip->addFile( $key );
}

# Write the archive to disk
$zip->writeToFileNamed( "$fileName.zip" );

# Draw a new HTML page listing the .zip file contents plus
# a link to download the file
print $q->header( "text/html" ),
      $q->start_html( -title   => "Directory of $fileName.zip",
                      -bgcolor => "#ffffff" ),
      $q->h1( "Contained in $fileName.zip" ),
      $q->start_ul;

# for each member-file in the .zip file, print the file
# as a list item
foreach my $key ( $zip->memberNames() ) {
    print $q->li( $key );
}

print $q->end_ul,
      $q->p( "File size: " . (stat("$fileName.zip"))[7] . " bytes" ),
      $q->a( { -href => "$fileName.zip" }, "Download $fileName.zip" ),
      $q->end_html;

I see a few new questions popping up as I look at the code I've created:
  1. How do I create a new/unique file name for each instance the script is run (in reality probably only one person would use this script once, but I'm trying to learn here!)? How do I do it safely?
  2. How can I have the server erase the file once it is downloaded?
  3. How can I tell the script to erase the file if it isn't downloaded within, say, ten minutes (to keep some cracker from filling the server with huge .zip files just for kicks)?
Yes, I could do the whole project without Perl, but what a wasted opportunity! Any ideas on how to deal with these issues? Or any ideas for places to look? Thanks for your experience!

Cheers!
Petras
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

-Howard Aiken

Replies are listed 'Best First'.
Re: Non-Duplicate File Names, Security, and Self Cleaning
by nite_man (Deacon) on Jun 06, 2003 at 06:32 UTC
    How do I create a new/unique file name for each instance the script is run (in reality probably only one person would use this script once, but I'm trying to learn here!)? How do I do it safely?

    Try File::Temp to create a temporary file safely.
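    A minimal sketch of what that looks like; the template, directory option, and suffix here are illustrative, not prescribed by File::Temp:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Ask File::Temp for a unique name and an already-open handle; the
# XXXXXX part is replaced with random characters, so two simultaneous
# requests can never collide on a name.
my ($fh, $zipname) = tempfile(
    "images_XXXXXX",
    TMPDIR => 1,        # put it in the system temp directory
    SUFFIX => '.zip',
    UNLINK => 0,        # keep the file after the script exits
);

print "Writing archive to $zipname\n";
```

    Because tempfile() creates and opens the file atomically, there is no window in which another process can grab the same name.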

    How can I tell the script to erase the file if it isn't downloaded within, say, ten minutes (to keep some cracker from filling the server with huge .zip files just for kicks)?

    You can add something like this to your download script:

    sub erase_files {
        my $period = 10;                 # minutes
        my $dir    = '/your/tmp/dir';
        opendir(DIR, $dir) or die $!;
        my @files = grep { !/^\./ && /\.zip$/ } readdir(DIR);
        closedir DIR;
        my $now = time;
        for my $file (@files) {
            my $ftime = (stat("$dir/$file"))[9];
            unlink "$dir/$file" if ($now - $ftime) > $period * 60;
        }
    }

          
    --------------------------------
    SV* sv_bless(SV* sv, HV* stash);
    
Re: Non-Duplicate File Names, Security, and Self Cleaning
by chromatic (Archbishop) on Jun 06, 2003 at 06:38 UTC

    File::Temp is a wonderfully useful standard module that could come in handy. Its tmpnam() function will give you a safe temporary name. If you return the zip file to the browser directly, it is downloaded right away, and you can unlink the file as soon as it has been sent.

    If the file list preview is important, you could have a two stage program — or two separate scripts.
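    The download-then-unlink flow could be sketched like this: build the archive, print the headers yourself, stream the bytes, and remove the file in the same request. The file names and the dummy member are placeholders; a real script would add the selected image files instead:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Archive::Tar;

# Build the archive in a temporary file
my ($tfh, $tarname) = tempfile("img_XXXXXX", TMPDIR => 1, SUFFIX => '.tar');
close $tfh;

my $tar = Archive::Tar->new;
$tar->add_data('README.txt', "demo member\n");   # placeholder content
$tar->write($tarname);

# Stream it back in the same request ...
print "Content-Type: application/x-tar\r\n";
print "Content-Disposition: attachment; filename=images.tar\r\n\r\n";
open my $in, '<', $tarname or die "open $tarname: $!";
binmode $in;
binmode STDOUT;
print while <$in>;
close $in;

# ... and remove it immediately; the bytes are already on the wire
unlink $tarname or warn "unlink $tarname: $!";
```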

      File::Temp worked out well, and was surprisingly easy to learn. Actually the whole script turned out to be surprisingly short and simple to write. I don't know why you ever let me waste time with that other language at all!

      As short as it is, this is my most complex Perl project yet and there's one part I just can't find an easy solution for. Here's the html that calls the cgi:
      <html>
      <body>
      <form action='../cgi-bin/maketar.cgi' method='post'>
      <input type=checkbox name="1.text">1<br>
      <input type=checkbox name="2.text">2<br>
      <input type=checkbox name="3.text">3<br>
      <input type=checkbox name="4.text">4<br>
      <input type=checkbox name="5.text">5<br>
      <input type=submit>
      </form>
      </body>
      </html>
      And here is the actual script. It's shorter and prettier, but I can't get it to force a download of the new file (oh, I should mention that on advice of fglock I went to Archive::Tar instead of Archive::Zip):
      #!/usr/bin/perl -wT

      use strict;
      use CGI;
      use File::Temp;
      use Archive::Tar;

      # Create CGI object and make an array of params.
      # The names come from the HTML checkbox tags.
      my $q = CGI->new();
      my @selectedFiles = $q->param;

      # Create a random $filename and a .tar file with that name
      my ($fh, $filename) = File::Temp::tempfile( "img_XXXX", SUFFIX => '.tar' );
      my $tar = Archive::Tar->new();

      # Add the files to the .tar file and write it to disk
      $tar->add_files( @selectedFiles );
      $tar->write( $filename, 0 );

      ###### Here's the Trouble ######
      # Send the .tar file to the browser
      print $q->redirect( $filename );
      ################################

      # Erase the .tar file from the server
      File::Temp::unlink0( $fh, $filename )
          or die "Error unlinking $filename safely";
      CGI::Push isn't going to do it, I don't think. What happens with the code as is: using Indigo Perl's Apache bundle locally on my machine, the script creates a .tar file, tries to display the contents, can't, and the script runs again, creating an infinite (until I intervene) number of .tar files in the cgi-bin directory. On my webserver it makes one .tar file, then shuts down. RTFMing is proving quite educational, but I'm not seeing a way to open a file-download box. I could do something like:
      print $q->start_html, $q->a( { -href => $filename }, $filename ), $q->end_html;
      but I'd like to avoid that if possible. Any way I could?

      Cheers,
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken

        The trick is sending the file in the same process, before you unlink it.

        use File::Copy;

        # generate tar file, path is in $filename

        print $q->header( 'archive/tar' );
        binmode STDOUT;     # don't mangle the binary data on some platforms
        copy( $filename, \*STDOUT );

        END { unlink $filename; }
Re: Non-Duplicate File Names, Security, and Self Cleaning
by benn (Vicar) on Jun 06, 2003 at 09:03 UTC
    From an 'interface' point of view, it makes sense to do this all in one step and (as chromatic and Skeeve suggest) avoid using a temporary file. Having a second 'confirm' screen not only complicates your code, but makes it more troublesome for the user.

    It's unlikely that the user will remember specific filenames, so unless your second "directory list" screen also includes thumbnails ("are you sure you wanted these pictures?"), you may as well just use the one main screen full of checkboxes, and have the 'Download' button cause the zip file to be written directly to the browser (using Archive::Zip's writeToFileHandle() method, I should think - I haven't used it myself, but it looks that way from the docs).
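    From a skim of the Archive::Zip docs, that approach might look something like this (untested sketch; the header lines and the use of @ARGV as a stand-in for the form's checkbox names are assumptions):

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

my @selected = @ARGV;    # stand-in for the checkbox names from the form

my $zip = Archive::Zip->new;
$zip->addFile($_) for @selected;

# Write the zip straight to the browser: no temp file to clean up later
print "Content-Type: application/zip\r\n";
print "Content-Disposition: attachment; filename=images.zip\r\n\r\n";
binmode STDOUT;
$zip->writeToFileHandle(\*STDOUT) == AZ_OK
    or die "could not write zip to browser";
```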

    Cheers,Ben.

      I've switched from Archive::Zip to Archive::Tar, so there's no writeToFileHandle() method available. Where I'm running into trouble now is getting the file to the browser. CGI::Push isn't what I want to do, and
      # Send the .tar file to the browser
      print $q->redirect( $filename );
      can't do it. I could do
      print $q->start_html, $q->a( { -href => $filename }, $filename ), $q->end_html;
      but if I could avoid the whole new page thing I'd like to. Any ideas?
      Cheers!
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken
Re: Non-Duplicate File Names, Security, and Self Cleaning
by Skeeve (Parson) on Jun 06, 2003 at 06:20 UTC
    1. I would use $$ somewhere in the filename.
      BTW: Do users log in? You could give each of them her own subdir so no one will see or destroy others' ZIPs. Given that, you could easily use the same file name over and over again, just erasing the old one or even adding pictures to it.
    2. If you use a cgi for download, add an unlink $filename.
      I wouldn't do it. I would give the user the opportunity to delete it at will. At least if you go the delete-after-10-minutes way.
    3. After creating the ZIP you could fork another process which does a sleep 10*60 at first and then unlinks the file.
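    Point 3 can be sketched as a small helper; the file name and grace period here are examples:

```perl
use strict;
use warnings;

# Fork a child that deletes $file after $seconds. The parent returns
# immediately, so the CGI response is not delayed.
sub schedule_cleanup {
    my ($file, $seconds) = @_;
    defined(my $pid = fork) or die "fork failed: $!";
    return if $pid;        # parent: nothing more to do
    # child: give up the web server's handles so the request can finish
    close STDIN;
    close STDOUT;
    close STDERR;
    sleep $seconds;
    unlink $file if -e $file;
    exit 0;
}

# e.g. after writing the archive:
# schedule_cleanup('/tmp/images_1234.zip', 10 * 60);
```

    One caveat: fork() is cheap on Unix, but a busy server would accumulate one sleeping child per download; a periodic sweep of a scratch directory (as nite_man shows above) scales better.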
      Users don't log in. Really, only one person wants some image files; my brain just started spinning and I thought I could do it this way to learn something in the process. If I wind up making something useful that I or other people would want to use again later, great! I wanted to make unique names, though, just because that would mean learning how to do something, and because someday, if the code is good, it might be used in a more trafficked scenario.

      I thought letting users delete the file themselves could make for some big security holes, and there are also lazy users out there. I just didn't know that I could make the script delete the file, but it seemed like something I should be able to do (yep, I'm still quite the newbie!).

      Thanks,
      Petras
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken
        > Users don't log in. [...] I could do it this way to learn
        So this is another option for you to learn. ;-)

        > I thought letting users delete the file themselves could make for some big security holes
        It depends on how you do it. Each user should be able to delete just her own files. So if you make up some really good random name this shouldn't be a problem. Just remember that your script should not allow any characters in the filenames it deletes other than the characters you use in your generated filenames. Especially no "/"!

        > and there are also lazy users out there
        That's why I said you should do it if you take the delete-after-10-minutes way. That way a considerate user can delete the file as soon as she has used it.
        While I write this: why don't you simply store the indices of the selected pictures in a cookie? If you don't have too many pictures this shouldn't be a problem. You could even save some space if you use a vec() bit vector to store which pics are chosen. When the user clicks on "download", the ZIP will be generated on the fly and will never appear in any directory on the server.
        Just some more opportunities for you to learn from ;-)
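        A sketch of the vec() idea; the chosen indices are an example:

```perl
use strict;
use warnings;

# Pack the chosen picture indices into a bit vector: one bit per
# picture, so even hundreds of pictures fit in a tiny cookie.
my @chosen = (0, 3, 4);          # example selection

my $bits = '';
vec($bits, $_, 1) = 1 for @chosen;

# Cookie values must be printable, so hex-encode the vector
my $cookie_value = unpack 'H*', $bits;

# ... later, decode the cookie and recover the selection
my $decoded   = pack 'H*', $cookie_value;
my @recovered = grep { vec($decoded, $_, 1) } 0 .. 8 * length($decoded) - 1;

print "recovered: @recovered\n";   # prints "recovered: 0 3 4"
```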

      I'm not sure that $$ is either new or unique...

      but localtime() is.

        $$ is unique as long as the system is not rebooted and as long as there is no overflow in the machine's process counter. That might happen, but it's not very probable. I also didn't say that it's guaranteed to give you a unique filename; I just said I would use it in the name, not that I would use it as the name.

        But regarding localtime: it's almost certain that you will get the same result quite often. You would have to enforce a delay of at least one second before the script could be started again.

Re: Non-Duplicate File Names, Security, and Self Cleaning
by fglock (Vicar) on Jun 06, 2003 at 14:58 UTC

    Unless it's for learning purposes, don't bother zipping JPEG files. The gain is very low (no more than 2%).

    If you just want to join files together, "tar" takes much less CPU time.

      Thanks. I didn't even think of using another format. Using Archive::Zip was definitely educational, but installing the required Compress::Zlib update has been annoying anyway. Archive::Tar is already installed, so why not?!? Thanks for the tip. If you want to see what I've come up with it's here.
      Cheers!
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken