Petras has asked for the wisdom of the Perl Monks concerning the following question:

To those more Monkier than I,

Here's a self-imposed project that's already stretched me a lot, but has left me seeking more wisdom. I have a friend in CA who wants me to send her several hi-resolution (600dpi) .jpg files. But she doesn't know what pictures I have available, and I don't know what she wants. So I was just going to make an HTML page of thumbnails (72dpi) that link to the hi-res images. This could be done in plain, simple, straightforward HTML. But instead I smelled a learning opportunity ;)

I want to make a simple HTML page of the low-res thumbnails, but each thumbnail will have its own checkbox. The user can scroll down the page, select the images she wants to have high-res copies of, then hit submit. The CGI script will add the hi-res files to a .zip file and generate a new HTML doc with a directory list of the new .zip and a link to download the file. Looking at CPAN I found Archive::Zip and came up with this code:
#!/usr/bin/perl -wT
# PSEUDO CODE
# 1. Plain HTML page with check-boxed thumbnail images
# 2. On submit() hi-res copies of the selected images are
#    added to a .zip file
# 3. A new page is displayed listing the content and size
#    of the file with a link to the newly created file
# 4. Once the file is downloaded it will be erased from the
#    server. (better to prompt the user to erase?)

use strict;
use CGI;
use Archive::Zip;
# Archive::Zip POD is at
# http://theoryx5.uwinnipeg.ca/CPAN/data/Archive-Zip/Archive/Zip.html

# Make a new CGI object
my $q = CGI->new();

# Create an array of the file names (drawn from the name
# tag in the HTML) to be added to the zip file
my @selectedFiles = $q->param;

# How to name the .zip file without risk of duplication?
my $fileName = ####;

my $zip = Archive::Zip->new();

# Add the files to the zip file
foreach my $key (@selectedFiles) {
    $zip->addFile( $key );
}

# Write the archive to disk
$zip->writeToFileNamed( "$fileName.zip" );

# Draw a new HTML page listing the .zip file contents plus
# a link to download the file
print $q->header( "text/html" ),
      $q->start_html( -title   => "Directory of $fileName.zip",
                      -bgcolor => "#ffffff" ),
      $q->h1( "Contained in $fileName.zip" ),
      $q->start_ul;

# for each member-file in the .zip file, print the file
# as a list item
foreach my $key ( $zip->memberNames() ) {
    print $q->li( $key );
}

print $q->end_ul,
      $q->p( "File size: " . (stat("$fileName.zip"))[7] . " bytes" ),
      $q->a( { -href => "$fileName.zip" }, "Download $fileName.zip" ),
      $q->end_html;

I see a few new questions popping up as I look at the code I've created:
  1. How do I create a new/unique file name for each instance the script is run (in reality probably only one person would use this script once, but I'm trying to learn here!)? How do I do it safely?
  2. How can I have the server erase the file once it is downloaded?
  3. How can I tell the script to erase the file if it isn't downloaded within, say, ten minutes (to keep some cracker from filling the server with huge .zip files just for kicks)?
Yes, I could do the whole project without Perl, but what a wasted opportunity! Any ideas on how to deal with these issues? Or any ideas for places to look? Thanks for your experience!

Cheers!
Petras
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

-Howard Aiken

Replies are listed 'Best First'.
Re: Non-Duplicate File Names, Security, and Self Cleaning
by nite_man (Deacon) on Jun 06, 2003 at 06:32 UTC
    How do I create a new/unique file name for each instance the script is run (in reality probably only one person would use this script once, but I'm trying to learn here!)? How do I do it safely?

    Try File::Temp to create a temporary file safely.
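    A minimal sketch of what that looks like; the template, directory option, and suffix here are illustrative, not prescribed by File::Temp:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Ask File::Temp for a unique name and an already-open handle; the
# XXXXXX part is replaced with random characters, so two simultaneous
# requests can never collide on a name.
my ($fh, $zipname) = tempfile(
    "images_XXXXXX",
    TMPDIR => 1,        # put it in the system temp directory
    SUFFIX => '.zip',
    UNLINK => 0,        # keep the file after the script exits
);

print "Writing archive to $zipname\n";
```

    Because tempfile() creates and opens the file atomically, there is no window in which another process can grab the same name.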

    How can I tell the script to erase the file if it isn't downloaded within, say, ten minutes (to keep some cracker from filling the server with huge .zip files just for kicks)?

    You can add something like this to your download script:

    sub erase_files {
        my $period = 10;                 # minutes
        my $dir    = '/your/tmp/dir';
        opendir(DIR, $dir) or die $!;
        my @files = grep { !/^\./ && /\.zip$/ } readdir(DIR);
        closedir DIR;
        my $now = time;
        for my $file (@files) {
            my $ftime = (stat("$dir/$file"))[9];
            unlink "$dir/$file" if ($now - $ftime) > $period * 60;
        }
    }

          
    --------------------------------
    SV* sv_bless(SV* sv, HV* stash);
    
Re: Non-Duplicate File Names, Security, and Self Cleaning
by chromatic (Archbishop) on Jun 06, 2003 at 06:38 UTC

    File::Temp is a wonderfully useful standard module that could come in handy. Its tmpnam() function will give you a safe temporary name. If you return the zip file to the browser directly, it is downloaded right away, and you can unlink the file as soon as it has been sent.

    If the file list preview is important, you could have a two stage program — or two separate scripts.
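    The download-then-unlink flow could be sketched like this: build the archive, print the headers yourself, stream the bytes, and remove the file in the same request. The file names and the dummy member are placeholders; a real script would add the selected image files instead:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Archive::Tar;

# Build the archive in a temporary file
my ($tfh, $tarname) = tempfile("img_XXXXXX", TMPDIR => 1, SUFFIX => '.tar');
close $tfh;

my $tar = Archive::Tar->new;
$tar->add_data('README.txt', "demo member\n");   # placeholder content
$tar->write($tarname);

# Stream it back in the same request ...
print "Content-Type: application/x-tar\r\n";
print "Content-Disposition: attachment; filename=images.tar\r\n\r\n";
open my $in, '<', $tarname or die "open $tarname: $!";
binmode $in;
binmode STDOUT;
print while <$in>;
close $in;

# ... and remove it immediately; the bytes are already on the wire
unlink $tarname or warn "unlink $tarname: $!";
```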

      File::Temp worked out well, and was surprisingly easy to learn. Actually the whole script turned out to be surprisingly short and simple to write. I don't know why you ever let me waste time with that other language at all!

      As short as it is, this is my most complex Perl project yet and there's one part I just can't find an easy solution for. Here's the html that calls the cgi:
      <html>
      <body>
      <form action='../cgi-bin/maketar.cgi' method='post'>
      <input type=checkbox name="1.text">1<br>
      <input type=checkbox name="2.text">2<br>
      <input type=checkbox name="3.text">3<br>
      <input type=checkbox name="4.text">4<br>
      <input type=checkbox name="5.text">5<br>
      <input type=submit>
      </form>
      </body>
      </html>
      And here is the actual script. It's shorter and prettier, but I can't get it to force a download of the new file (oh, I should mention that on advice of fglock I went to Archive::Tar instead of Archive::Zip):
      #!/usr/bin/perl -wT

      use strict;
      use CGI;
      use File::Temp;
      use Archive::Tar;

      # Create CGI object and make an array of params.
      # The names come from the HTML checkbox tags.
      my $q = CGI->new();
      my @selectedFiles = $q->param;

      # Create a random $filename and a .tar file with that name
      my ($fh, $filename) = File::Temp::tempfile( "img_XXXX", SUFFIX => '.tar' );
      my $tar = Archive::Tar->new();

      # Add the files to the .tar file and write it to disk
      $tar->add_files( @selectedFiles );
      $tar->write( $filename, 0 );

      ###### Here's the Trouble ######
      # Send the .tar file to the browser
      print $q->redirect( $filename );
      ################################

      # Erase the .tar file from the server
      File::Temp::unlink0( $fh, $filename )
          or die "Error unlinking $filename safely";
      CGI::Push isn't going to do it, I don't think. What happens with the code as is: using Indigo Perl's Apache bundle locally on my machine, the script creates a .tar file, tries to display the contents, can't, and the script runs again, creating an infinite (until I intervene) number of .tar files in the cgi-bin directory. On my webserver it makes one .tar file, then shuts down. RTFMing is proving quite educational, but I'm not seeing a way to open a file-download box. I could do something like:
      print $q->start_html, $q->a( { -href => $filename }, $filename ), $q->end_html;
      but I'd like to avoid that if possible. Any way I could?

      Cheers,
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken

        The trick is sending the file in the same process, before you unlink it.

        use File::Copy;

        # generate tar file, path is in $filename

        print $q->header( 'archive/tar' );
        binmode STDOUT;     # don't mangle the binary data on some platforms
        copy( $filename, \*STDOUT );

        END { unlink $filename; }
Re: Non-Duplicate File Names, Security, and Self Cleaning
by benn (Vicar) on Jun 06, 2003 at 09:03 UTC
    From an 'interface' point of view, it makes sense to do this all in one step and (as chromatic and Skeeve suggest) avoid using a temporary file. Having a second 'confirm' screen not only complicates your code, but makes it more troublesome for the user.

    It's unlikely that the user will remember specific filenames, so unless your second "directory list" screen also includes thumbnails ("are you sure you wanted these pictures?"), you may as well just use the one main screen full of checkboxes, and have the 'Download' button cause the zip file to be written directly to the browser (using Archive::Zip's writeToFileHandle() method, I should think - I haven't used it myself, but it looks that way from the docs).
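    From a skim of the Archive::Zip docs, that approach might look something like this (untested sketch; the header lines and the use of @ARGV as a stand-in for the form's checkbox names are assumptions):

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

my @selected = @ARGV;    # stand-in for the checkbox names from the form

my $zip = Archive::Zip->new;
$zip->addFile($_) for @selected;

# Write the zip straight to the browser: no temp file to clean up later
print "Content-Type: application/zip\r\n";
print "Content-Disposition: attachment; filename=images.zip\r\n\r\n";
binmode STDOUT;
$zip->writeToFileHandle(\*STDOUT) == AZ_OK
    or die "could not write zip to browser";
```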

    Cheers,Ben.

      I've switched from Archive::Zip to Archive::Tar, so there's no writeToFileHandle() method available. Where I'm running into trouble now is getting the file to the browser. CGI::Push isn't what I want to do, and
      # Send the .tar file to the browser
      print $q->redirect( $filename );
      can't do it. I could do
      print $q->start_html, $q->a( { -href => $filename }, $filename ), $q->end_html;
      but if I could avoid the whole new page thing I'd like to. Any ideas?
      Cheers!
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken
Re: Non-Duplicate File Names, Security, and Self Cleaning
by Skeeve (Parson) on Jun 06, 2003 at 06:20 UTC
    1. I would use $$ somewhere in the filename.
      BTW: Do users log in? You could give each of them her own subdir so no one will see or destroy others' ZIPs. Given that, you could easily use the same file name over and over again, just erasing the old one or even adding pictures to it.
    2. If you use a cgi for download, add an unlink $filename.
      I wouldn't do it. I would give the user the opportunity to delete it at will. At least if you go the delete-after-10-minutes way.
    3. After creating the ZIP you could fork another process which does a sleep 10*60 at first and then unlinks the file.
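    Point 3 can be sketched as a small helper; the file name and grace period here are examples:

```perl
use strict;
use warnings;

# Fork a child that deletes $file after $seconds. The parent returns
# immediately, so the CGI response is not delayed.
sub schedule_cleanup {
    my ($file, $seconds) = @_;
    defined(my $pid = fork) or die "fork failed: $!";
    return if $pid;        # parent: nothing more to do
    # child: give up the web server's handles so the request can finish
    close STDIN;
    close STDOUT;
    close STDERR;
    sleep $seconds;
    unlink $file if -e $file;
    exit 0;
}

# e.g. after writing the archive:
# schedule_cleanup('/tmp/images_1234.zip', 10 * 60);
```

    One caveat: fork() is cheap on Unix, but a busy server would accumulate one sleeping child per download; a periodic sweep of a scratch directory (as nite_man shows above) scales better.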
      Users don't log in. Really, only one person wants some image files; my brain just started spinning and I thought I could do it this way to learn something in the process. If I wind up making something useful that I or other people would want to use again later, great! I wanted to make unique names, though, just because that would mean learning how to do something, and because someday, if the code is good, it might be used in a more trafficked scenario.

      I thought letting users delete the file themselves could make for some big security holes, and there are also lazy users out there. I just didn't know that I could make the script delete the file, but it seemed like something I should be able to do (yep, I'm still quite the newbie!).

      Thanks,
      Petras
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken
        > Users don't log in. [...] I could do it this way to learn
        So this is another option for you to learn. ;-)

        > I thought letting users delete the file themselves could make for some big security holes
        It depends on how you do it. Each user should be able to delete just her own files. So if you make up some really good random name this shouldn't be a problem. Just remember that your script should not allow any characters in the filenames it deletes other than the characters you use in your generated filenames. Especially no "/"!

        > and there are also lazy users out there
        That's why I said you should do it if you take the delete-after-10-minutes way. That way a considerate user can delete the file as soon as she has used it.
        While I write this: why don't you simply store the indices of the selected pictures in a cookie? If you don't have too many pictures this shouldn't be a problem. You could even save some space if you use a vec() bit vector to store which pics are chosen. When the user clicks on "download", the ZIP will be generated on the fly and will never appear in any directory on the server.
        Just some more opportunities for you to learn from ;-)
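        A sketch of the vec() idea; the chosen indices are an example:

```perl
use strict;
use warnings;

# Pack the chosen picture indices into a bit vector: one bit per
# picture, so even hundreds of pictures fit in a tiny cookie.
my @chosen = (0, 3, 4);          # example selection

my $bits = '';
vec($bits, $_, 1) = 1 for @chosen;

# Cookie values must be printable, so hex-encode the vector
my $cookie_value = unpack 'H*', $bits;

# ... later, decode the cookie and recover the selection
my $decoded   = pack 'H*', $cookie_value;
my @recovered = grep { vec($decoded, $_, 1) } 0 .. 8 * length($decoded) - 1;

print "recovered: @recovered\n";   # prints "recovered: 0 3 4"
```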

      I'm not sure that $$ is either new or unique...

      but localtime() is.

        $$ is unique as long as the system is not rebooted and as long as there is no overflow in the machine's process counter. That might happen, but it's not very probable. I also didn't say that it's guaranteed to give you a unique filename; I just said I would use it in the name, not that I would use it as the name.

        But regarding localtime: it's almost certain that you will get the same result quite often. You would have to enforce a delay of at least one second before the script could be started again.

Re: Non-Duplicate File Names, Security, and Self Cleaning
by fglock (Vicar) on Jun 06, 2003 at 14:58 UTC

    Unless it's for learning purposes, don't bother zipping JPEG files. The gain is very low (no more than 2%).

    If you just want to join files together, "tar" takes much less CPU time.

      Thanks. I didn't even think of using another format. Using Archive::Zip was definitely educational, but installing the required Compress::Zlib update has been annoying anyway. Archive::Tar is already installed, so why not?!? Thanks for the tip. If you want to see what I've come up with it's here.
      Cheers!
      -P
      Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.

      -Howard Aiken