in reply to Re: File Upload Security Question
in thread File Upload Security Question

I'm of a similar opinion. But going one step better, don't use files at all. Use a database. Then you can attach all sorts of extra information that a file has no facility for (comments, user, number of downloads, yada yada yada). And DBs typically give you better flexibility if you have to start splitting machines to store data off from the webservers, etc.

DBs also support atomic operations more readily. How are you going to handle two users uploading the same filename at the same time, assuming you don't support a mechanism like Russ proposes? The window for error gets wider if users are running on slow links (What?!? Everyone doesn't have a personal OC-12?)
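
For comparison, if the files do stay on disk, the same-name race can be closed on a local filesystem with an exclusive create, so that only one of two simultaneous uploads gets the name. This is only a minimal sketch and not necessarily the mechanism Russ has in mind; the `claim_upload_name` helper and the way the path is chosen are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: try to claim $path for a new upload.
# O_CREAT|O_EXCL makes "check that it does not exist" and "create it"
# a single atomic step, so two simultaneous uploads cannot both succeed.
sub claim_upload_name {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)
        or return;      # the name is already taken
    binmode $fh;        # uploads are arbitrary binary data
    return $fh;         # caller writes the upload to this handle
}
```

The loser of the race gets nothing back and can retry under a different name or report the clash to the user.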

--Chris

RE: RE: Re: File Upload Security Question
by Ovid (Cardinal) on Jun 12, 2000 at 05:25 UTC
    Okay, I have a slight confession to make. I was going to store the files in a database but I didn't know how to serve them back out to the users (hanging my head in shame). Since I have a very tight deadline (they start using my work the day after I am writing this), I just quickly went with what I knew to get it up and running.

    I'll be revisiting this after a week or so to clean things up (they'll use it for a week and then stop), so if anyone can point me in the right direction for serving files directly out of MySQL using CGI.pm and DBI.pm, that would be great.

    Also, I'm just storing the files directly, though I do have a table in my database which stores the other information you mentioned. Wouldn't it be faster to have the user link directly to a file rather than to a script that will serve the file?

    As a side note, the group I am doing this for (who will remain nameless for legal reasons) is a non-profit group who couldn't afford to hire a really high-end Web development company, so I'm doing it for them for next to nothing. I think that's a mistake I'm not likely to repeat.

      You'll have to store some sort of ID with the file, perhaps the supplied file name from the upload routine.

      Since it goes straight into the database (be sure to quote it or use a DBI placeholder), you don't have to worry about the security implications of passing user data through the shell as part of an open statement.

      You can also use a simple SQL SELECT statement to pull the file out of the database: SELECT file FROM saved_files WHERE name = ?;
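
A minimal sketch of storing and fetching with DBI placeholders. To keep it self-contained it assumes DBD::SQLite and an in-memory database as a stand-in for MySQL; the table and column names come from the SELECT above, while the `store_file` and `fetch_file` helpers are hypothetical:

```perl
use strict;
use warnings;
use DBI qw(:sql_types);

# Assumption: an in-memory SQLite database stands in for MySQL.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE saved_files (name TEXT PRIMARY KEY, file BLOB)');

# Hypothetical helper: the placeholders keep the user-supplied name and
# the raw bytes out of the SQL text, so no manual quoting is needed.
sub store_file {
    my ($dbh, $name, $bytes) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO saved_files (name, file) VALUES (?, ?)');
    $sth->bind_param(1, $name);
    $sth->bind_param(2, $bytes, SQL_BLOB);   # bind as binary data
    $sth->execute;
}

# Hypothetical helper: returns the stored bytes, or undef if no row matches.
sub fetch_file {
    my ($dbh, $name) = @_;
    my ($bytes) = $dbh->selectrow_array(
        'SELECT file FROM saved_files WHERE name = ?', undef, $name);
    return $bytes;
}
```

With the name declared as a primary key, a second insert under the same name fails inside the database rather than silently overwriting the first.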

      Can you provide a little more detail on what exactly the sheets are composed of, etc? Are they HTML themselves, or are they fields from an HTML form? Does the data need to be reformatted on output, or is the data self-describing (akin to dumping HTML from a file straight out to a browser)?

      Here's an instance. If you were storing GIF or JPG files in a database, and you wanted to display them in a 2 x 2 table, for each TD in the table you would have an IMG SRC tag that reads something like IMG SRC="myscript_getimage.pl?item=x", where 'x' is some unique identifier (like an auto_increment field) from the database.

      The 'myscript_getimage.pl' would then be executed for each image to be displayed, and kick out the JPG or GIF.
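
A rough sketch of what such a script might do for each request. The `%images` hash is a hypothetical stand-in for the database lookup, and `image_response` is an illustrative name; in a real script the item id would come from the query string, for example via CGI.pm's param('item'):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: item id => [content type, bytes].
my %images = (
    1 => ['image/gif', "GIF89a\x01\x00\x01\x00"],   # placeholder bytes, not a full image
);

# Build the complete CGI response (headers plus body) for one item.
sub image_response {
    my ($item) = @_;

    # Accept only plain integers, since the value arrives from the browser.
    return "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\nBad item\n"
        unless defined $item && $item =~ /\A\d+\z/;

    my $row = $images{$item}
        or return "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nNo such item\n";

    my ($type, $bytes) = @$row;
    return "Content-Type: $type\r\n"
         . "Content-Length: " . length($bytes) . "\r\n\r\n"
         . $bytes;
}

# The script itself would then finish with something like:
#   binmode STDOUT;
#   print image_response($item);
```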

      Conversely, if the file is some sort of data file that goes into another program (say TestGrader.tgf), when they clicked a link or a field in a table (use that JavaScript! Use that OnClick event!), you could download the file to their machine (via the save file dialog, like you most likely get when you download a .ZIP file). Or, you can register a handler for it, and open the application directly, much like downloading a .PDF typically does.

      There are lots of ways to manage data like this. If you can clarify your goals for the monks, I imagine we can come up with a suggestion to guide you down a more optimal path of enlightenment. (Heh. Or you could see if you could get two dozen monks to collaborate, and turn your project into OpenSource.)

      --Chris
        Here's the scoop: I'm doing this for a non-profit organization that doesn't have the money to pay me enough for it and can't afford training. As a result, I have to make this so generic that ANYONE who needs it can use it.

        It has tons of JavaScript1.1 (much more than I want) to make sure they can't submit data improperly. What the users (instructors) do is use a template I've made to create instructional plans for their classrooms that meet rigorous state standards. Attached to the plans are activity sheets (the sheets I mentioned). These might be images, Word documents, spreadsheets, or other things to hand to students to supplement the lesson.

        I have little control over the format they send these things in, so I have the problem that instructors on a PC won't necessarily be able to use activity sheets created on a Macintosh (but I let them know the platform they were uploaded from, which is probably the platform they were created on).

        Other instructors can then view instructional plans and adapt them for their classrooms, thus allowing them to save time developing them and devoting more time to the students. I hope this answers your questions regarding what I'm doing.

        Re: OpenSource. My company is strongly pro-OpenSource and we'll probably give the code to whoever might want it when we're finished, but I question the ethics (and legality) of my asking for free collaborators for a project that I'm getting paid for -- albeit I'm not getting paid much.

      Personally, I'm inclined not to store the actual file contents in the database, because I feel that it complicates matters. My preference is to store the path to the file in the database, as well as any metadata about that file, then just store the file in the filesystem itself.

      When you want to serve a file (I assume you mean over an HTTP connection), you can just set up a script to grab the filename from the db, grab the metadata, etc. I'm pretty sure you could then just issue the correct Content-Type, then open up the file and spit it out to the browser. Be sure to use binmode if you're running on a Windows machine (or one that makes a distinction between text and binary files).
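
Roughly, that could look like the sketch below. The `send_file` helper is hypothetical; the path and content type are assumed to have been read from the database row already:

```perl
use strict;
use warnings;

# Hypothetical helper: send one file to $out (normally \*STDOUT)
# preceded by the headers the browser needs.
sub send_file {
    my ($path, $type, $out) = @_;

    open(my $in, '<', $path) or die "Cannot open $path: $!";
    binmode $in;     # harmless on Unix, required on Windows
    binmode $out;

    my $size = -s $in;
    print {$out} "Content-Type: $type\r\n",
                 "Content-Length: $size\r\n\r\n";

    # Copy in chunks so a large file is never held in memory whole.
    my $buf;
    while (read($in, $buf, 8192)) {
        print {$out} $buf;
    }
    close $in;
    return $size;
}
```

A CGI script would call it as send_file($path, $type, \*STDOUT) once the lookup has succeeded.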

        Well, I can skip binmode as the scripts are running on a Linux box. I'm not sure what you mean by issuing the correct Content-Type. Do you mean that I need to do that if I serve it from the database? By saving the file directly to a Web-accessible directory, I thought the server would handle that when the users clicked on a link to the file.

        Which raises another question: How do I determine the content-type of an uploaded file? Obviously it's not a simple case of checking the extension (since Macs don't use them).
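
One possible approach, sketched here under assumptions rather than offered as a definitive answer: look at the leading bytes of the upload instead of the name. The `guess_type` helper and its short signature table are illustrative only and cover just a few well-known formats. CGI.pm's uploadInfo() also reports the Content-Type the browser claimed, but that value is supplied by the client and is not verified.

```perl
use strict;
use warnings;

# A few well-known leading byte signatures and their MIME types.
# Illustrative only; a real table would be much longer.
my @signatures = (
    [ "GIF87a"            => 'image/gif'       ],
    [ "GIF89a"            => 'image/gif'       ],
    [ "\xFF\xD8\xFF"      => 'image/jpeg'      ],
    [ "\x89PNG\r\n\x1A\n" => 'image/png'       ],
    [ "%PDF-"             => 'application/pdf' ],
    [ "PK\x03\x04"        => 'application/zip' ],
);

# Hypothetical helper: guess a MIME type from the first bytes of a file.
sub guess_type {
    my ($bytes) = @_;
    for my $sig (@signatures) {
        my ($magic, $type) = @$sig;
        return $type if index($bytes, $magic) == 0;   # match at offset 0 only
    }
    return 'application/octet-stream';   # generic fallback for unknown data
}
```

Word documents and spreadsheets are harder to tell apart this way, since they can share the same container signature, so the platform and original file name stored alongside the upload remain useful hints.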