Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So let's say I've got a CGI application in which a user may upload a large file (say, a few hundred MB). If all I needed to do with this file was write it to disk, it would be relatively easy to read a few chunks and then write a few chunks so that the entire file doesn't get buffered into memory first. But instead, I need to store that file in a MySQL database table. Is there a good way to do this without buffering the entire upload into memory first and then executing one big INSERT statement?
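
For the plain write-to-disk case, this is roughly what I mean by reading and writing in chunks (the field name "upload" and the output path are just placeholders):

    use strict;
    use warnings;
    use CGI;

    my $q  = CGI->new;
    my $fh = $q->upload('upload') or die "no upload received\n";

    open my $out, '>', '/tmp/incoming.dat' or die "open: $!";
    binmode $_ for $fh, $out;

    # Copy in fixed-size chunks so the whole file never sits in memory.
    while ( read( $fh, my $buf, 64 * 1024 ) ) {
        print {$out} $buf or die "write: $!";
    }
    close $out or die "close: $!";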

Replies are listed 'Best First'.
Re: Writing data in chunks to a DB handle
by BrowserUk (Patriarch) on Aug 16, 2008 at 05:23 UTC

    It depends on what you mean by "I need to store that file in a MySQL database table."

    1. If you mean store the entire file as a single opaque entity, then writing it to the file system and using the DB's bulk insertion tool is the way to go.

      Although I agree with others that there is little or no benefit and many downsides to storing BLOBs in relational DBs.

    2. If you mean that the file contains data as rows or records that need to be inserted into the table, then the best method will depend upon what format the data is in:
      • If it is in some easily parsable form, e.g. CSV or TSV, then again the write-to-file and bulk-insert route is best.
      • However, if the data needs more complex parsing/processing before the records are in a suitable form for input, then investigate CGI's upload() function, which returns a filehandle from which you can read the data in chunks.

        Alternatively, look at the CGI::upload_hook() function, which allows you to register a callback that will be called with successive chunks of the file as it is uploaded (a rough sketch follows below).

        See the CGI docs for more.
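
    For illustration, here is a rough sketch of the upload-hook route, based on my reading of the CGI.pm docs (what you do with each chunk, e.g. parse it and insert the completed records, is up to you):

        use strict;
        use warnings;
        use CGI;

        # Called repeatedly by CGI.pm as the upload arrives, once per chunk.
        sub upload_hook {
            my ( $filename, $buffer, $bytes_read, $data ) = @_;
            # Parse $buffer and insert finished records here rather than
            # letting the whole file accumulate in memory.
            warn "received $bytes_read bytes of $filename so far\n";
        }

        # Passing the callback to the constructor registers the hook.
        my $q = CGI->new( \&upload_hook );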


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Writing data in chunks to a DB handle
by kyle (Abbot) on Aug 16, 2008 at 03:39 UTC

    I don't know of a way. Perhaps you could write the file to disk as it's coming in and then use the command-line MySQL client to do the insert. Perhaps you could put chunks into a temporary table and then have the database assemble them when finished. In any case, I'd be pretty strongly tempted to write such files to the filesystem (which is designed for this kind of thing) and just have a record of them in the database (which is designed for that kind of thing).
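
    A rough sketch of that last approach, with the file on disk and only a metadata row in MySQL (the table, columns and paths here are made up):

        use strict;
        use warnings;
        use DBI;
        use File::Copy qw(move);

        # Assume the upload has already been spooled to $tmp_path by the CGI handler.
        my $tmp_path  = '/tmp/upload-spool/abc123';
        my $dest_path = '/var/uploads/abc123';
        move( $tmp_path, $dest_path ) or die "move: $!";

        my $dbh = DBI->connect( 'dbi:mysql:database=myapp', 'user', 'password',
            { RaiseError => 1 } );

        # The database keeps a small record; the bytes stay on the filesystem.
        $dbh->do(
            'INSERT INTO uploads (original_name, stored_path, bytes) VALUES (?, ?, ?)',
            undef, 'report.pdf', $dest_path, -s $dest_path,
        );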

Re: Writing data in chunks to a DB handle
by betterworld (Curate) on Aug 16, 2008 at 03:52 UTC

    Maybe write the incoming data into a temporary file, then use MySQL's LOAD DATA to get the stuff from the file into your database.

    By default, CGI.pm does write the uploaded data into a temporary file and returns a handle to that file. However, I don't know a nice way to get the name of that temporary file, which you would need to pass to LOAD DATA.
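
    Something along these lines might work, assuming the upload is row-oriented (e.g. tab-separated) and LOCAL INFILE is allowed on both client and server; spooling through File::Temp sidesteps the problem of finding CGI.pm's own temp file name (table and field names are placeholders):

        use strict;
        use warnings;
        use CGI;
        use DBI;
        use File::Temp qw(tempfile);

        my $q  = CGI->new;
        my $fh = $q->upload('datafile') or die "no upload received\n";

        # Spool the upload into a temp file whose name we control.
        my ( $tmp, $tmp_name ) = tempfile( UNLINK => 1 );
        binmode $_ for $fh, $tmp;
        while ( read( $fh, my $buf, 64 * 1024 ) ) {
            print {$tmp} $buf or die "write: $!";
        }
        close $tmp or die "close: $!";

        my $dbh = DBI->connect(
            'dbi:mysql:database=myapp;mysql_local_infile=1',
            'user', 'password', { RaiseError => 1 },
        );
        $dbh->do( 'LOAD DATA LOCAL INFILE ' . $dbh->quote($tmp_name)
                . ' INTO TABLE mytable' );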

Re: Writing data in chunks to a DB handle
by Corion (Patriarch) on Aug 16, 2008 at 08:52 UTC

    Doing a streaming upload into MySQL is a bad idea, because too many things can fail during the upload. Write the uploaded file to a temporary space and then use the MySQL bulk loading facility to import the data in a transaction.

    If you're really hell-bent on doing it as a streaming upload, you can consider creating a named pipe (FIFO) via mkfifo, writing to that pipe from your CGI upload process, and having MySQL load from that pipe instead of a file.
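
    If you do go the pipe route, a rough sketch of what I have in mind (the fork lets the writer and LOAD DATA run concurrently; database, table and field names are placeholders, and this assumes LOCAL INFILE is enabled):

        use strict;
        use warnings;
        use CGI;
        use DBI;
        use POSIX qw(mkfifo);

        my $fifo = "/tmp/upload.$$.fifo";
        mkfifo( $fifo, 0600 ) or die "mkfifo: $!";

        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ( $pid == 0 ) {
            # Child: MySQL reads rows from the named pipe as they arrive.
            my $dbh = DBI->connect(
                'dbi:mysql:database=myapp;mysql_local_infile=1',
                'user', 'password', { RaiseError => 1 },
            );
            $dbh->do( 'LOAD DATA LOCAL INFILE ' . $dbh->quote($fifo)
                    . ' INTO TABLE mytable' );
            exit 0;
        }

        # Parent: stream the upload into the pipe in chunks.
        my $q  = CGI->new;
        my $fh = $q->upload('datafile') or die "no upload received\n";
        open my $out, '>', $fifo or die "open fifo: $!";   # blocks until the reader opens it
        binmode $_ for $fh, $out;
        while ( read( $fh, my $buf, 64 * 1024 ) ) {
            print {$out} $buf or die "write: $!";
        }
        close $out or die "close: $!";
        waitpid $pid, 0;
        unlink $fifo;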

Re: Writing data in chunks to a DB handle
by samtregar (Abbot) on Aug 16, 2008 at 16:42 UTC

    Dealing with file uploads in a web environment is a real pain. I've certainly been there. Everything else the user sends you is so well behaved - just a bunch of text you can stick in a table and be done with it! It's very tempting to think that you can get away with treating file uploads the same way, but I think it's a mistake to go that way.

    MySQL is not a distributed filesystem and it's not tuned to handle huge files like this. This problem of inserting the file without loading it into memory is really just the start of your woes. How will you get the file out again? How will you do backups of your DB now that mysqldump is producing dumps that contain every file any user ever uploaded? Will you re-invent directories at some point? Permissions too? What will you do when you want to process the uploaded files with an external tool (resizing images, transcoding video, etc.)?

    It's not too late to set up an NFS server! It's not as hard as you probably think, and once you have one set up you'll find other uses for it (configuration, shared code installs, etc.) and stop leaning on your DB so hard.

    -sam