La12 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I need some direction.

The Environment:
RedHat linux, Apache, MySql, Perl

The Project:
I need to allow people to upload the data from an Excel spreadsheet (1 column by 7,000 rows) into their own table in a MySQL database. The user will be uploading and replacing the data every day.

Simplicity for the end user is key. I would like to create an HTML form into which they can cut from Excel and paste into a textarea, then submit it to MySQL using "post". (I plan on using the CGI and DBI Perl modules.) But I believe "post" and "get" have limits on how much data they can send. I'm guessing my other option is to have people upload the Excel file (again via CGI) and then have the server open the file and load it into MySQL. Exporting the Excel data into a text-only file beforehand would only complicate things for the end user, so I would like to avoid that step.

The Questions:
1) Am I going about this the wrong way? Should I not use a form and/or upload the Excel file?
2) Can someone tell me the file size limitations of "post" and "get"?
3) Am I nuts for trying to upload 7,000 rows via http?


Any guidance is greatly appreciated.

Thanks.

  • Comment on Uploading Excel data into MySQL via a browser

Replies are listed 'Best First'.
(Ovid) Re: Uploading Excel data into MySQL via a browser
by Ovid (Cardinal) on Sep 28, 2001 at 20:01 UTC

    We do this here at work and we've managed to have a ton of problems handling Excel spreadsheets. What we finally did was have our users export the spreadsheet as a CSV file (one of the options under "File->Save As"). Then the users would zip the file and upload the zipped version, which is considerably smaller than uploading a spreadsheet.

    You do not want to use GET as you will almost certainly get a "414 Request URL Too Long" error. This is configurable by your Web server, but it would still be ridiculous (and I'd just love to see your Web access logs with that one :). You want to use CGI.pm to upload the data. To unzip it, Archive::Zip works really well. If you have problems with Archive::Zip truncating data, make sure that you have the latest version of Compress::Zlib installed as this is a common problem.
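    The upload-and-unzip step can be sketched roughly like this with CGI.pm and Archive::Zip. The form field name ('zipfile') and the /tmp paths are assumptions, not anything from the thread:

```perl
#!/usr/bin/perl -w
# Sketch only: receive a zipped CSV via CGI.pm and unpack it.
# The field name ('zipfile') and scratch paths are assumptions.
use strict;
use CGI;
use Archive::Zip qw( :ERROR_CODES );

my $q  = CGI->new;
my $fh = $q->upload('zipfile') or die "No file uploaded\n";

# Spool the upload to disk before touching it.
my $tmp = "/tmp/upload_$$.zip";
open my $out, '>', $tmp or die "Can't write $tmp: $!";
binmode $out;
my $buf;
print {$out} $buf while read $fh, $buf, 8192;
close $out or die "Close failed: $!";

# A bad read() result here is how you catch truncated uploads.
my $zip = Archive::Zip->new;
$zip->read($tmp) == AZ_OK or die "Corrupt or truncated zip\n";
$zip->extractTree('', '/tmp/incoming/');   # assumed holding directory
```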

    To process the contents of the CSV file, you can try Text::CSV or any of a handful of other modules.
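    A minimal sketch of walking the one-column CSV with Text::CSV; the filename is an assumption:

```perl
# Sketch only: read a one-column CSV with Text::CSV.
# The filename is an assumption.
use strict;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 });
open my $in, '<', '/tmp/incoming/data.csv' or die "Can't open CSV: $!";

my @values;
while (my $line = <$in>) {
    $csv->parse($line) or die "Bad CSV at line $.\n";
    my ($value) = $csv->fields;      # one column per row
    push @values, $value;
}
close $in;
print scalar(@values), " rows parsed\n";
```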

    I would consider uploading the files to the server and then having another process deal with them. You're dealing with large files, and the script might time out on the user. This is Not Good. Be sure you have the user's email address so that you can send them feedback on the success or failure of incorporating the data into the database. Another reason to consider having a separate process add the data to the database is that it will be much easier to control the program flow. You never know when someone is going to click on the "Stumbit" button.

    With uploads this large, I would seriously rethink a MySQL decision unless you have a version that supports atomic transactions. If you need to roll back the upload after processing the 6000th line, you're going to hate life if you can't issue a simple $dbh->rollback command.
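    The eval/rollback pattern looks roughly like this with DBI. The DSN, credentials, and table name are assumptions, and the table must use a transaction-capable engine for the rollback to mean anything:

```perl
# Sketch only: wrap all the inserts in one transaction so a failure at
# row 6,000 can be rolled back.  DSN, credentials, and table name are
# assumptions; @values is assumed to hold the 7,000 parsed cells.
use strict;
use DBI;

my $dbh = DBI->connect('dbi:mysql:uploads', 'user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# Prepare once, execute per row, with a placeholder for the value.
my $sth = $dbh->prepare('INSERT INTO user_data (value) VALUES (?)');

eval {
    $sth->execute($_) for @values;
    $dbh->commit;
};
if ($@) {
    warn "Load failed, rolling back: $@";
    $dbh->rollback;
}
$dbh->disconnect;
```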

    Be sure to test the heck out of this. Be sure to insert gobs of dummy data into the system. Use bad dates, quote numbers, stuff in an extra field or 17. Short some of your records by a few fields, etc.

    Also, I would consider having them FTP the data instead. Periodically check the FTP directory and process (and subsequently delete) files that are in there. Users wouldn't have to worry about waiting for their browser and it's an easier problem for you to code around.

    Basic security: make sure you use placeholders with your DBI statements. Since you'll likely have repetitive statements, this means your script will operate faster. It also ensures that people can't "accidentally" try to insert dangerous characters in the spreadsheet that might break your script. Make sure that you delete the files after you are done with them (you don't want to fill up the hard drive with files this large) or, better yet, figure out an archival scheme. Also, limit the total size of the directory these files are uploaded to. No sense in having 30 people upload 50 meg files and having your server crash from lack of disk space.
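    One way to enforce that directory cap, in core Perl. The path and the 50 MB limit are assumptions:

```perl
# Sketch only: total the bytes already sitting in the upload directory
# and refuse new uploads past a cap.  Path and limit are assumptions.
use strict;

sub dir_bytes {
    my ($dir) = @_;
    my $total = 0;
    $total += -s $_ for grep { -f $_ } glob("$dir/*");
    return $total;
}

my $limit = 50 * 1024 * 1024;    # assumed 50 MB cap
# In the CGI script, before accepting the upload:
#   die "Upload area full\n" if dir_bytes('/tmp/incoming') > $limit;
```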

    Cheers,
    Ovid

    Vote for paco!


      If MySQL without transactions is what's in production, an easy way around the lack of transactions would be to make a temp table to process the data in. If it works, lock both tables and update.
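      The staging-table idea can be sketched like this; the table names, DSN, and credentials are assumptions:

```perl
# Sketch only: load into a staging table first, and only touch the live
# table once the load has succeeded.  Names and DSN are assumptions;
# @values is assumed to hold the parsed cells.
use strict;
use DBI;

my $dbh = DBI->connect('dbi:mysql:uploads', 'user', 'secret',
                       { RaiseError => 1 });

$dbh->do('DROP TABLE IF EXISTS user_data_tmp');
$dbh->do('CREATE TABLE user_data_tmp (value VARCHAR(255))');

my $sth = $dbh->prepare('INSERT INTO user_data_tmp (value) VALUES (?)');
$sth->execute($_) for @values;   # a die here leaves the live table intact

# Load succeeded: lock both tables and swap the data in.
$dbh->do('LOCK TABLES user_data WRITE, user_data_tmp WRITE');
$dbh->do('DELETE FROM user_data');
$dbh->do('INSERT INTO user_data SELECT * FROM user_data_tmp');
$dbh->do('UNLOCK TABLES');
$dbh->disconnect;
```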

      -Lee

      "To be civilized is to deny one's nature."
Re: Uploading Excel data into MySQL via a browser
by davorg (Chancellor) on Sep 28, 2001 at 19:43 UTC
Re: Uploading Excel data into MySQL via a browser
by miyagawa (Chaplain) on Sep 28, 2001 at 19:47 UTC
Re: Uploading Excel data into MySQL via a browser
by IDStewart (Acolyte) on Sep 28, 2001 at 19:55 UTC
    The Questions:
    1) Am I going about this the wrong way? Should I not use a form and/or upload the excel file?

    Well, not necessarily the wrong way (TMTOWTDI and all that); here's an alternative approach:

    1. Provide an upload widget for the user to upload the Excel file
    2. Spawn a separate process that will parse the file (see Spreadsheet::ParseExcel) and update the MySQL database
    3. Return a message to the user letting them know their request is being processed
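    Those three steps can be sketched roughly like this; the path and the worksheet handling are assumptions:

```perl
#!/usr/bin/perl -w
# Sketch only: answer the browser at once and parse the file in a
# forked child.  The path and parser details are assumptions.
use strict;
use CGI;
use Spreadsheet::ParseExcel;

my $q = CGI->new;
# ... save the uploaded .xls to $path (upload handling omitted) ...
my $path = '/tmp/incoming/data.xls';

defined(my $pid = fork) or die "Can't fork: $!";
if ($pid == 0) {                 # child: do the slow work
    close STDOUT;                # let the HTTP response finish
    my $book = Spreadsheet::ParseExcel::Workbook->Parse($path);
    # ... walk $book->{Worksheet} and feed the rows to DBI ...
    exit 0;
}

print $q->header, "Your file was received and is being processed.\n";
```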
    HTH,
    Ian
Re: Uploading Excel data into MySQL via a browser
by tune (Curate) on Sep 28, 2001 at 19:46 UTC
    Why don't you ask your users to save their spreadsheet as a CSV (comma-delimited) or tab-delimited file, and have that uploaded to your script? Uploads are only limited by you (maximum upload size) or by the server time-out.

    --
    tune

Re: Uploading Excel data into MySQL via a browser
by ajt (Prior) on Sep 28, 2001 at 21:52 UTC
    Many sound points.

    With CGI.pm you can upload a file quite easily. Over a LAN it will be reasonably quick. As people have pointed out, exporting Excel to another format and zipping the file will make the post go even faster.

    Once you have the file on the server you can decompress it and stick it into MySQL, using the suggested modules.

    Given that an Excel file may take a while to upload, extract, and store in MySQL, I'd also suggest copious logging, so you can track down what's going on if people complain. As monks have mentioned, you need to give positive feedback to the browser ASAP, to prevent multiple submits and user frustration.
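    The logging can be as simple as a timestamped line per stage, in core Perl; the log path is an assumption:

```perl
# Sketch only: append one timestamped line per stage so failures can
# be traced afterwards.  The log path is an assumption.
use strict;

sub log_stage {
    my ($logfile, $message) = @_;
    open my $log, '>>', $logfile or die "Can't append to $logfile: $!";
    printf {$log} "[%s] %s\n", scalar localtime, $message;
    close $log;
}

# e.g. log_stage('/var/log/uploads.log', 'zip extracted OK, 7000 rows');
```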

    To avoid users' multi-click behaviour, merlyn has a recent WebTechniques article about one-click processing, which is well worth reading even if you don't use it in this project.

Re: Uploading Excel data into MySQL via a browser
by mandog (Curate) on Sep 28, 2001 at 19:53 UTC
    Maybe one of the Excel modules on CPAN has what you need.

    If you are using CGI.pm, you can pretty easily create a file upload field to upload the file.

    HTML form text fields are limited to 32K in size, so it may be hard to paste 7000 rows there.



    --mandog

Re: Uploading Excel data into MySQL via a browser
by Ven'Tatsu (Deacon) on Sep 28, 2001 at 22:08 UTC
    GET is limited by the length of URLs allowed by the client and server, whichever is smaller.
    POST is not limited in any configuration I have seen.

    A problem will be transfer and processing times. If each line is 16 characters on average, then you will be sending 123KB (more when characters that have to be %-encoded are used). Depending on the client connection, that could take 30 seconds to a minute on a modem. After the server gets the data, I assume it will have to run 7,000 update/insert SQL statements, and that could take a decent length of time.
    This could easily add up to a few minutes, which is not a lot from the programmer's perspective, but it's a long time to sit and wait from a user's perspective.
      While I encourage you to follow the advice of those advocating a file upload rather than a form-field paste, you might be interested to know that (at least in IE5) you can change the enctype attribute of a form to 'text/plain' and it will send the server unescaped text from your form rather than the normal 'application/x-www-form-urlencoded' encoding.
Re: Uploading Excel data into MySQL via a browser
by CubicSpline (Friar) on Sep 28, 2001 at 19:48 UTC
    Why not set up an application on the client machine that connects to the db, parses the file, and loads the data in? Is it a distribution problem (lots of clients)? I would avoid HTTP if I were you, especially when you could use straight ODBC and the Perl DBI instead.
Re: Uploading Excel data into MySQL via a browser
by mattr (Curate) on Sep 29, 2001 at 12:27 UTC
    You will need to have the Excel file in your hands to try when things go wrong (which they will). I'm convinced Excel can't be trusted to export good CSV, though Access seems to work. You could also import from Access directly, if you don't need Excel. I'd parse the Excel file itself and not use CSV at any point.

    Some things that could foobar the parsing are vertical tabs and two-byte fonts like Japanese (both of which have caused me trouble in the past), curly quotes and ticks, CR/LF line endings (no Macs?), and maybe new versions of Excel and embedded binary, for starters. It pays to limit the allowed characters and flag anything weird (any Europeans using commas instead of decimal points?), and so on.
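    A whitelist check along those lines can be sketched in core Perl; the allowed character set here is an assumption and should be widened to suit your data:

```perl
# Sketch only: flag cells containing anything outside a conservative
# ASCII whitelist (catches vertical tabs, two-byte text, 8-bit curly
# quotes, etc.).  The allowed set is an assumption.
use strict;

sub suspicious {
    my ($cell) = @_;
    # Anything outside printable ASCII plus tab is flagged.
    return $cell =~ /[^\x20-\x7E\t]/ ? 1 : 0;
}

# e.g.  push @bad_rows, $. if suspicious($value);
```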

    So get the Excel file, parse it, and email them if it looks weird. You could keep them waiting, though that takes time during which you have to keep the browser from timing out, especially if you use something like a tool I once tried that turns Excel into HTML (think lots of time, CPU, and memory for a large file). Maybe you want to provide a utility so they can re-upload a given day, and see if they missed one.

    HTTP POST is fine (set a limit in your script), though there may be other options. For example, you could write a Perl program which runs on their computer and FTPs the file up automatically. Maybe someone might be interested in taking a swing at an ActiveX/COM object, which I won't even take a swing at. There is this node. It seems something interesting would be possible with the perlctrl or perlcom parts of ActiveState's Perl Dev Kit (but I don't own it).

    I mention these because something always goes wrong. So even with a form, you can upload the wrong file, the filename is bad, it's really a shortcut, there was a holiday and they forgot, the computer's clock went nuts, etc. etc. If you can somehow get identification data into their table data from the beginning, you will be happier.

Re: Uploading Excel data into MySQL via a browser
by La12 (Sexton) on Oct 02, 2001 at 21:36 UTC
    Thanks everyone. All your comments are greatly appreciated.