In reply to: how to consolidate xls/csv/text into one report?

If you're feeding everything into a database anyway, I'd most likely write separate scripts to extract the information from each type of file and load it into the database. Once everything's in the database in a consistent format, it's much, much easier to generate your reports.
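As a rough sketch of that idea (the table name, columns, and sample data here are all made up for illustration): each source format gets its own small loader that dumps rows into one shared table, and reporting becomes a plain query.

```python
import csv
import io
import sqlite3

# Hypothetical shared schema: every source feeds the same table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (source TEXT, name TEXT, amount REAL)")

def load_csv(conn, source, text):
    """One small, single-purpose loader: parse one CSV source and
    insert its rows into the shared table. An xls or text loader
    would be a sibling script with the same INSERT at the end."""
    for row in csv.DictReader(io.StringIO(text)):
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?)",
            (source, row["name"], float(row["amount"])),
        )

load_csv(conn, "dept_a", "name,amount\nwidgets,10.5\ngadgets,3.0\n")

# Once the data is consistent, the report is just SQL.
total = conn.execute("SELECT SUM(amount) FROM records").fetchone()[0]
print(total)  # 13.5
```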

(basically, going with the philosophy that lots of little scripts with a limited scope of work are easier to write and debug than one big script that tries to do everything)

As for data munging, the book might be a good start. Depending on the number of records involved, I'll typically just start processing them using whatever patterns seem to be in the files, writing the successes to one file and the rejects to another. If the successes look good, I'll load them into the database (or whatever the plan was) and look for more patterns in the rejected records.

From what you describe, the csv and excel files should just be a matter of mapping their fields to the database fields (though you might have to do some cleaning if there are enumerations or foreign key constraints), so the only real problem should be the text file, and that's just a matter of taking small steps, like I described above.
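For the field-mapping-plus-cleaning step, a minimal sketch (the header names, column names, and status codes are all hypothetical):

```python
# Hypothetical mapping from spreadsheet headers to database columns,
# plus a cleanup step for an enumerated "status" field so it will
# satisfy the database's constraint.
FIELD_MAP = {"Customer Name": "name", "Amt": "amount", "Status": "status"}
STATUS_ENUM = {"a": "active", "active": "active",
               "x": "closed", "closed": "closed"}

def map_row(row):
    """Rename the fields we care about and normalize the values."""
    out = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
    out["status"] = STATUS_ENUM[out["status"].strip().lower()]
    out["amount"] = float(out["amount"])
    return out

cleaned = map_row({"Customer Name": "ACME", "Amt": "9.50", "Status": "A "})
print(cleaned)
```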

Now, if this is something that's going to repeat (e.g., you get an excel sheet once a week from one department and a csv from another, and you have to generate reports for your boss as soon as they arrive), then you start getting into workflow issues. You'd work much as I suggested before, but have the parsing scripts keep the failed matches in memory and continue processing them until everything's done (or it gives up, in which case it writes out a log that you can go through -- so there's still some manual work, but it's not as bad as it might be).
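One way that retry loop could be sketched (the parser and its "learn full names, then resolve abbreviations" trick are invented to show why a later pass can succeed where an earlier one failed):

```python
def run_passes(records, parse):
    """Keep failed records in memory and re-run them; `known` grows
    as records succeed, so a later pass may resolve records an
    earlier pass couldn't. Stop when a pass makes no progress."""
    parsed, pending, known = [], list(records), set()
    while pending:
        still_failed = []
        for rec in pending:
            result = parse(rec, known)
            if result is None:
                still_failed.append(rec)
            else:
                parsed.append(result)
                known.add(result[0])
        if len(still_failed) == len(pending):
            break  # gave up: `pending` becomes the manual-review log
        pending = still_failed
    return parsed, pending

# Hypothetical parser: full "name,qty" lines always parse; a bare
# abbreviation only parses once the full name has been seen.
def parse(rec, known):
    if "," in rec:
        name, qty = rec.split(",")
        return (name, int(qty))
    matches = [k for k in known if k.startswith(rec)]
    return (matches[0], 0) if len(matches) == 1 else None

parsed, failed = run_passes(["Init", "Initech,5", "zzz"], parse)
print(len(parsed), len(failed))  # 2 1
```

The `failed` list is what you'd write out to the log for manual cleanup.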
