If you're feeding everything into a database anyway, I'd most likely write seperate scripts to grab the information out of each type of file, and put the information into the database. Once everything's in the database in a consistent format, it's much, much easier to generate your reports.

(basically, going with the philosophy that lots of little scripts with a limited scope of work are easier to write and debug than one big script that tries to do everything)

As for data munging, the book might be a good start. Depending on the number of records involved, I'll typically just start processing tem using various patterns that seem to be in the files, and have it write out the successes to one file, and the rejects to another ... if the successes look good, then I'll place them into the database (or whatever the plan was), and look for more patterns in the rejected records.

From what you describe, the csv and excel files should just be a matter of mapping the fields to the database fields (you might have to do some cleaning if there are enumerations or foreign key constraints, though), so the only problem might be the text file, and that's all just a matter of taking small steps, like what I described above.

Now, if this is something that's going to repeat itself (eg, you're going to get an excel sheet once a week from one department, and a cvs from another, and you have to generate reports for your boss as soon as you get them), well, then you start getting into workflow issues -- You just work pretty much like I suggested before, but you just have the parsing scripts store the failed matched in memory, and have the program continue to process them until it's done. (or it gives up, in which case, it writes out a log that you can go through, so still have some manual work, but it's not as bad as it might be)


In reply to Re: how to consolidate xls/csv/text into one report? by jhourcle
in thread how to consolidate xls/csv/text into one report? by tphyahoo

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.