Hi admiraln,
Take a look at Spreadsheet::Write on CPAN. It won't help you remove the printer control characters, but it does give you a nice, easy interface to create spreadsheets (Excel, OpenOffice, CSV, etc.) as well as individual sheets within the spreadsheet document.
Does the final output have to be a spreadsheet? If not, it might be easier to convert the .prn files you're already creating into PDFs for distribution. Just an idea.
HTH,
/Larry
If you are starting with .prn files, then I assume (heh) that you are dealing with page images in a monospace font.
If that's fair, then you are most likely dealing with some kind of fixed format -- a couple of lines of page header, the same for the footer, and in the middle, all the good stuff, laid out in columns.
Unless the columns are wrapped (I handled that for my first Perl contract -- it was fun, back in 1998), it's simple to split the columns apart, perhaps using a regular expression with fixed-width groups.
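A minimal sketch of that regex approach, assuming a made-up layout of three columns 10, 20 and 8 characters wide. The widths and the `split_columns` name are hypothetical, so adjust them to your real report:

```perl
use strict;
use warnings;

# Hypothetical layout: three columns, 10, 20 and 8 characters wide.
# Change @WIDTHS to match the real report.
my @WIDTHS = (10, 20, 8);

# One capture group per column: ^(.{0,10})(.{0,20})(.{0,8})
# {0,N} rather than {N} copes with lines whose trailing spaces were trimmed.
my $pattern   = '^' . join '', map { "(.{0,$_})" } @WIDTHS;
my $COLUMN_RE = qr/$pattern/;

sub split_columns {
    my ($line) = @_;
    my @fields = $line =~ $COLUMN_RE;
    s/^\s+|\s+$//g for @fields;    # trim the padding around each value
    return @fields;
}

# Build a sample line with sprintf so the column widths are exact.
my $sample = sprintf '%-10s%-20s%8s', 'Widget', 'Blue, large', '12.50';
print join('|', split_columns($sample)), "\n";    # Widget|Blue, large|12.50
```

Because each group is greedy and capped at its column width, a field containing spaces (like `Blue, large`) stays in one piece, which a plain `split ' '` would not manage.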
So, some pseudo-code to handle the .prn file would be something like this:
- For each document,
  - For each page,
    - Strip off the page header
    - For each data line,
      - Separate the line into individual fields, and store them in an array.
    - Strip off the page footer
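The steps above could be sketched in Perl roughly like this. It's only a sketch under assumptions you'll need to check against your real files: pages separated by form feeds, two header lines and one footer line per page, and the same made-up 10/20/8 column layout. `parse_document` is a hypothetical name, and this version splits fields with `unpack` instead of a regex, which is just another way to do fixed-width splitting:

```perl
use strict;
use warnings;

# Assumptions (hypothetical, check them against the real .prn files):
#   - pages are separated by form feeds (\f)
#   - every page has 2 header lines and 1 footer line
#   - data lines hold fixed-width columns 10, 20 and 8 characters wide
my $HEADER_LINES = 2;
my $FOOTER_LINES = 1;
my $TEMPLATE     = 'A10 A20 A8';    # 'A' fields drop trailing spaces

# Returns a reference to an array of rows; each row is an array ref of fields.
sub parse_document {
    my ($text) = @_;
    my @rows;
    for my $page (split /\f/, $text) {                      # for each page
        my @lines = split /\r?\n/, $page;
        next if @lines <= $HEADER_LINES + $FOOTER_LINES;    # only header/footer
        splice @lines, 0, $HEADER_LINES;                    # strip the page header
        splice @lines, -$FOOTER_LINES if $FOOTER_LINES;     # strip the page footer
        for my $line (@lines) {                             # for each data line
            next unless $line =~ /\S/;                      # skip blank lines
            my @fields = unpack $TEMPLATE, $line;           # separate into fields
            s/^\s+// for @fields;                           # drop leading spaces too
            push @rows, \@fields;
        }
    }
    return \@rows;
}

# Tiny made-up document: two pages, each with a header and a footer.
sub fmt { sprintf '%-10s%-20s%8s', @_ }

my $prn = join "\n",
    'ACME SALES REPORT', fmt('Item', 'Description', 'Price'),    # page 1 header
    fmt('Widget', 'Blue, large', '12.50'),
    fmt('Gadget', 'Red', '3.99'),
    "Page 1\fACME SALES REPORT", fmt('Item', 'Description', 'Price'),    # footer, page break, page 2 header
    fmt('Gizmo', 'Green, small', '7.00'),
    'Page 2';

print join('|', @$_), "\n" for @{ parse_document($prn) };
```

With that sample input it prints only the three data rows (Widget, Gadget, Gizmo); both page headers and both footers are dropped.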
Once that's done, you can use Spreadsheet::WriteExcel to write it out to an Excel spreadsheet. Alternatively, try Text::CSV just to make a CSV out of it -- sometimes it's better to take small steps.
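For the CSV step, a minimal sketch using Text::CSV (install it from CPAN; it isn't a core module). `write_csv` is a hypothetical helper and the rows are made-up sample data shaped like the parsed fields:

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN, not core Perl

# Write an array of row array refs as CSV to an already-open filehandle.
sub write_csv {
    my ($fh, $rows) = @_;
    my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
        or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag();
    for my $row (@$rows) {
        $csv->print($fh, $row) or die 'CSV write failed: ' . $csv->error_diag();
    }
}

my @rows = (
    [ 'Item',   'Description', 'Price' ],
    [ 'Widget', 'Blue, large', '12.50' ],
);

# Pass a handle opened on a .csv file instead of STDOUT to save it to disk.
write_csv(\*STDOUT, \@rows);
```

Text::CSV takes care of quoting, so a field like `Blue, large` comes out as `"Blue, large"` instead of turning into two columns when the file is opened in a spreadsheet.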
So, come back and let us know how it all worked out.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds