The nature of unpack means it supports variable-width final columns; they will show up as an empty string by default, or undef if you enable the option.
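A minimal sketch of the raw unpack side of that behaviour (the template and sample records here are made up, and the empty-string-to-undef switch is the module option, not something unpack itself does):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # An 'A*' at the end of the template soaks up whatever is left, so the
    # final column can be any width -- or missing entirely, in which case
    # you get an empty string back.
    my $template = 'A10 A8 A*';

    for my $line ("Smith     19800101New York", "Jones     19751231") {
        my ($name, $dob, $city) = unpack $template, $line;
        $city = '' unless defined $city;   # guard against a missing last field
        printf "name=%s dob=%s city=[%s]\n", $name, $dob, $city;
    }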
The lack of headers is an issue; the biggest time saver in using the module is not having to count the characters for the unpack template by hand. I might be able to make this work with heuristics if you send in the whole table, though -- I'll have to look at it.
The stripping piece is all done by the user; we don't touch the file handle. You send me the line to process.
> The nature of unpack means it supports variable-width final columns; they will show up as an empty string by default, or undef if you enable the option.
Sorry -- I couldn't test the code (I don't have 5.10 or Moose installed), and didn't notice that you had already handled that logic correctly in your code.
> The lack of headers is an issue; the biggest time saver in using the module is not having to count the characters for the unpack template by hand. I might be able to make this work with heuristics if you send in the whole table, though -- I'll have to look at it.
See BrowserUK's code, which does just that -- it determines where in the data there are consistently spaces, and then uses those positions as the column boundaries. Obviously, that doesn't hold in all cases, as I often have to process lines such as:
... where there are actually 7 columns, 3 of which are 1 character wide and hold boolean information (each is actually a quality flag for the preceding column). The only good solution I've found for processing this in an automated way (without directly counting columns) is to create some sort of input mask and parse that ... so I might generate something such as:
... where the only significant thing is that the character changes ... you could use specific characters to signal different data types (i.e., is it handled as a string, numeric, or boolean?). Of course, you'd need two characters for each type, for those times when two fields of the same type abut without whitespace between them.
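To make the mask idea concrete, here is a hypothetical sketch -- the mask string and the type letters (s = string, n = numeric, b = boolean) are invented for illustration. Each run of identical characters is one field, so a change of character is what lets the one-character boolean columns abut their neighbours (and, as noted above, two adjacent fields of the same type would need two different letters, say 's' and 'S'):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Made-up mask: 9-char string, 8-char numeric, 1-char boolean,
    # 7-char string, 1-char boolean.
    my $mask = 'sssssssssnnnnnnnnbsssssssb';

    my (@widths, @types);
    while ($mask =~ /\G((.)\2*)/g) {   # each run of repeated characters
        push @widths, length $1;
        push @types,  $2;
    }

    my $template = join ' ', map { "A$_" } @widths;
    print "$template\n";               # A9 A8 A1 A7 A1
    print "@types\n";                  # s n b s b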
> The stripping piece is all done by the user; we don't touch the file handle. You send me the line to process.
That's one approach -- but trust me on this -- with the type of data I process, this happens so often that I just want to pass in the number of header/footer lines, or a regex to denote where to start/stop. If I have to go to the trouble of wrapping your code to handle this rather elementary task, I'm just not going to use it -- I'm going to use my own, as I don't see any real advantage otherwise -- it's not worth forcing people to update Perl and install Moose just to do this sort of work. If you're going to bill your module as 'The most significant module to ever grace cpan', I'd have expected a little bit more.
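The sort of wrapping in question is along the lines of the sketch below; the header count, the footer pattern, and where parse_line() would go are all hypothetical stand-ins, not anything the module provides:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $header_lines = 2;                # skip this many leading lines
    my $footer_re    = qr/^-{5,}\s*$/;   # stop when the footer starts

    while (my $line = <STDIN>) {
        next if $. <= $header_lines;     # drop the header by line count
        last if $line =~ $footer_re;     # stop once the footer is reached
        chomp $line;
        print "$line\n";                 # parse_line($line) would go here
    }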
(The parsers I'm writing are for people to parse scientific catalogs, to keep SQL databases synced up with the authoritative records, and I try to keep the necessary installation to a minimum ... I don't even require DBD -- I generate CSV files and the necessary load routines for the database, but that's also a performance-tuning issue. For anyone who's actually going to stay through to the end of the SPD/AGU joint session, I'm giving a talk late on Friday about this work; although I'll touch on the issues with parsing, my bigger concern is assigning semantic meanings to the columns, so that catalogs can be cross-correlated in a meaningful way.)
Thanks for giving me something to think about! I will update my module if I can think of a more elegant way to help you out here.
Update: v0.03 supports heuristics => \@lines, which is essentially a rip-off of BrowserUK's code; I just uploaded it for the first time last night. I'm still working on a solution for your */bool stuff.
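For anyone curious, a rough sketch of that style of heuristic (this is not BrowserUK's actual code or the module's implementation; the function name and sample data are invented): any column position that is blank in every sample line is treated as a gap, and field boundaries go where the non-blank runs start.

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub guess_unpack_template {
        my @lines = @_;

        my $width = 0;
        for my $line (@lines) {
            $width = length $line if length($line) > $width;
        }

        # A column is a candidate gap only if it is blank in every line
        # (positions past the end of a short line count as blank).
        my @blank = (1) x $width;
        for my $line (@lines) {
            for my $i (0 .. $width - 1) {
                my $ch = $i < length $line ? substr $line, $i, 1 : ' ';
                $blank[$i] = 0 if $ch ne ' ';
            }
        }

        # A field starts at column 0 and wherever a blank column is
        # followed by a non-blank one; widths are the gaps between starts.
        my @starts = (0);
        push @starts, grep { !$blank[$_] && $blank[$_ - 1] } 1 .. $width - 1;

        my @widths = map {
            ($_ < $#starts ? $starts[$_ + 1] : $width) - $starts[$_]
        } 0 .. $#starts;

        return join ' ', map { "A$_" } @widths;
    }

    my @sample = (
        'Smith      19800101 New York',
        'Jones      19751231 Boston',
    );
    print guess_unpack_template(@sample), "\n";   # prints "A11 A9 A8"

As noted above, data where columns abut with no whitespace (like the quality-flag example) will defeat this, which is why the mask idea is still on the table.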