I decided to go with this solution. The columns are fixed-width (most likely tabs converted to spaces), but the file I'm parsing never has two fields separated by fewer than two spaces, so I went with splitting. Splitting also worked better because it keeps all the funny characters that a regexp might miss, like 'Æ' and such.
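A minimal sketch of what "splitting" means here, assuming fields are separated by runs of two or more spaces and that no field ever contains two consecutive spaces itself. The sample line is made up, not taken from the actual file.

```perl
use strict;
use warnings;

# Hypothetical record: columns padded with runs of spaces. The 'Æ' and 'ø'
# bytes pass through untouched, because split only looks for the spaces.
my $line = "Ærø Town    Denmark    55.0";

# Split on runs of two or more whitespace characters, so the single
# space inside "Ærø Town" stays part of the first field.
my @fields = split /\s{2,}/, $line;

print join('|', @fields), "\n";    # Ærø Town|Denmark|55.0
```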
A link to the complete file I'm parsing.
Just out of curiosity: say I have several thousand of these files to parse. Which would be faster: splitting, a regexp, or the unpack solution?
Was it OK to post my reply here? I'm not up on PerlMonks posting etiquette.
Moderators moderate.
Probably unpack. Only a benchmark will tell for sure, though. Even so, unless you have to do this so many times that it actually matters, you shouldn't care. Readability and maintainability come first; programmer time is much more expensive than computer time.
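If you do want to measure, the core Benchmark module's cmpthese will compare the three approaches. This is a sketch only: the 12-character column widths, the sample record and the regexp are all assumptions standing in for your real data and whatever regexp you had in mind.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical record: two 12-character columns and a final free-width one.
my $line = sprintf "%-12s%-12s%s", "Aero Town", "Denmark", "55.0";

# The three candidate parsers; each returns the list of fields.
my %parser = (
    split  => sub { split /\s{2,}/, $_[0] },
    unpack => sub { unpack 'A12 A12 A*', $_[0] },
    regex  => sub {
        my @f = $_[0] =~ /^(.{12})(.{12})(.*)$/;
        s/\s+$// for @f;    # strip the column padding
        return @f;
    },
);

# Sanity check: a timing is meaningless unless all three give the same answer.
my $expected = join '|', $parser{unpack}->($line);
for my $name (sort keys %parser) {
    my $got = join '|', $parser{$name}->($line);
    die "$name disagrees: '$got' vs '$expected'\n" unless $got eq $expected;
}

# Wrap each parser so Benchmark can call it with no arguments.
my %bench;
for my $name (keys %parser) {
    my $p = $parser{$name};
    $bench{$name} = sub { my @f = $p->($line) };
}

# Run each for at least one CPU second, then print a rate comparison table.
cmpthese(-1, \%bench);
```

The numbers will vary by machine and Perl version, so run it on your own data. With several thousand files, reading from disk may well cost more than any of the parsers, so it is worth timing the whole loop too.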
In your case I'd pick the unpack solution simply because it's the most clearly self-documenting. The split solution does not convey all the assumptions about your input, even though it works.
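To illustrate the self-documenting point, here is a sketch that states the column layout in one place and derives the unpack template from it. The column names and widths are hypothetical; the real ones would have to be read off the actual file.

```perl
use strict;
use warnings;

# Hypothetical layout: name and width of each column (0 = rest of line).
my @columns = ( [ town => 12 ], [ country => 12 ], [ lat => 0 ] );

# Build the template from the layout; here it comes out as "A12 A12 A*".
# 'A' strips the trailing padding from each field.
my $template = join ' ', map { my $width = $_->[1]; $width ? "A$width" : 'A*' } @columns;

# Turn one line into a hash keyed by column name.
sub parse_record {
    my ($line) = @_;
    my %record;
    @record{ map $_->[0], @columns } = unpack $template, $line;
    return \%record;
}

my $rec = parse_record(sprintf "%-12s%-12s%s", "Aero Town", "Denmark", "55.0");
print "$rec->{town} / $rec->{country} / $rec->{lat}\n";    # Aero Town / Denmark / 55.0
```

Unlike the split version, this would still parse correctly if a field ever contained two consecutive spaces, and the widths it relies on are written down rather than implied.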
Makeshifts last the longest.