in reply to Arbitrary multiple spaces as delimiter

Well, assuming the words within the second and third elements are separated by only one space, but the elements themselves are separated by two or more, it looks like split /\s\s+/ would do basically what you want. If that's not the case, there is basically no approach that is guaranteed to work. If I *had* to solve that problem, I'd probably try to extract the first element and the last two elements, then make a lot of guesses about the middle ones.
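A minimal sketch of the split approach, assuming a made-up sample line in which fields are separated by two or more spaces and words inside a field by exactly one (the data here is hypothetical, not from the original file):

```perl
use strict;
use warnings;

# Hypothetical sample line: fields separated by 2+ spaces,
# words inside a field separated by a single space.
my $line = "ID42   Some Long Title   Another Field  2004  ok";

# Split on runs of two or more whitespace characters;
# single spaces inside a field are left alone.
my @fields = split /\s\s+/, $line;

print join("|", @fields), "\n";
# prints: ID42|Some Long Title|Another Field|2004|ok
```

When reading lines from a file, chomp each line first; otherwise the last field keeps its trailing newline, because a lone "\n" is only one whitespace character and does not match /\s\s+/.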

Re: Re: Arbitrary multiple spaces as delimiter
by Weisshaupt (Initiate) on Mar 15, 2004 at 09:40 UTC
    I decided to go with this solution. The columns are fixed width (most likely tabs converted into spaces), but since the file I'm parsing contains no examples of two fields separated by fewer than two spaces, I went with splitting. It also worked better, since it retains all the funny characters that a regexp might miss, like 'Æ' and such.
    A link to the complete file I'm parsing.
    Just out of curiosity: say I have several thousand of these files to parse. Which would be faster: splitting, a regexp, or the unpack solution?

    Was it OK to post my reply here? I'm not up on PerlMonks posting etiquette. Moderators, moderate as you see fit.

      Probably unpack. Only a benchmark will tell for sure, though. Even so, unless you have to do this so many times that it actually matters, you shouldn't care. Readability and maintainability come first; programmer time is much more expensive than computer time.
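      A rough sketch of such a benchmark using the core Benchmark module's cmpthese, assuming a hypothetical fixed-width line (column widths 6, 18, 15, 6, then the rest). It checks that the three parsers agree before timing them. With thousands of files, reading from disk will probably dominate, so timing a real file end to end is the more honest test.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical fixed-width record: widths 6, 18, 15, 6, then the rest.
my $line = sprintf "%-6s%-18s%-15s%-6s%s",
    "ID42", "Some Long Title", "Another Field", "2004", "ok";

# Each parser returns an array ref of fields.
my %parsers = (
    # Split on runs of two or more whitespace characters.
    split  => sub { my @f = split /\s\s+/, $line; \@f },
    # Match words joined by single spaces.
    regex  => sub { my @f = $line =~ /(\S+(?: \S+)*)/g; \@f },
    # Fixed-width columns; "A" strips trailing spaces.
    unpack => sub { my @f = unpack "A6 A18 A15 A6 A*", $line; \@f },
);

# Sanity check: all three must agree before timing them.
my $expected = join "|", @{ $parsers{split}->() };
for my $name (sort keys %parsers) {
    my $got = join "|", @{ $parsers{$name}->() };
    die "$name disagrees: $got\n" unless $got eq $expected;
}

# Run each parser for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%parsers);
```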

      In your case I'd pick the unpack solution simply because it's the most clearly self-documenting. The split solution does not convey all the assumptions about your input, even though it works.
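      One way to make the unpack version even more self-documenting is to name each column next to its width and build the template from that list. The column names and widths below are hypothetical; the real ones have to be read off the file. If the file contains multibyte characters such as 'Æ', the widths count characters only when the input was decoded (for example, opened with an :encoding layer); otherwise they count bytes.

```perl
use strict;
use warnings;

# Hypothetical column layout: [ name, width ]; '*' takes the rest of the line.
my @columns = (
    [ id     => 6   ],
    [ title  => 18  ],
    [ field  => 15  ],
    [ year   => 6   ],
    [ status => '*' ],
);

# Builds "A6 A18 A15 A6 A*"; the "A" format strips trailing spaces.
my $template = join ' ', map { "A$_->[1]" } @columns;

# Unpack one line into a hash keyed by column name.
sub parse_record {
    my ($line) = @_;
    my @values = unpack $template, $line;
    my %record;
    @record{ map { $_->[0] } @columns } = @values;
    return \%record;
}

my $rec = parse_record(sprintf "%-6s%-18s%-15s%-6s%s",
    "ID42", "Some Long Title", "Another Field", "2004", "ok");
print "$rec->{title} ($rec->{year})\n";
# prints: Some Long Title (2004)
```

      Changing the layout then means editing one list, and the widths sit right next to the names they describe.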

      Makeshifts last the longest.