in reply to Re: Suggestions to make this code more Perlish
in thread Suggestions to make this code more Perlish

That's a good solution, but it has a couple of rough edges.

I don't think the first line needs to be treated separately. You could remove these four lines:

my $first_line = <DATA>;
chomp $first_line;
$first_line =~ s/,/chr(31)/eg;
print $first_line, chr(30);

Your join is joining both the fields and the record separator, so an extra chr(31) ends up just before every chr(30). Just add parentheses to

print join chr(31), @fields, chr(30);

like this

print join(chr(31), @fields), chr(30);

That way, the field separators will just separate the fields. :-)
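The precedence issue is easy to see with visible separators. This is a small illustration, assuming '|' and "\n" as stand-ins for chr(31) and chr(30); the variable names are made up for the example.

```perl
use strict;
use warnings;

# Visible stand-ins for chr(31) (unit separator) and chr(30) (record separator).
my ($FS, $RS) = ('|', "\n");
my @fields = ('a', 'b', 'c');

# Without parentheses, join is a list operator and swallows $RS as well,
# so a field separator is inserted between the last field and $RS.
my $greedy = join $FS, @fields, $RS;

# With parentheses, join only sees the fields; $RS is appended afterwards.
my $fixed = join($FS, @fields) . $RS;

print $greedy;   # prints "a|b|c|" followed by a newline
print $fixed;    # prints "a|b|c" followed by a newline
```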

You're also not adequately handling a quoted field at the start of a record, or two quoted fields adjacent to each other. Here are two examples from your output (I think these are the only ones):

... "Bonaire, Sint Eustatius and Saba|"Charissa, Lana, Liberty, Quail|4451 +7|18| ...
... "Virgin Islands, British|"Otto, Macon, Caldwell, Sasha|87676|49| ...

Here are the matching lines from the input:

... "Bonaire, Sint Eustatius and Saba","Charissa, Lana, Liberty, Quail",44 +517,18 ...
... "Virgin Islands, British","Otto, Macon, Caldwell, Sasha",87676,49 ...
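One way to recognise quoted fields wherever they appear (first in the record, or next to another quoted field) is to walk along the line with \G and the /gc modifiers, trying a quoted-field pattern first and an unquoted one second. This is a minimal sketch with a hypothetical split_csv_line helper, not the algorithm from the post under discussion. It assumes the line is already chomped, does not trim whitespace around separators, and does not handle newlines inside quoted fields. The sample records are the visible parts of the input lines quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV record into fields. A field is either a
# double-quoted string (which may contain commas and doubled "" quotes) or a
# run of non-comma characters, possibly empty.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while (1) {
        if ($line =~ /\G"([^"]*(?:""[^"]*)*)"/gc) {   # quoted field
            (my $field = $1) =~ s/""/"/g;            # undo "" escaping
            push @fields, $field;
        }
        elsif ($line =~ /\G([^,]*)/gc) {              # unquoted field
            push @fields, $1;
        }
        last unless $line =~ /\G,/gc;                 # no comma: end of record
    }
    return @fields;
}

my @records = (
    '"Bonaire, Sint Eustatius and Saba","Charissa, Lana, Liberty, Quail",44 +517,18',
    '"Virgin Islands, British","Otto, Macon, Caldwell, Sasha",87676,49',
);

for my $record (@records) {
    my @fields = split_csv_line($record);
    # '|' stands in for chr(31) so the output is readable
    print join('|', @fields), "\n";
}
```

Because each field is matched as a whole before the comma is consumed, the commas inside the quoted country and name lists are never treated as separators.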

-- Ken

Re^3: Suggestions to make this code more Perlish
by Laurent_R (Canon) on Mar 30, 2014 at 10:06 UTC
    Thanks for your comments, Ken. Yes, you are right: this is a 10-minute solution, certainly not a polished one.

    I needed to process the first line differently because the way I chose to process the other lines would not work for it, but it is certainly possible to find another way of processing the lines that would also work for the first one. However, when a header line needs to be processed differently from the rest of the file, I often prefer to process it before starting to loop over the rest of the file, because the algorithm is then simpler (and often faster, which matters if the file is large).
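The header-first pattern can be sketched as below. This is an illustration only, assuming a simple comma-separated file with no quoted fields; the in-memory sample input and the choice to use the header as column names are hypothetical and not taken from the code in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample input: one header line followed by data lines.
my $input = "name,count\nfoo,1\nbar,2\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

# Handle the header once, before the main loop, so the loop body
# only has to deal with data records.
my $header = <$fh>;
chomp $header;
my @columns = split /,/, $header;

my @rows;
while (my $line = <$fh>) {
    chomp $line;
    my %row;
    @row{@columns} = split /,/, $line;   # hash slice: column name => value
    push @rows, \%row;
}
close $fh;

print "$_->{name}: $_->{count}\n" for @rows;   # prints "foo: 1" and "bar: 2"
```

The loop never has to ask "is this the first line?", which is the simplification described above.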

    You are right about the "join" line: it adds a field separator at the end of each record. That did not bother me, but it is indeed different from the output produced by the code in the original post. Adding parens (as you propose) solves the issue.

    I had not even noticed that the input data contained two "irregular" lines with quoted fields at the beginning of the records. That is of course a serious problem, because it probably means that the whole algorithm has to be modified. BTW, this is a good example of why using a module such as Text::CSV is often better than doing one's own solution.

      "BTW, this is a good example of why using a module such as Text::CSV is often better than doing one's own solution."

      Yes, I absolutely agree. Beyond being an interesting academic exercise, reinventing this particular wheel has little merit.

      Furthermore, while the solutions so far have been coded for a very specific input, they'll need to be recoded to handle escaped quotes, whitespace around field separators and anything else that Text::CSV has already taken into consideration but which we haven't catered for yet.
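For comparison, here is a minimal sketch of the same conversion with Text::CSV, assuming the module is installed from CPAN. The two sample lines are hypothetical and were chosen to exercise doubled quotes inside a quoted field and spaces around the separators; allow_whitespace is what makes the latter acceptable.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; uses Text::CSV_XS when available, else Text::CSV_PP

# binary => 1 permits embedded newlines and non-ASCII bytes,
# allow_whitespace => 1 ignores blanks around separators,
# auto_diag => 1 reports parse errors automatically.
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag;

# Hypothetical sample data: an escaped "" quote and spaces around commas.
my $data = qq{"Virgin Islands, British" , "He said ""hi""",87676\n}
         . qq{"Bonaire, Sint Eustatius and Saba",  18 ,49\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
while (my $row = $csv->getline($fh)) {
    push @records, $row;
    # Same output format as in the thread: chr(31) between fields, chr(30) after each record
    print join(chr(31), @$row), chr(30);
}
close $fh;
```

None of the quoting or whitespace rules have to be written by hand; they are options of the parser.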

      -- Ken