Hi BrowserUk,
Please note: I didn't say it could not be solved with a CSV module, only that they don't really help.
I was disagreeing mostly with what appeared to be your main point, "his data isn't CSV!" I also disagree that a CSV module doesn't help, since Text::CSV gives you the handling of quoted strings. I'm sure you know all the other usual arguments as to why using a module is often better than rolling your own, but just to highlight one: it does give more flexibility if the input happens to vary (e.g. if one of the fields is empty or another field happens to be a quoted string).
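To illustrate the quoted-string point, here is a minimal sketch. The sample line is made up for illustration (it is not the OP's data), and I'm assuming the default comma separator and double-quote character:

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample line: the second field is quoted and contains a comma.
my $line = q{42,"Smith, John",a;b;c};

# A naive split cuts the quoted field in two.
my @naive = split /,/, $line;

# Text::CSV honors the quotes and returns three fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

printf "split:     %d fields\n", scalar @naive;
printf "Text::CSV: %d fields\n", scalar @fields;
print  "second field: $fields[1]\n";
```

The split version yields four pieces and leaves the quote characters in the data, while the module version yields three clean fields.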
For example, you've resorted to split for the embedded field. Why is it acceptable to do so there, but not for the rest?
I'd say a simple split isn't really appropriate for the rest of the line because it doesn't have all the power of the module. However, you've still got a good point: using a simple split for the third column does make assumptions about the input format (I've updated my node accordingly). If I wanted total consistency I could use another instance of Text::CSV, but the line between what's overkill and what isn't has to be drawn somewhere ;-)
But, your code also only solves half the problem.
Yes, I admittedly did skip that part of the question; I felt that the OP's problems with parsing CSV were more important. But then again, your code doesn't solve that part very generically ;-P
... the use of a csv module does nothing to assist in that. In fact it effectively denies access to some information that could be used to assist in the production of the output.
True, one does lose some info on the input (IIRC the physical line number in the input file in the case of newlines embedded in fields; maybe there's more I'm forgetting at the moment). But I also think the solution for the problem of adding either a comma or a semicolon at the end of the line is the same whether I manually parse the rows or whether I use a module to help. Just a quick, inelegant idea:
    my $rownum = 1;
    while ( my $data = somehow_parse_input_line() ) {
        print ",\n" unless $rownum == 1;
        print $data;
    }
    continue { $rownum++ }
    print ";\n";
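If the input is small enough to hold in memory, the same "comma between rows, semicolon at the end" output could also be sketched with join. The contents of @rows below are a hypothetical stand-in for rows that have already been parsed and formatted, however that was done:

```perl
use strict;
use warnings;

# Hypothetical stand-in for already parsed and formatted rows.
my @rows = ('(1, "a")', '(2, "b")', '(3, "c")');

# Separator between rows, terminator after the last one.
my $output = join(",\n", @rows) . ";\n";
print $output;
```

The loop has the advantage of streaming the input one row at a time; the join version trades that for brevity.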
You unconditionally attempt to split the 3rd field regardless of whether it requires it.
Unless we're talking about performance, I don't think unconditionally splitting the third field hurts; although as I said above it could certainly be solved differently, and that'd certainly be necessary if the input varies.
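As a quick check of why the unconditional split is harmless, here is a sketch that assumes the embedded sub-fields are separated by a semicolon (an assumption for illustration, not necessarily the OP's format):

```perl
use strict;
use warnings;

# Assumption: sub-fields in the third column are separated by ';'.
my @many = split /;/, 'a;b;c';   # delimiter present: three elements
my @one  = split /;/, 'abc';     # no delimiter: the string comes back as one element
my @none = split /;/, '';        # empty field: empty list

print scalar(@many), " ", scalar(@one), " ", scalar(@none), "\n";
```

One caveat if the input varies: split drops trailing empty fields unless it is given a negative limit, e.g. split /;/, $field, -1.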
Regards,
-- Hauke D