Unpack or substr to create CSV?

johnmck has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Unpack or substr to create CSV? by toolic (Bishop) on May 02, 2015 at 16:23 UTC
"Since the fields never change length it would seem that unpack is a better choice than substr, is that correct?"

Perhaps. Read perlpacktut, which compares the various approaches. Here is an example of using unpack to parse each line into an array:

```perl
use warnings;
use strict;

while (<DATA>) {
    chomp;
    my @cols = unpack 'A1A15A14A16', $_;
    print join(',', @cols), "\n";
}

__DATA__
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
```
Re: Unpack or substr to create CSV? by hdb (Monsignor) on May 02, 2015 at 16:16 UTC
I only count 43 characters per line; could this be a source of your problems? There are many ways to do this, for example using regular expressions:

```perl
use strict;
use warnings;

while (<DATA>) {
    s/(.)(.{15})(.{14})(.{13})/$1,$2,$3,$4/;
    print;
}

__DATA__
0123456789012345678901234567890123456789012
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
```
Re^2: Unpack or substr to create CSV? by AnomalousMonk (Archbishop) on May 02, 2015 at 17:17 UTC
Not to mention that it's really easy to add data validation with this approach:

```perl
use strict;
use warnings;

while (<DATA>) {
    s/\A(.)(.{15})(.{14})(.{13})\Z/$1,$2,$3,$4/
        or die "bad record: '$_'";
    print;
}

__DATA__
0123456789012345678901234567890123456789012
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
```
Re: Unpack or substr to create CSV? by AnomalousMonk (Archbishop) on May 02, 2015 at 16:48 UTC
"... a better choice ..."

Here's the standard `<rant>`: What the heck is your criterion for "better"? I would gravitate to an unpack solution for fixed-width records, but might your maintainer better understand substr? If so, `substr` would be better. Are you concerned about speed? For such a small dataset, I doubt there would be any significant difference between the three approaches mentioned so far in this thread, but the only way to tell is to Benchmark. (Update: Do you want to support data validation at all?) And so on... `</rant>`

Give a man a fish: `<%-(-(-(-<`
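For anyone who does want to measure it, here is a minimal sketch using the core Benchmark module. The helper names `by_unpack`, `by_substr` and `by_regex` are made up for illustration, the field widths (1, 15, 14, 13) are the ones used in the regex replies above, and a single sample record is assumed to be representative. Timings depend on the machine and the Perl version, so none are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One sample record from the thread: widths 1, 15, 14, 13 (43 characters).
my $record = 'C4432882490H019000020150211ESL6690 0H2015PC';

# Hypothetical helpers, one per approach discussed in the thread.
sub by_unpack {
    my ($line) = @_;
    return join ',', unpack 'A1 A15 A14 A13', $line;
}

sub by_substr {
    my ($line) = @_;
    return join ',',
        substr( $line, 0,  1 ),
        substr( $line, 1,  15 ),
        substr( $line, 16, 14 ),
        substr( $line, 30, 13 );
}

sub by_regex {
    my ($line) = @_;
    # The list assignment returns the number of captures, so a failed
    # match (zero captures) triggers the die.
    my @cols = $line =~ /\A(.)(.{15})(.{14})(.{13})\z/
        or die "bad record: '$line'";
    return join ',', @cols;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    unpack => sub { by_unpack($record) },
    substr => sub { by_substr($record) },
    regex  => sub { by_regex($record) },
} );
```

All three helpers return the same comma-joined string for the sample record, so the comparison is at least like for like. Only the regex version rejects a malformed record.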
Re^2: Unpack or substr to create CSV? by Laurent_R (Canon) on May 02, 2015 at 18:59 UTC
I agree with AnomalousMonk: for such a small dataset, any approach is probably good enough. Just use the one you understand best and that your maintainer is likely to understand best. I personally would choose `substr` because anytime I use `unpack`, I need to go through the documentation again, and `substr` is marginally better than a regex. But a regex would do just about as well for this data size.

Je suis Charlie.
Re^3: Unpack or substr to create CSV? by AnomalousMonk (Archbishop) on May 02, 2015 at 20:11 UTC
After posting the above, I realized that a regex approach might give you data validation, if this was of any concern, almost for free, so I think now that I might incline in this direction. But again, there are too many unstated conditions and requirements to allow more than a hand-waving consideration of alternatives, although this may be valuable to johnmck.

Give a man a fish: `<%-(-(-(-<`
Re^4: Unpack or substr to create CSV? by Laurent_R (Canon) on May 02, 2015 at 20:31 UTC
Re: Unpack or substr to create CSV? by Laurent_R (Canon) on May 02, 2015 at 20:23 UTC
Hi, as I already said in another post on your thread, performance is probably completely irrelevant for the small dataset you are talking about. However, just in case you are interested, I ran a detailed benchmark on a very similar problem a bit less than a year and a half ago. The results are here: Re: Performance problems on splitting long strings. You'll see that `unpack` won the race, but `substr` wasn't that far behind. It did make some difference to me, however, because I was processing two 6-GB files, with the long string to be split representing at least 75% to 80% of the data volume. This was just for your information. Again, I don't think you should care at all about that for your low data volumes.

Je suis Charlie.
Re: Unpack or substr to create CSV? by Tux (Canon) on May 03, 2015 at 10:50 UTC
```
$ perl -MText::CSV_XS=csv -we'csv (in => sub {[ unpack "AA15A14A*", <> // exit ]})' < test.txt
C,4432882490H0190,00020150211ESL,"6690 0H2015PC"
C,4833076550HC0P0,000201412093J4,"6651 0H2015DX"
C,6033106980H0571,30020150323FRE,"7602 0H2015PC"
C,663160140MT0070,15G20141124274,"847A MT2015PC"
```

Enjoy, Have FUN! H.Merijn
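The last column in the output above is quoted because it contains a space, which Text::CSV_XS quotes by default (its `quote_space` attribute). For readers without the module, here is a rough pure-Perl sketch of that quoting step. The helpers `csv_field` and `csv_line` are hypothetical names, not part of Text::CSV_XS, and the rule they apply is a simplification of what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a field when it contains whitespace, a comma,
# or a double quote; embedded double quotes are doubled.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[\s,"]/;
    ( my $escaped = $field ) =~ s/"/""/g;
    return qq("$escaped");
}

# Hypothetical helper: split one fixed-width record with the same template
# as the one-liner above and join the columns as a CSV line.
sub csv_line {
    my ($record) = @_;
    return join ',', map { csv_field($_) } unpack 'A A15 A14 A*', $record;
}

print csv_line('C4432882490H019000020150211ESL6690 0H2015PC'), "\n";
```

For the sample record this prints the same line as the first row of the output above. For real data, the module remains the safer choice, since it handles the edge cases this sketch ignores.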
Re: Unpack or substr to create CSV? by karlgoethebier (Abbot) on May 03, 2015 at 16:20 UTC
AnomalousMonk wrote: "...but might your maintainer understand substr better?"

I don't know, but as no one provided a solution that uses substr, I wrote one:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Each pair is [ start index, field length ].
my @pairs = ( [ 0, 1 ], [ 1, 15 ], [ 16, 14 ], [ 30, 4 ], [ 35, 8 ] );

while ( my $line = <DATA> ) {
    for my $pair (@pairs) {
        my $index  = $pair->[0];
        my $offset = $pair->[1];    # passed to substr as the length
        print substr $line, $index, $offset;
        print qq( );
    }
    print qq(\n);
}

__DATA__
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
```

Output:

```
karls-mac-mini:monks karl$ ./substring.pl
C 4432882490H0190 00020150211ESL 6690 0H2015PC
C 4833076550HC0P0 000201412093J4 6651 0H2015DX
C 6033106980H0571 30020150323FRE 7602 0H2015PC
C 663160140MT0070 15G20141124274 847A MT2015PC
```

Regards, Karl

«The Crux of the Biscuit is the Apostrophe»
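The script above prints space-separated pieces and splits the last 13 characters in two. For comma-separated output with the same four columns as the other replies, a substr version might look like the sketch below. The name `split_fixed` and the `@fields` table are made up for illustration, and the widths 1, 15, 14 and 13 are assumed from the regex replies. The records are held in an array here rather than read from `__DATA__`.

```perl
use strict;
use warnings;

# Each pair is [ start offset, field length ]; widths 1, 15, 14, 13 are
# assumed from the other replies in this thread.
my @fields = ( [ 0, 1 ], [ 1, 15 ], [ 16, 14 ], [ 30, 13 ] );

# Hypothetical helper: cut one record into columns with substr.
# No validation is done; a record shorter than expected gives short or
# undefined columns.
sub split_fixed {
    my ($line) = @_;
    return map { substr( $line, $_->[0], $_->[1] ) } @fields;
}

my @records = (
    'C4432882490H019000020150211ESL6690 0H2015PC',
    'C4833076550HC0P0000201412093J46651 0H2015DX',
);

print join( ',', split_fixed($_) ), "\n" for @records;
```

Keeping the offsets and lengths in one table means a layout change touches a single line, which may be the kind of readability a maintainer would appreciate.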