Re: Unpack or substr to create CSV?
by toolic (Bishop) on May 02, 2015 at 16:23 UTC
Since the fields never change length it would seem that unpack is a better choice than substr, is that correct?
Perhaps. Read perlpacktut, which compares the various approaches. Here is an example of using unpack to parse each line into an array:
use warnings;
use strict;
while (<DATA>) {
    chomp;
    my @cols = unpack 'A1A15A14A16', $_;
    print join(',', @cols), "\n";
}
__DATA__
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
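One detail about the template above: its widths add up to 46, while the sample lines are 43 characters long. When the string runs out, unpack's A format returns whatever is left, so the last field still comes out right. Here is a minimal sketch that builds the template from a list of widths instead. The helper names template_for and split_record are made up for illustration, and the final width of 13 assumes 43-character records:

```perl
use strict;
use warnings;

# Build an unpack template such as 'A1 A15 A14 A13' from a list of widths.
# Whitespace inside a template is ignored by unpack.
sub template_for {
    my @widths = @_;
    return join ' ', map { 'A' . $_ } @widths;
}

# Split one fixed-width record into a list of fields.
# Note: the 'A' format strips trailing spaces from each field.
sub split_record {
    my ( $line, @widths ) = @_;
    return unpack( template_for(@widths), $line );
}

my @sample = split_record( 'C4432882490H019000020150211ESL6690 0H2015PC', 1, 15, 14, 13 );
my $joined = join ',', @sample;
print "$joined\n";
# prints: C,4432882490H0190,00020150211ESL,6690 0H2015PC
```

Keeping the widths in a list makes it easy to check that they sum to the expected record length before any data is processed.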
Re: Unpack or substr to create CSV?
by hdb (Monsignor) on May 02, 2015 at 16:16 UTC
I only count 43 characters per line; could this be a source of your problems? There are many ways to do this, for example using regular expressions:
use strict;
use warnings;
while (<DATA>) {
    s/(.)(.{15})(.{14})(.{13})/$1,$2,$3,$4/;
    print;
}
__DATA__
0123456789012345678901234567890123456789012
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
|
|
use strict;
use warnings;
while (<DATA>) {
    s/\A(.)(.{15})(.{14})(.{13})\Z/$1,$2,$3,$4/
        or die "bad record: '$_'";
    print;
}
__DATA__
0123456789012345678901234567890123456789012
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
Re: Unpack or substr to create CSV?
by AnomalousMonk (Archbishop) on May 02, 2015 at 16:48 UTC
... a better choice ...
Here's the standard <rant>: What the heck is your criterion for "better"? I would gravitate to an unpack solution for fixed-width records, but might your maintainer better understand substr? If so, substr would be better. Are you concerned about speed? For such a small dataset, I doubt there would be any significant difference between the three approaches mentioned so far in this thread, but the only way to tell is to Benchmark. (Update: Do you want to support data validation at all?) And so on... </rant>
Give a man a fish: <%-(-(-(-<
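For anyone who does want to measure, here is a rough sketch using the core Benchmark module. The sub names via_unpack, via_substr and via_regex are invented for this example, and the widths 1, 15, 14 and 13 assume the 43-character records shown in this thread. Timings depend entirely on the machine and the perl build, so no numbers are given here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'C4432882490H019000020150211ESL6690 0H2015PC';

# Three ways to cut the same 43-character record into four fields.
# Note: unpack's 'A' strips trailing spaces; the other two do not.
sub via_unpack { return unpack( 'A1 A15 A14 A13', $_[0] ) }

sub via_substr {
    my ($rec) = @_;
    return map { substr( $rec, $_->[0], $_->[1] ) }
        ( [ 0, 1 ], [ 1, 15 ], [ 16, 14 ], [ 30, 13 ] );
}

sub via_regex { return $_[0] =~ /\A(.)(.{15})(.{14})(.{13})\z/ }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'unpack' => sub { my @f = via_unpack($line) },
    'substr' => sub { my @f = via_substr($line) },
    'regex'  => sub { my @f = via_regex($line) },
} );
```

It is worth checking that all candidates return the same fields before trusting the table, since a fast but wrong variant is easy to write.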
|
|
I agree with AnomalousMonk: for such a small dataset, any approach is probably good enough. Just use the one you understand best and that your maintainer is likely to understand best. I personally would choose substr, because any time I use unpack I need to go through the documentation again, and I find substr marginally better than a regex. But a regex would do just about as well for this data size.
|
|
After posting the above, I realized that a regex approach might give you data validation, if this was of any concern, almost for free, so I think now that I might incline in this direction. But again, there are too many unstated conditions and requirements to allow more than a hand-waving consideration of alternatives, although this may be valuable to johnmck.
Give a man a fish: <%-(-(-(-<
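To illustrate the "validation almost for free" point: the anchored pattern shown earlier in the thread checks the record length, and swapping the dots for character classes also checks the content. A sketch follows, with the caveat that the character classes are guesses based on the four sample lines, not a known specification, and parse_record is a made-up helper name:

```perl
use strict;
use warnings;

# The field widths (1, 15, 14, 13) come from the thread. The character
# classes are an ASSUMPTION about what each field may contain.
my $RECORD = qr/\A([A-Z])([A-Z0-9]{15})([A-Z0-9]{14})([A-Z0-9 ]{13})\z/;

# Return an array ref of fields, or undef if the record is malformed.
# The line is expected to be chomped already, because \z does not
# tolerate a trailing newline.
sub parse_record {
    my ($line) = @_;
    my @fields = $line =~ $RECORD or return;
    return \@fields;
}

for my $line ( 'C4432882490H019000020150211ESL6690 0H2015PC', 'C44328' ) {
    my $fields = parse_record($line);
    my $out = $fields ? join( ',', @$fields ) : "bad record: '$line'";
    print "$out\n";
}
```

Returning undef instead of dying lets the caller decide whether a bad record should stop the run or just be logged with its line number.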
|
|
Re: Unpack or substr to create CSV?
by Laurent_R (Canon) on May 02, 2015 at 20:23 UTC
Hi,
as I already said in another post on your thread, performance is probably completely irrelevant for the small dataset you are talking about.
However, just in case you are interested, I ran a detailed benchmark on a very similar problem a bit less than a year and a half ago. The results are here: Re: Performance problems on splitting long strings. You'll see that unpack won the race, but substr wasn't that far behind.
It did make some difference to me, however, because I was running the processing of two 6-GB files, with the long string to be split representing at least 75% to 80% of the data volume.
This was just for your information. Again, I don't think you should care at all about that for your low data volumes.
Re: Unpack or substr to create CSV?
by Tux (Canon) on May 03, 2015 at 10:50 UTC
$ perl -MText::CSV_XS=csv -we'csv (in => sub {[ unpack "AA15A14A*", <> // exit ]})' < test.txt
C,4432882490H0190,00020150211ESL,"6690 0H2015PC"
C,4833076550HC0P0,000201412093J4,"6651 0H2015DX"
C,6033106980H0571,30020150323FRE,"7602 0H2015PC"
C,663160140MT0070,15G20141124274,"847A MT2015PC"
Enjoy, Have FUN! H.Merijn
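The quotes around the last field in that output are the point of using a CSV module: a plain join(',', ...) produces a broken file as soon as a field contains a comma or a double quote. The sketch below is a hand-rolled stand-in to show the idea, not how Text::CSV_XS works internally. The names csv_field and csv_line are invented, and quoting fields that contain a space is an assumption chosen to mirror the output shown above:

```perl
use strict;
use warnings;

# Quote one field if it contains a comma, a double quote, a line break
# or a space; embedded double quotes are doubled.
sub csv_field {
    my ($value) = @_;
    return $value unless $value =~ /[",\r\n ]/;
    ( my $escaped = $value ) =~ s/"/""/g;
    return qq("$escaped");
}

# Join a list of fields into one CSV line (without the line ending).
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

my $row = csv_line( unpack 'A A15 A14 A*', 'C4432882490H019000020150211ESL6690 0H2015PC' );
print "$row\n";
# prints: C,4432882490H0190,00020150211ESL,"6690 0H2015PC"
```

For real data a tested module such as the one used in the one-liner is the safer choice; this sketch only shows what the quoting has to achieve.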
Re: Unpack or substr to create CSV?
by karlgoethebier (Abbot) on May 03, 2015 at 16:20 UTC
AnomalousMonk wrote:
"...but might your maintainer understand substr better?"
I don't know, but as no one provided a solution that uses substr, I wrote one:
#!/usr/bin/env perl
use strict;
use warnings;
# Each pair is [ offset, length ]; the space at offset 34 is skipped.
my @pairs = ( [ 0, 1 ], [ 1, 15 ], [ 16, 14 ], [ 30, 4 ], [ 35, 8 ] );
while ( my $line = <DATA> ) {
    for my $pair (@pairs) {
        my ( $offset, $length ) = @$pair;
        print substr $line, $offset, $length;
        print qq( );
    }
    print qq(\n);
}
__DATA__
C4432882490H019000020150211ESL6690 0H2015PC
C4833076550HC0P0000201412093J46651 0H2015DX
C6033106980H057130020150323FRE7602 0H2015PC
C663160140MT007015G20141124274847A MT2015PC
Output:
karls-mac-mini:monks karl$ ./substring.pl
C 4432882490H0190 00020150211ESL 6690 0H2015PC
C 4833076550HC0P0 000201412093J4 6651 0H2015DX
C 6033106980H0571 30020150323FRE 7602 0H2015PC
C 663160140MT0070 15G20141124274 847A MT2015PC
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
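The offsets in a list like @pairs have to be worked out by hand, which is where off-by-one mistakes tend to creep in. Here is a sketch that derives them from the field widths and joins the fields with commas. The helper names pairs_for and cut_record are hypothetical, and the widths 1, 15, 14 and 13 follow the other replies, so the space at offset 34 stays inside the last field instead of being skipped:

```perl
use strict;
use warnings;

# Turn a list of field widths into [ offset, length ] pairs.
sub pairs_for {
    my @widths = @_;
    my $offset = 0;
    my @pairs;
    for my $width (@widths) {
        push @pairs, [ $offset, $width ];
        $offset += $width;
    }
    return @pairs;
}

# Cut a record into fields with substr, one call per pair.
sub cut_record {
    my ( $line, @pairs ) = @_;
    return map { substr( $line, $_->[0], $_->[1] ) } @pairs;
}

my @layout = pairs_for( 1, 15, 14, 13 );
my $csv = join ',', cut_record( 'C663160140MT007015G20141124274847A MT2015PC', @layout );
print "$csv\n";
# prints: C,663160140MT0070,15G20141124274,847A MT2015PC
```

Unlike unpack's A format, substr keeps trailing spaces in each field, so the two approaches can differ on padded data.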