Google has failed me! Using pack and substr for fixed width file output

spudulike has asked for the wisdom of the Perl Monks concerning the following question:

Re: Google has failed me! Using pack and substr for fixed width file output
by Athanasius (Archbishop) on Apr 08, 2014 at 11:57 UTC

Hello spudulike, and welcome to the Monastery!

Your first line, $line = pack ("A1170") ;, fills $line with a string of 1170 spaces. The next line, substr ($line, 0, 135) = " " ;, replaces the first 135 characters of $line with a single space, thereby shortening the string by 134 characters. Similarly, the next-to-last line shortens the string by a further 506 characters! What are those two lines intended to do?
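A minimal sketch of the effect described above, assuming the 1170-space record from the question. The sprintf '%-11.11s' padding is just one illustrative way of keeping a replacement exactly as wide as the region it overwrites; it is not necessarily what the original code intended.

```perl
use strict;
use warnings;

my $line = pack 'A1170';              # 1170 spaces
print length($line), "\n";            # 1170

# Assigning a 1-char string to a 135-char lvalue substr
# shrinks the string by 134 characters.
substr($line, 0, 135) = ' ';
print length($line), "\n";            # 1036

# To keep the length fixed, the replacement must be exactly as
# long as the region it replaces (here padded/truncated to 11).
my $fixed = pack 'A1170';
substr($fixed, 135, 11) = sprintf '%-11.11s', 'DATE=040814';
print length($fixed), "\n";           # 1170
```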

I think you need to study the documentation for the substr function. :-)

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Google has failed me! Using pack and substr for fixed width file output

by spudulike (Novice) on Apr 08, 2014 at 12:52 UTC

Hi and it's great to be here,

That's perfect thanks and explains exactly what I'm seeing!

I spent most of the morning trawling the web trying to figure it out and now you've pointed it out, it's blindingly obvious. Head slapping moment :-)

Cheers, Jake (aka spudulike)

Re: Google has failed me! Using pack and substr for fixed width file output
by roboticus (Chancellor) on Apr 08, 2014 at 14:39 UTC

spudulike:

Following your pattern, substr($line,190,507) looks like it should be substr($line,190,317).

For what it's worth, I deal with flat files quite a bit. Generally, though, I use pack and unpack instead of substr. That way, I can use the same format string for both packing and unpacking, kinda like this:

my $packfmt = "A135A11A6A6A20A8A3A978";
my $outline = pack $packfmt,
   " ", "DATE=${mdy}", ${mdy}, ${hms}, " ", "G1ADP",
   $CLIENT_ID, " ";

# Unpack 1: trailing blanks preserved. "a" returns each field exactly
# as stored, so use the lowercase version of the same format.
# (The second undef skips the "G1ADP" field.)
my $rawfmt = lc($packfmt);
(undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
            = unpack $rawfmt, $outline;

# Unpack 2: same raw format, but using map to trim the strings:
(undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
            = map { s/\s+$//; $_ }
              unpack $rawfmt, $outline;

# Unpack 3: "A" already strips trailing blanks on unpack,
# so the pack format can be reused as-is:
(undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
            = unpack $packfmt, $outline;

Note: I don't have perl on my work machine, so this is (a) from memory, and (b) quite possibly a bit broken.
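For anyone wanting to check the A/a distinction the unpack variants rely on, here is a small runnable sketch with a made-up three-field layout (the widths and values are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical 3-field layout: 5 + 8 + 3 characters.
my $fmt  = 'A5A8A3';
my $line = pack $fmt, 'ab', 'DATE', '7';
print "[$line]\n";                             # [ab   DATE    7  ]

# "A" strips trailing spaces on unpack ...
my @trimmed = unpack $fmt, $line;
print join('|', @trimmed), "\n";               # ab|DATE|7

# ... while "a" (lowercase) returns each field exactly as stored.
my @raw = unpack lc($fmt), $line;
print join('|', map { "[$_]" } @raw), "\n";    # [ab   ]|[DATE    ]|[7  ]
```

The same width list drives both directions; only the case of the letter decides whether padding survives the round trip.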

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Google has failed me! Using pack and substr for fixed width file output

by Anonymous Monk on Apr 08, 2014 at 20:32 UTC

Note: I don't have perl on my work machine, so this is (a) from memory, and (b) quite possibly a bit broken.

It's mostly correct :)

You can even take the HASH approach :) Re^2: hash_reference from file, Re: hash_reference from file, hash_reference from file ...
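A minimal core-Perl sketch of the hash idea, assuming a hypothetical layout of three named fields (the linked nodes may approach it differently). The field order and widths live in one list, and hash slices map records to and from lines:

```perl
use strict;
use warnings;

# Hypothetical layout: field name => width, in file order.
my @layout = ([name => 10], [date => 6], [client => 3]);

my @names = map { $_->[0] } @layout;
my $fmt   = join '', map { 'A' . $_->[1] } @layout;   # "A10A6A3"

# Record (hash) -> fixed-width line
my %rec  = (name => 'Joe', date => '040814', client => '123');
my $line = pack $fmt, @rec{@names};
print "[$line]\n";                        # [Joe       040814123]

# Fixed-width line -> record (hash slice assignment)
my %back;
@back{@names} = unpack $fmt, $line;
print "$back{name}/$back{date}/$back{client}\n";   # Joe/040814/123
```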

Re^2: Google has failed me! Using pack and substr for fixed width file output

by spudulike (Novice) on Apr 09, 2014 at 06:15 UTC

Hi Roboticus,

It's good to know that there are other fixed width hackers out there :-)

I did use pack for a while, and I found it easier and faster to create a solution with it. But I had trouble explaining to the support teams how the fields matched up with the pack format; they couldn't seem to get their heads around it. So, for me, I got called less if I developed the solution using substr, coz (I suppose) it was easier for the support bods to conceptualise the approach.
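One way to make a pack layout easier for support staff to read, sketched here with hypothetical field names and widths: pack treats whitespace in a template as insignificant and # as the start of a comment running to the end of the line, so each field can sit on its own line with its offset noted beside it.

```perl
use strict;
use warnings;

# Hypothetical layout, one field per line; the comments are
# ignored by pack/unpack.
my $template = q{
    A6    # date,      MMDDYY,  offset  0
    A8    # client ID, padded,  offset  6
    A5    # info code,          offset 14
};

my $line = pack $template, '040814', 'ABC123', 'G1ADP';
print "[$line]\n";                  # [040814ABC123  G1ADP]

my ($date, $client, $info) = unpack $template, $line;
print "$client\n";                  # ABC123
```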

Thanks though, it's nice to know I'm <drum roll please> part of the <ahem> pack. Thank you very much, I'm here til Thursday - try the fish.

Cheers, Jake

Re^3: Google has failed me! Using pack and substr for fixed width file output

by AnomalousMonk (Archbishop) on Apr 09, 2014 at 09:13 UTC

There are all kinds of interesting games you can play with pack, and it should be easy to encapsulate your solution in a function (with lotsa validation) so that maintainers only see data. See below.

Note that the @ pack template specifier fills with nulls, so if you want space-filling, you need the
$packed =~ tr{\000}{ };
statement. Note also that @ is absolute, so maybe play around and see, e.g., what happens if the date offset is set to 1 or 2.
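A tiny sketch of the NUL-fill behaviour and the tr{} fix, using a made-up two-field template rather than the layout in the one-liner:

```perl
use strict;
use warnings;

# "@6" moves to absolute offset 6; the gap it skips over
# (positions 3..5) is filled with NUL bytes, not spaces.
my $packed = pack '@0 A3 @6 A2', 'abc', 'XY';
print length($packed), "\n";        # 8
print join(' ', map { ord($_) } split //, substr($packed, 3, 3)), "\n";   # 0 0 0

# Translate the NULs into spaces to get a space-filled record.
(my $spaced = $packed) =~ tr{\000}{ };
print "[$spaced]\n";                # [abc   XY]
```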

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"use constant ORDER => qw(date  client_id  info);
 ;;
 my %o_l = (
   date      => { qw(off  0  len 6  field 040814)             },
   client_id => { qw(off  6  len 8  field WayTooMuchClientID) },
   info      => { qw(off 20  len 7  field G1APD)              },
   );
 ;;
 my $fields =
   join '  ',
   map  qq{\@$o_l{$_}{off} A$o_l{$_}{len}},
   map  { exists $o_l{$_} or die qq{bad '$_'};  $_; }
   ORDER
   ;
 ;;
 my $packed = pack $fields, map $o_l{$_}{field}, ORDER;
 $packed =~ tr{\000}{ };
 print 'packed len ', length $packed, qq{ '$packed'};
 ;;
 my $total = 30;
 die 'truncation!' if length($packed) > $total;
 my $line = pack qq{A$total}, $packed;
 print 'total  len ', length($line), qq{ '$line'};
 ;;
 dd \$line;
"
packed len 27 '040814WayTooMu      G1APD  '
total  len 30 '040814WayTooMu      G1APD     '
\"040814WayTooMu      G1APD     "

Re: Using pack and substr for fixed width file output ( substr outside of string )
by Anonymous Monk on Apr 08, 2014 at 11:24 UTC

$ perl -Mdiagnostics -e " $foo = sprintf q{%-24s}, 123; substr $foo, 20, 4 , 4444; print $foo; "
123                 4444
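The node title refers to Perl's "substr outside of string" error. A short sketch of where it comes from, assuming a hypothetical 3-character value: a four-argument substr whose offset lies past the end of the string dies, which is why the one-liner pads the value with sprintf first.

```perl
use strict;
use warnings;

my $short = '123';    # hypothetical 3-character value

# Offset 20 is past the end of a 3-character string, so the
# four-argument substr dies with "substr outside of string".
my $ok  = eval { substr($short, 20, 4, '4444'); 1 };
my $msg = $ok ? "replaced\n" : "died: $@";
print $msg;

# Padding to 24 characters first (as the one-liner does with
# sprintf) makes offset 20 valid; the result stays 24 long.
my $padded = sprintf '%-24s', $short;
substr($padded, 20, 4, '4444');
print "[$padded]\n";
```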

Re: Google has failed me! Using pack and substr for fixed width file output
by kcott (Archbishop) on Apr 09, 2014 at 00:37 UTC

G'day Jake,

Welcome to the monastery.

I see ++Athanasius has provided you with an answer to your question.

Whenever you see almost identical lines of code being repeated, such as you have here, consider abstracting that into a subroutine.

Apart from the physical effort of typing in all that code, you're having to do calculations at every step to determine each substring's offset.

Your computer can perform calculations faster and more accurately than you can: tell it to do this work. :-)

Any errors you make are likely to propagate to each successive line: a simple off-by-one error due to calculating the first character as being at position 1 instead of position 0 would mean having to recalculate all but the first of your substr statements. Actually, it looks like that's exactly what you've done! [substr($line, 0, 135) addresses positions 0 to 134: what happened to substr($line, 135, 1)?]
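Deriving the offsets from the field widths removes this class of error. A minimal sketch using the widths from the example below; a running total is the only arithmetic involved, and the second field correctly starts at 135:

```perl
use strict;
use warnings;

# Field widths taken from the example below; offsets are derived.
my @widths = (135, 11, 6, 6, 3, 17, 8, 3);

my @offsets;
my $pos = 0;
for my $w (@widths) {
    push @offsets, $pos;   # each field starts where the previous one ended
    $pos += $w;
}
print "@offsets\n";        # 0 135 146 152 158 161 178 186
print "$pos\n";            # 189 (total width used)
```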

Beyond the fact that hand-coding all of this is laborious, error-prone and may take several attempts to get it right, think of the maintenance issues. You have 6 characters assigned for your dates (11 chars for "DATE=${mdy}" and 6 chars for "${mdy}"): that looks like a MMDDYY format. What if you decided to make this more robust and avoid the so-called Y2K Bug by using a MMDDYYYY format: more calculations, more code modifications, and more potential errors.

The only information you should really need to concern yourself with is the size of the fields and the data to go into those fields. Let the computer worry about substring offsets, data size and whether padding is required.

Here's a simplified example of how you might go about this. Consider adding validation code, e.g. to check that you're not trying to put data into a field that's too small to hold it.

#!/usr/bin/env perl

use strict;
use warnings;

my ($mdy, $hms, $CLIENT_ID) = qw{MMDDYY HHMMSS 123};
my @fields = (
    [135, ''], [11, "DATE=$mdy"], [6, $mdy], [6, $hms],
    [3, ''], [17, ''], [8, 'G1ADP'], [3, $CLIENT_ID],
);
my $line = pack 'A510';
my $offset = 0;
add_field(\$line, \$offset, @$_) for @fields;

print 'Length of line = ', length $line, "\n";
print "Line = |$line|\n";

sub add_field {
    my ($line_ref, $offset_ref, $length, $data) = @_;

    substr($$line_ref, $$offset_ref, $length)
        = $data . ' ' x ($length - length $data);
    $$offset_ref += $length;
}

Output:

Length of line = 510
Line = |<135 spaces>DATE=MMDDYYMMDDYYHHMMSS<20 spaces>G1ADP   123<321 spaces>|

-- Ken


Re^2: Google has failed me! Using pack and substr for fixed width file output

by spudulike (Novice) on Apr 09, 2014 at 05:55 UTC

Hi Ken,

That's a really elegant solution to something I do almost every day, especially as I was about to start coding up 119 fields for the data rows :-)

I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)

Cheers, Jake

Re^3: Google has failed me! Using pack and substr for fixed width file output

by kcott (Archbishop) on Apr 10, 2014 at 01:40 UTC

"That's a really elegant solution to something I do almost every day especially as I was about to start coding up 119 fields for the data rows :-)"

It sounds like you really need to take that abstraction one step further and make a module for all your scripts to use. You will already have saved yourself a lot of work by reducing all those hand-crafted substr statements to a single add_field() subroutine. What you want to avoid is copying that function into every new script you write.

Consider a situation where you find a problem with add_field() or you want to extend its functionality. If you've just pasted copies of add_field() into multiple scripts, you'll need to fix or modify every one of those; if you've used a module, you'll only need to make changes in one place.

Here's an example of how the code I posted earlier could be put into a module:

package PM::FixedWidthFile;

use strict;
use warnings;
use autodie;

use Exporter qw{import};
our @EXPORT_OK = qw{populate_file};

use Carp;

sub populate_file {
    my ($fh, $record_length, $field_data) = @_;

    for my $fields (@$field_data) {
        my $line = pack 'A' . $record_length;
        my $offset = 0;
        _add_field(\$line, \$offset, @$_) for @$fields;
        print {$fh} $line, "\n";
    }   

    return;
}

sub _add_field {
    my ($line_ref, $offset_ref, $length, $data, $r_align) = @_;

    if ($length < length $data) {
        croak "Data [$data] too large for field of length [$length]";
    } 

    my @dat_pad = ($data, ' ' x ($length - length $data));
    substr($$line_ref, $$offset_ref, $length)
        = join '' => @dat_pad[$r_align ? (1, 0) : (0, 1)];
    $$offset_ref += $length;

    return;
}   

1;

=head1 NAME

PM::FixedWidthFile - TODO (for Jake): module documentation in POD format

add_field() is now the (pseudo-)private routine _add_field(). I've added an optional, boolean argument ($r_align) to right-align field data. There's also some validation code.
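As a design alternative to the @dat_pad slice, sprintf can do the padding and alignment itself: %-*s left-aligns and %*s right-aligns, with * taking the width from the argument list. A sketch, with pad_field as a hypothetical name (it is not part of the module above):

```perl
use strict;
use warnings;

# Hypothetical helper: pad $data to $length, optionally right-aligned.
sub pad_field {
    my ($length, $data, $r_align) = @_;
    die "Data [$data] too large for field of length [$length]\n"
        if length $data > $length;
    my $fmt = $r_align ? '%*s' : '%-*s';   # '*' = width from the args
    return sprintf $fmt, $length, $data;
}

print '[', pad_field(10, 123), "]\n";      # [123       ]
print '[', pad_field(10, 123, 1), "]\n";   # [       123]
```

The slice approach and sprintf produce the same strings; sprintf just keeps the alignment rule in one format character.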

populate_file() is the public API. It creates a line of the desired length ($record_length) and calls _add_field() to populate the lines with the data from $field_data and outputs the lines to $fh (without having to know anything about what file is involved or whether it's writing to a new file or appending to an existing one).

In most cases, your scripts will need little more than:

use PM::FixedWidthFile qw{populate_file};
...
my $record_length = ...;
my $outfile = ...;
my $file_data = ...;
open my $fh, '>', $outfile;
populate_file($fh, $record_length, $file_data);

Here's an actual example with dummy test data:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use PM::FixedWidthFile qw{populate_file};

my $fixed_width_file_base = './pm_fixed_width_file.out_';

my $record_length = 32; # not including line terminator

my @multi_file_data = (
    [
        [ [10, ''], [10, 123], [10, 456] ],
    ],
    [
        [ [10, ''], [10, 123], [10, 456] ],
        [ [10, ''], [10, 123, 1], [10, 456] ],
        [ [10, ''], [10, 123], [10, 456], [2, 78] ],
    ],
    [
        [ [10, ''], [10, 123], [10, 456], [2, 78] ],
        [ [10, ''], [10, 123], [10, 456], [2, 789] ],
    ],
);

for my $i (0 .. $#multi_file_data) {
    my $outfile = $fixed_width_file_base . $i;
    print "Populating: $outfile\n";
    open my $fh, '>', $outfile;
    populate_file($fh, $record_length, $multi_file_data[$i]);
    close $fh;
    system qw{cat -vet}, $outfile;
    unlink $outfile; # my housekeeping
}

[In case you didn't know, cat -vet filename prints the contents of filename and uses various symbols to represent characters that you can't normally see or that may have display problems (e.g. whitespace and characters outside the 7-bit ASCII range). The only symbol of interest here is the $ sign, which marks the end of each line (i.e. the newline). See the cat manpage for more information.]

Output:

Populating: ./pm_fixed_width_file.out_0
          123       456         $
Populating: ./pm_fixed_width_file.out_1
          123       456         $
                 123456         $
          123       456       78$
Populating: ./pm_fixed_width_file.out_2
Data [789] too large for field of length [2] at ./pm_fixed_width_file.pl line 32.

"I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)"

Help yourself to the code. Attribution is courteous but not required. A link to the node where you got the code may be useful for subsequent maintainers and could possibly save you having to redocument what's already been written here (e.g. rationale for changes you implement).

-- Ken


Re^4: Google has failed me! Using pack and substr for fixed width file output

by runrig (Abbot) on Apr 10, 2014 at 16:02 UTC

Re^5: Google has failed me! Using pack and substr for fixed width file output

by kcott (Archbishop) on Apr 10, 2014 at 21:34 UTC

Re: Google has failed me! Using pack and substr for fixed width file output
by runrig (Abbot) on Apr 08, 2014 at 21:08 UTC

Parse::FixedLength

use Parse::FixedLength;

my $pfl = Parse::FixedLength->new([qw(
  name:10
  date:6
  filler:10
)]);

my %hash = (
  name => 'Joe',
  date => '020414',
);

my $str = $pfl->pack(\%hash);
print "[$str]\n";

$hash{name} = 'Bob';
$str = $pfl->pack(\%hash);
print "[$str]\n";

# Or even (as a separate script, since this redeclares $pfl, $str and %hash):
use Parse::FixedLength;

my $pfl = Parse::FixedLength->new([qw(
  name:10
  date:6
  filler:10
)], {href => \my %hash} );

%hash = (
  name => 'Joe',
  date => '020414',
);

my $str = $pfl->pack();
print "[$str]\n";

$hash{name} = 'Bob';
$str = $pfl->pack();
print "[$str]\n";