in reply to Google has failed me! Using pack and substr for fixed width file output

G'day Jake,

Welcome to the monastery.

I see ++Athanasius has provided you with an answer to your question.

Whenever you see almost identical lines of code being repeated, such as you have here, consider abstracting that into a subroutine.

Apart from the physical effort of typing in all that code, you're having to do calculations at every step to determine each substring's offset.

Your computer can perform calculations faster and more accurately than you can: tell it to do this work. :-)

Any errors you make are likely to propagate to each successive line: a simple off-by-one error due to calculating the first character as being at position 1 instead of position 0 would mean having to recalculate all but the first of your substr statements. Actually, it looks like that's exactly what you've done! [substr($line, 0, 135) addresses positions 0 to 134: what happened to substr($line, 135, 1)?]
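To make the offset arithmetic concrete, here's a minimal sketch. The 10-character record and the field widths (4, 3, 3) are made up for illustration, and split_fields() is a hypothetical helper. substr offsets are zero-based, so a field of width N starting at offset O occupies positions O to O+N-1, and the next field starts at O+N.

```perl
use strict;
use warnings;

# Hypothetical 10-character record holding three fields of widths 4, 3 and 3.
my $record = 'AAAABBBCCC';

# Return the fields of $line for a list of widths. Each offset is the
# running total of the preceding widths, starting from 0.
sub split_fields {
    my ($line, @widths) = @_;
    my $offset = 0;
    my @fields;
    for my $width (@widths) {
        push @fields, substr($line, $offset, $width);
        $offset += $width;
    }
    return @fields;
}

my @fields = split_fields($record, 4, 3, 3);
print join('|', @fields), "\n";       # AAAA|BBB|CCC

# Starting the second field at offset 5 instead of 4 skips one character.
print substr($record, 5, 3), "\n";    # BBC
```

Only the widths are written by hand; every offset is derived from them.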

Beyond the fact that hand-coding all of this is laborious, error-prone and may take several attempts to get right, think of the maintenance issues. You have 6 characters assigned for your dates (11 chars for "DATE=${mdy}" and 6 chars for "${mdy}"): that looks like a MMDDYY format. What if you decided to make this more robust and avoid the so-called Y2K Bug by using a MMDDYYYY format? That would mean more calculations, more code modifications, and more potential errors.

The only information you should really need to concern yourself with is the size of the fields and the data to go into those fields. Let the computer worry about substring offsets, data size and whether padding is required.

Here's a simplified example of how you might go about this. Consider adding validation code, e.g. to check that you're not trying to put data into a field that's too small to hold it.

#!/usr/bin/env perl -l

use strict;
use warnings;

my ($mdy, $hms, $CLIENT_ID) = qw{MMDDYY HHMMSS 123};

my @fields = (
    [135, ''],
    [11, "DATE=$mdy"],
    [6, $mdy],
    [6, $hms],
    [3, ''],
    [17, ''],
    [8, 'G1ADP'],
    [3, $CLIENT_ID],
);

my $line = pack 'A510';
my $offset = 0;

add_field(\$line, \$offset, @$_) for @fields;

print 'Length of line = ', length $line;
print "Line = |$line|";

sub add_field {
    my ($line_ref, $offset_ref, $length, $data) = @_;

    substr($$line_ref, $$offset_ref, $length) = $data . ' ' x ($length - length $data);
    $$offset_ref += $length;
}

Output (long runs of spaces are abbreviated in square brackets):

Length of line = 510
Line = |[135 spaces]DATE=MMDDYYMMDDYYHHMMSS[20 spaces]G1ADP[3 spaces]123[321 spaces]|
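On the pack half of the original question: here's a minimal sketch, assuming the same [width, data] pairs as above, of building the whole record with a single pack template. The helper name build_record() is hypothetical. pack's A format pads short values with spaces but silently truncates long ones, so the length checks are my addition rather than anything pack does for you.

```perl
use strict;
use warnings;

# Build one fixed-width record from [width, data] pairs using pack.
# 'A<n>' pads with spaces to n characters; it would also truncate longer
# data without complaint, hence the explicit checks.
sub build_record {
    my ($record_length, @fields) = @_;
    my $used = 0;
    my (@template, @values);

    for my $field (@fields) {
        my ($width, $data) = @$field;
        die "Data [$data] too large for field of length [$width]\n"
            if length($data) > $width;
        push @template, "A$width";
        push @values, $data;
        $used += $width;
    }

    die "Fields need $used characters; record has only $record_length\n"
        if $used > $record_length;

    # Pad the rest of the record with one final empty field.
    my $remaining = $record_length - $used;
    if ($remaining > 0) {
        push @template, "A$remaining";
        push @values, '';
    }

    return pack join(' ', @template), @values;
}

my ($mdy, $hms, $CLIENT_ID) = qw{MMDDYY HHMMSS 123};

my $line = build_record(510,
    [135, ''], [11, "DATE=$mdy"], [6, $mdy], [6, $hms],
    [3, ''], [17, ''], [8, 'G1ADP'], [3, $CLIENT_ID],
);

print 'Length of line = ', length($line), "\n";
```

Whether this or the substr version reads better is a matter of taste; either way, only the field widths are maintained by hand.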

-- Ken

Re^2: Google has failed me! Using pack and substr for fixed width file output
by spudulike (Novice) on Apr 09, 2014 at 05:55 UTC

    Hi Ken,

    That's a really elegant solution to something I do almost every day especially as I was about to start coding up 119 fields for the data rows :-)

    I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)

    Cheers, Jake

      "That's a really elegant solution to something I do almost every day especially as I was about to start coding up 119 fields for the data rows :-)"

      It sounds like you really need to take that abstraction one step further and make a module for all your scripts to use. You will already have saved yourself a lot of work by reducing all those hand-crafted substr statements to a single add_field() subroutine. What you want to avoid is copying that function into every new script you write.

      Consider a situation where you find a problem with add_field() or you want to extend its functionality. If you've just pasted copies of add_field() into multiple scripts, you'll need to fix or modify every one of those; if you've used a module, you'll only need to make changes in one place.

      Here's an example of how the code I posted earlier could be put into a module:

      package PM::FixedWidthFile;

      use strict;
      use warnings;
      use autodie;

      use Exporter qw{import};
      our @EXPORT_OK = qw{populate_file};

      use Carp;

      sub populate_file {
          my ($fh, $record_length, $field_data) = @_;

          for my $fields (@$field_data) {
              my $line = pack 'A' . $record_length;
              my $offset = 0;

              _add_field(\$line, \$offset, @$_) for @$fields;

              print {$fh} $line, "\n";
          }

          return;
      }

      sub _add_field {
          my ($line_ref, $offset_ref, $length, $data, $r_align) = @_;

          if ($length < length $data) {
              croak "Data [$data] too large for field of length [$length]";
          }

          my @dat_pad = ($data, ' ' x ($length - length $data));

          substr($$line_ref, $$offset_ref, $length)
              = join '' => @dat_pad[$r_align ? (1, 0) : (0, 1)];

          $$offset_ref += $length;

          return;
      }

      1;

      =head1 NAME

      PM::FixedWidthFile - TODO (for Jake): module documentation in POD format

      add_field() is now the (pseudo-)private routine _add_field(). I've added an optional, boolean argument ($r_align) to right-align field data. There's also some validation code.
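      The slice @dat_pad[$r_align ? (1, 0) : (0, 1)] simply swaps the order of data and padding. As a rough equivalent (a sketch only, not a drop-in replacement; the name align_field() is hypothetical), sprintf can do the same job with a '*' width taken from the argument list, where the '-' flag left-justifies:

      ```perl
      use strict;
      use warnings;

      # Pad $data with spaces to exactly $length characters:
      # left-aligned by default, right-aligned when $r_align is true.
      sub align_field {
          my ($length, $data, $r_align) = @_;

          die "Data [$data] too large for field of length [$length]\n"
              if length($data) > $length;

          my $format = $r_align ? '%*s' : '%-*s';
          return sprintf $format, $length, $data;
      }

      print '[', align_field(10, 123),    "]\n";    # [123       ]
      print '[', align_field(10, 123, 1), "]\n";    # [       123]
      ```

      The size check still matters here: sprintf treats the width as a minimum, so oversized data would otherwise overflow the field rather than be rejected.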

      populate_file() is the public API. For each record in $field_data, it creates a line of the desired length ($record_length), calls _add_field() to populate that line with the record's fields, and prints the line to $fh. It does this without having to know anything about what file is involved, or whether it's writing to a new file or appending to an existing one.
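      One consequence of taking a filehandle rather than a filename is that the handle need not be a file on disk at all. Here's a minimal sketch using Perl's in-memory filehandles (open on a scalar reference), which can be handy for testing. write_lines() is a hypothetical stand-in for populate_file(), so that the example runs without the module installed.

      ```perl
      use strict;
      use warnings;

      # Hypothetical stand-in for populate_file(): prints each line to $fh
      # without knowing what kind of handle it was given.
      sub write_lines {
          my ($fh, @lines) = @_;
          print {$fh} $_, "\n" for @lines;
          return;
      }

      # An in-memory filehandle: output accumulates in $buffer, not on disk.
      my $buffer = '';
      open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
      write_lines($fh, 'first record', 'second record');
      close $fh;

      print length($buffer), " characters captured\n";
      ```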

      In most cases, your scripts will need little more than:

      use PM::FixedWidthFile qw{populate_file};
      ...
      my $record_length = ...;
      my $outfile = ...;
      my $file_data = ...;

      open my $fh, '>', $outfile;
      populate_file($fh, $record_length, $file_data);

      Here's an actual example with dummy test data:

      #!/usr/bin/env perl

      use strict;
      use warnings;
      use autodie;

      use PM::FixedWidthFile qw{populate_file};

      my $fixed_width_file_base = './pm_fixed_width_file.out_';
      my $record_length = 32;    # not including line terminator

      my @multi_file_data = (
          [
              [ [10, ''], [10, 123], [10, 456] ],
          ],
          [
              [ [10, ''], [10, 123], [10, 456] ],
              [ [10, ''], [10, 123, 1], [10, 456] ],
              [ [10, ''], [10, 123], [10, 456], [2, 78] ],
          ],
          [
              [ [10, ''], [10, 123], [10, 456], [2, 78] ],
              [ [10, ''], [10, 123], [10, 456], [2, 789] ],
          ],
      );

      for my $i (0 .. $#multi_file_data) {
          my $outfile = $fixed_width_file_base . $i;
          print "Populating: $outfile\n";

          open my $fh, '>', $outfile;
          populate_file($fh, $record_length, $multi_file_data[$i]);
          close $fh;

          system qw{cat -vet}, $outfile;
          unlink $outfile;    # my housekeeping
      }

      [In case you didn't know, cat -vet filename prints filename and shows various symbols to represent characters that you can't normally see or may have display problems (e.g. whitespace and characters outside the 7-bit ASCII range). The only symbol of interest here is the $ sign which represents a newline. See the cat manpage for more information.]

      Output:

      Populating: ./pm_fixed_width_file.out_0
                123       456         $
      Populating: ./pm_fixed_width_file.out_1
                123       456         $
                       123456         $
                123       456       78$
      Populating: ./pm_fixed_width_file.out_2
      Data [789] too large for field of length [2] at ./pm_fixed_width_file.pl line 32.

      "I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)"

      Help yourself to the code. Attribution is courteous but not required. A link to the node where you got the code may be useful for subsequent maintainers and could possibly save you having to redocument what's already been written here (e.g. rationale for changes you implement).

      -- Ken

        It sounds like you really need to take that abstraction one step further and make a module for all your scripts to use.

        Gee, I wonder if such a library might already exist...