spudulike has asked for the wisdom of the Perl Monks concerning the following question:

Hi All

I was wondering if someone can enlighten me on the following behaviour with regard to a particular code snippet

Code is:

<snip>

$line = pack ("A1170") ;
substr ($line, 0, 135) = " " ;
substr ($line, 136, 11) = "DATE=${mdy}" ;
substr ($line, 147, 6) = "${mdy}" ;
substr ($line, 153, 6) = "${hms}" ;
substr ($line, 159, 3) = " " ;
substr ($line, 162, 17) = " " ;
substr ($line, 179, 8) = "G1ADP" ;
substr ($line, 187, 3) = "$CLIENT_ID" ;
substr ($line, 190, 507) = " " ;
substr ($line, 508, 1) = " " ;

<snip>

Basically I deal with outputting various data in fixed-width format, and I like the above method because it maps back quite nicely to the file definitions that I follow to create the output. For instance, the above is the definition for a file header.

But I've been ignoring a couple of interesting "funnies" with this way of outputting fixed widths. These are:
1) My $line should be 510 characters long. But if I try to set A510 I get the old "substr outside of string at ..." error, and for some reason I have to set it to 1170 (found by trial and error).
2) If the last field is blank, then I have to add an extra assignment to get the file length output correctly, as in substr ($line, 508, 1) = " " ; above.
Can anyone enlighten me on what's occurring? I expect it's user error :-)

Cheers, Jake (aka spudulike)

Re: Google has failed me! Using pack and substr for fixed width file output
by Athanasius (Archbishop) on Apr 08, 2014 at 11:57 UTC

    Hello spudulike, and welcome to the Monastery!

    Your first line, $line = pack ("A1170") ;, fills $line with a string of 1170 spaces. The next line, substr ($line, 0, 135) = " " ;, replaces the first 135 characters of $line with a single space, thereby shortening the string by 134 characters. Similarly, the next-to-last line shortens the string by a further 506 characters! What are those two lines intended to do?
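    To see the effect in isolation, here is a minimal sketch (the 20-character buffer and the values are made up for illustration): assigning a shorter string to an lvalue substr shrinks the target, padding the replacement to the field width keeps the length fixed, and an offset past the end of the string is fatal in lvalue context, which is the error reported in the question.

```perl
use strict;
use warnings;

# Assigning a 1-char string to a 10-char lvalue substr shrinks the string by 9.
my $shrunk = ' ' x 20;
substr($shrunk, 0, 10) = ' ';
print 'shrunk length: ', length($shrunk), "\n";    # 11

# Padding the replacement to exactly the field width keeps the length fixed.
my $fixed = ' ' x 20;
substr($fixed, 0, 10) = sprintf '%-10s', 'X';
print 'fixed length:  ', length($fixed), "\n";     # 20

# An offset past the end of the string is fatal when substr is an lvalue;
# this is the "substr outside of string" error from the question.
my $short = 'abc';
my $ok    = eval { substr($short, 10, 1) = 'x'; 1 };
my $error = $ok ? '' : $@;
print 'error: ', $error;
```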

    I think you need to study the documentation for the substr function. :-)

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi and it's great to be here,

      That's perfect thanks and explains exactly what I'm seeing!

      I spent most of the morning trawling the web trying to figure it out and now you've pointed it out, it's blindingly obvious. Head slapping moment :-)

      Cheers, Jake (aka spudulike)

Re: Google has failed me! Using pack and substr for fixed width file output
by roboticus (Chancellor) on Apr 08, 2014 at 14:39 UTC

    spudulike:

    Following your pattern, substr($line,190,507) looks like it should be substr($line,190,317).

    For what it's worth, I deal with flat files quite a bit. Generally, though, I use pack and unpack instead of substr. That way, I can use the same format string for both packing and unpacking, kinda like this:

    my $packfmt = "A135A11A6A6A20A8A3A978";
    my $outline = pack $packfmt, " ", "DATE=${mdy}", $mdy, $hms, " ",
                                 "G1ADP", $CLIENT_ID, " ";

    # Unpack with "A": trailing blanks are stripped
    # (the undefs skip the filler fields and the constant "G1ADP")
    (undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
        = unpack $packfmt, $outline;

    # Unpack with "a" (lowercased format): trailing blanks preserved
    my $parsefmt = lc($packfmt);
    (undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
        = unpack $parsefmt, $outline;

    # Same "a" format, but using map to trim the trailing blanks:
    (undef, $DMDY, $mdy, $hms, undef, undef, $CLIENT_ID)
        = map { s/\s+$//; $_ } unpack $parsefmt, $outline;

    Note: I don't have perl on my work machine, so this is (a) from memory, and (b) quite possibly a bit broken.
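    Since the snippet above is from memory, here is a small self-contained check of how A and a behave in pack and unpack (the sample values are invented): A space-pads and truncates on pack and strips trailing spaces and nulls on unpack, while a returns each field exactly as stored.

```perl
use strict;
use warnings;

my $fmt = 'A6 A8 A3';

# pack with 'A': short values are space-padded, long ones truncated
my $rec = pack $fmt, 'DATE=', 'G1ADP', 'LONGER';
print "[$rec]\n";

# unpack with 'A' strips trailing blanks from each field
my @trimmed = unpack $fmt, $rec;

# unpack with 'a' (lowercased template) keeps the padding
my @raw = unpack lc($fmt), $rec;

print join('|', @trimmed), "\n";
print join('|', @raw), "\n";
```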

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Hi Roboticus,

      It's good to know that there are other fixed width hackers out there :-)

      I did use pack for a while and I did find it easier and faster to create a solution using it. But I had a problem explaining how to match up the fields with the pack format to the supporting teams. They couldn't seem to get their heads around it. So, for me, I got called less if I developed the solution using substr coz (I suppose) it was easier for the support bods to conceptualise the approach.

      Thanks though, it's nice to know I'm <drum roll please> part of the <ahem> pack. Thank you very much, I'm here til Thursday - try the fish.

      Cheers, Jake

        There are all kinds of interesting games you can play with pack, and it should be easy to encapsulate your solution in a function (with lotsa validation) so that maintainers only see data. See below.

        Note that the @ pack template specifier fills with nulls, so if you want space-filling, you need the
            $packed =~ tr{\000}{ };
        statement. Note also that @ is absolute, so maybe play around and see, e.g., what happens if the date offset is set to 1 or 2.

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "use constant ORDER => qw(date client_id info);
         ;;
         my %o_l = (
           date      => { qw(off 0 len 6 field 040814) },
           client_id => { qw(off 6 len 8 field WayTooMuchClientID) },
           info      => { qw(off 20 len 7 field G1APD) },
           );
         ;;
         my $fields = join ' ',
           map qq{\@$o_l{$_}{off} A$o_l{$_}{len}},
           map { exists $o_l{$_} or die qq{bad '$_'}; $_; }
           ORDER
           ;
         ;;
         my $packed = pack $fields, map $o_l{$_}{field}, ORDER;
         $packed =~ tr{\000}{ };
         print 'packed len ', length $packed, qq{ '$packed'};
         ;;
         my $total = 30;
         die 'truncation!' if length($packed) > $total;
         my $line = pack qq{A$total}, $packed;
         print 'total len ', length($line), qq{ '$line'};
         ;;
         dd \$line;
        "
        packed len 27 '040814WayTooMu      G1APD  '
        total len 30 '040814WayTooMu      G1APD     '
        \"040814WayTooMu      G1APD     "
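        Here is a stripped-down sketch of the @ behaviour on its own, with arbitrary values: moving forward null-fills (hence the tr to turn nulls into spaces), while moving back to an earlier position truncates what was already packed, which is what happens when a field's offset overlaps the previous field.

```perl
use strict;
use warnings;

# Forward: '@5' null-fills from position 3 up to position 5
my $forward = pack 'A3 @5 A2', 'abc', 'de';
(my $spaced = $forward) =~ tr/\000/ /;    # turn the nulls into spaces

# Backward: '@1' truncates the packed string back to length 1
my $backward = pack 'A3 @1 A2', 'abc', 'de';

printf "%d [%s]\n", length $spaced, $spaced;      # 7 [abc  de]
printf "%d [%s]\n", length $backward, $backward;  # 3 [ade]
```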
Re: Using pack and substr for fixed width file output ( substr outside of string )
by Anonymous Monk on Apr 08, 2014 at 11:24 UTC
    If you're dealing with a fixed-width string, pad it to the desired length first; then you won't be outside it.
    $ perl -Mdiagnostics -e ' $foo = sprintf q{%-24s}, 123; substr $foo, 20, 4 , 4444; print $foo; '
    123                 4444
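    The same padding idea can build a whole record in one sprintf, since a %-Ns conversion left-justifies and space-pads each value. This is only a sketch: the widths follow the field lengths used in the question, the date, time and client ID values are made up, and sprintf pads but never truncates, so over-long values still need checking (a precision such as %-6.6s would truncate instead).

```perl
use strict;
use warnings;

# Sample values, made up for illustration
my ($mdy, $hms, $client_id) = ('040814', '113000', '123');

# Each %-Ns left-justifies and space-pads its value to N characters
my $header = sprintf '%-135s%-11s%-6s%-6s%-20s%-8s%-3s',
    '', "DATE=$mdy", $mdy, $hms, '', 'G1ADP', $client_id;

# Pad the remainder of the 510-character record
$header = sprintf '%-510s', $header;

print length($header), "\n";    # 510
```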
Re: Google has failed me! Using pack and substr for fixed width file output
by kcott (Archbishop) on Apr 09, 2014 at 00:37 UTC

    G'day Jake,

    Welcome to the monastery.

    I see ++Athanasius has provided you with an answer to your question.

    Whenever you see almost identical lines of code being repeated, such as you have here, consider abstracting that into a subroutine.

    Apart from the physical effort of typing in all that code, you're having to do calculations at every step to determine each substring's offset.

    Your computer can perform calculations faster and more accurately than you can: tell it to do this work. :-)

    Any errors you make are likely to propagate to each successive line: a simple off-by-one error due to calculating the first character as being at position 1 instead of position 0 would mean having to recalculate all but the first of your substr statements. Actually, it looks like that's exactly what you've done! [substr($line, 0, 135) addresses positions 0 to 134: what happened to substr($line, 135, 1)?]
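    The off-by-one can be checked mechanically. This sketch takes the field lengths implied by the question's substr calls (treating the 3- and 17-character blank fields separately), computes each offset as a running total, and prints them next to the hand-typed offsets; every computed offset after the first comes out one less than the typed one.

```perl
use strict;
use warnings;

# Field lengths taken from the question's substr calls (3 and 17 kept separate)
my @lengths      = (135, 11, 6, 6, 3, 17, 8, 3);
my @hand_offsets = (0, 136, 147, 153, 159, 162, 179, 187);

# Each field starts where the previous one ended.
my @offsets;
my $pos = 0;
for my $len (@lengths) {
    push @offsets, $pos;
    $pos += $len;
}

for my $i (0 .. $#lengths) {
    printf "field %d: computed %3d, hand-typed %3d\n",
        $i, $offsets[$i], $hand_offsets[$i];
}
print "data ends at offset $pos\n";    # 189
```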

    Beyond the fact that hand-coding all of this is laborious, error-prone and may take several attempts to get it right, think of the maintenance issues. You have 6 characters assigned for your dates (11 chars for "DATE=${mdy}" and 6 chars for "${mdy}"): that looks like a MMDDYY format. What if you decided to make this more robust and avoid the so-called Y2K Bug by using a MMDDYYYY format: more calculations, more code modifications, and more potential errors.

    The only information you should really need to concern yourself with is the size of the fields and the data to go into those fields. Let the computer worry about substring offsets, data size and whether padding is required.

    Here's a simplified example of how you might go about this. Consider adding validation code, e.g. to check that you're not trying to put data into a field that's too small to hold it.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my ($mdy, $hms, $CLIENT_ID) = qw{MMDDYY HHMMSS 123};

    my @fields = (
        [135, ''],
        [11, "DATE=$mdy"],
        [6, $mdy],
        [6, $hms],
        [3, ''],
        [17, ''],
        [8, 'G1ADP'],
        [3, $CLIENT_ID],
    );

    my $line = pack 'A510';
    my $offset = 0;

    add_field(\$line, \$offset, @$_) for @fields;

    print 'Length of line = ', length $line;
    print "Line = |$line|";

    sub add_field {
        my ($line_ref, $offset_ref, $length, $data) = @_;
        substr($$line_ref, $$offset_ref, $length)
            = $data . ' ' x ($length - length $data);
        $$offset_ref += $length;
    }

    Output:

    Length of line = 510 Line = | + + DATE=MMDDYYMMDDYYHHMMSS G1ADP 123 + + + + + |
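    The same "describe only sizes and data" approach also works with pack: the template can be generated from the lengths, and because A silently truncates, a length check is the natural place for the validation suggested above. A minimal sketch reusing the field list from the example; build_line is a made-up name, not something from this thread or a module.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not from the thread): build one fixed-width record
# from [length, data] pairs, croaking if any value is wider than its field.
sub build_line {
    my ($record_length, @fields) = @_;

    for my $f (@fields) {
        my ($len, $data) = @$f;
        croak "Data [$data] too large for field of length [$len]"
            if length($data) > $len;
    }

    # Template such as "A135A11A6..." generated from the lengths
    my $template = join '', map "A$_->[0]", @fields;
    my $packed   = pack $template, map $_->[1], @fields;

    # 'A' would silently truncate, so refuse fields longer than the record
    croak "Fields exceed record length [$record_length]"
        if length($packed) > $record_length;

    # Space-pad out to the full record length
    return pack "A$record_length", $packed;
}

my ($mdy, $hms, $CLIENT_ID) = qw{MMDDYY HHMMSS 123};
my $line = build_line(510,
    [135, ''], [11, "DATE=$mdy"], [6, $mdy], [6, $hms],
    [3, ''], [17, ''], [8, 'G1ADP'], [3, $CLIENT_ID],
);
print 'Length of line = ', length($line), "\n";    # 510
```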

    -- Ken

      Hi Ken,

      That's a really elegant solution to something I do almost every day especially as I was about to start coding up 119 fields for the data rows :-)

      I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)

      Cheers, Jake

        "That's a really elegant solution to something I do almost every day especially as I was about to start coding up 119 fields for the data rows :-)"

        It sounds like you really need to take that abstraction one step further and make a module for all your scripts to use. You will already have saved yourself a lot of work by reducing all those hand-crafted substr statements to a single add_field() subroutine. What you want to avoid is copying that function into every new script you write.

        Consider a situation where you find a problem with add_field() or you want to extend its functionality. If you've just pasted copies of add_field() into multiple scripts, you'll need to fix or modify every one of those; if you've used a module, you'll only need to make changes in one place.

        Here's an example of how the code I posted earlier could be put into a module:

        package PM::FixedWidthFile;

        use strict;
        use warnings;
        use autodie;

        use Exporter qw{import};
        our @EXPORT_OK = qw{populate_file};

        use Carp;

        sub populate_file {
            my ($fh, $record_length, $field_data) = @_;

            for my $fields (@$field_data) {
                my $line = pack 'A' . $record_length;
                my $offset = 0;
                _add_field(\$line, \$offset, @$_) for @$fields;
                print {$fh} $line, "\n";
            }

            return;
        }

        sub _add_field {
            my ($line_ref, $offset_ref, $length, $data, $r_align) = @_;

            if ($length < length $data) {
                croak "Data [$data] too large for field of length [$length]";
            }

            my @dat_pad = ($data, ' ' x ($length - length $data));
            substr($$line_ref, $$offset_ref, $length)
                = join '' => @dat_pad[$r_align ? (1, 0) : (0, 1)];
            $$offset_ref += $length;

            return;
        }

        1;

        =head1 NAME

        PM::FixedWidthFile - TODO (for Jake): module documentation in POD format

        add_field() is now the (pseudo-)private routine _add_field(). I've added an optional, boolean argument ($r_align) to right-align field data. There's also some validation code.
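        For comparison, the left/right choice that _add_field makes with its @dat_pad slice can also be expressed with sprintf and a * width argument, where the - flag selects left-justification. A minimal sketch; pad_field is a hypothetical name, not part of the module above.

```perl
use strict;
use warnings;

# Hypothetical alternative to the @dat_pad slice: pad $data to $length,
# right-aligned if $r_align is true, left-aligned otherwise.
sub pad_field {
    my ($length, $data, $r_align) = @_;
    die "Data [$data] too large for field of length [$length]\n"
        if length($data) > $length;
    # '*' takes the width from the argument list; '-' left-justifies
    return sprintf($r_align ? '%*s' : '%-*s', $length, $data);
}

my $left  = pad_field(10, 123);
my $right = pad_field(10, 123, 1);
print "[$left]\n[$right]\n";
```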

        populate_file() is the public API. It creates a line of the desired length ($record_length) and calls _add_field() to populate the lines with the data from $field_data and outputs the lines to $fh (without having to know anything about what file is involved or whether it's writing to a new file or appending to an existing one).

        In most cases, your scripts will need little more than:

        use PM::FixedWidthFile qw{populate_file};
        ...
        my $record_length = ...;
        my $outfile = ...;
        my $file_data = ...;
        open my $fh, '>', $outfile;
        populate_file($fh, $record_length, $file_data);

        Here's an actual example with dummy test data:

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use autodie;

        use PM::FixedWidthFile qw{populate_file};

        my $fixed_width_file_base = './pm_fixed_width_file.out_';
        my $record_length = 32; # not including line terminator

        my @multi_file_data = (
            [
                [ [10, ''], [10, 123], [10, 456] ],
            ],
            [
                [ [10, ''], [10, 123], [10, 456] ],
                [ [10, ''], [10, 123, 1], [10, 456] ],
                [ [10, ''], [10, 123], [10, 456], [2, 78] ],
            ],
            [
                [ [10, ''], [10, 123], [10, 456], [2, 78] ],
                [ [10, ''], [10, 123], [10, 456], [2, 789] ],
            ],
        );

        for my $i (0 .. $#multi_file_data) {
            my $outfile = $fixed_width_file_base . $i;
            print "Populating: $outfile\n";

            open my $fh, '>', $outfile;
            populate_file($fh, $record_length, $multi_file_data[$i]);
            close $fh;

            system qw{cat -vet}, $outfile;
            unlink $outfile; # my housekeeping
        }

        [In case you didn't know, cat -vet filename prints the contents of filename and shows various symbols to represent characters that you can't normally see or that may have display problems (e.g. tabs and characters outside the 7-bit ASCII range). The only symbol of interest here is the $ sign, which marks the end of each line. See the cat manpage for more information.]

        Output:

        Populating: ./pm_fixed_width_file.out_0
                  123       456         $
        Populating: ./pm_fixed_width_file.out_1
                  123       456         $
                         123456         $
                  123       456       78$
        Populating: ./pm_fixed_width_file.out_2
        Data [789] too large for field of length [2] at ./pm_fixed_width_file.pl line 32.

        "I may be forced to nick the above, thanks very much! Rest assured I shall put a "#Ken" in my code ;-)"

        Help yourself to the code. Attribution is courteous but not required. A link to the node where you got the code may be useful for subsequent maintainers and could possibly save you having to redocument what's already been written here (e.g. rationale for changes you implement).

        -- Ken

Re: Google has failed me! Using pack and substr for fixed width file output
by runrig (Abbot) on Apr 08, 2014 at 21:08 UTC
    What I might do (but using your actual list of fields and helpful/descriptive names for each field) using Parse::FixedLength:
    use Parse::FixedLength;

    my $pfl = Parse::FixedLength->new([qw(
        name:10
        date:6
        filler:10
    )]);

    my %hash = (
        name => 'Joe',
        date => '020414',
    );
    my $str = $pfl->pack(\%hash);
    print "[$str]\n";

    $hash{name} = 'Bob';
    $str = $pfl->pack(\%hash);
    print "[$str]\n";

    # Or even:
    use Parse::FixedLength;

    my $pfl = Parse::FixedLength->new([qw(
        name:10
        date:6
        filler:10
    )], {href => \my %hash} );

    %hash = (
        name => 'Joe',
        date => '020414',
    );
    my $str = $pfl->pack();
    print "[$str]\n";

    $hash{name} = 'Bob';
    $str = $pfl->pack();
    print "[$str]\n";