svenXY has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have experienced some strange behaviour with Tie::File.
The following code ties to the file, adds some elements, then sorts them and unties. After that, a newline is added to the file. Those newlines add up and clutter the file.

#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

tie my @tied_array, 'Tie::File', 'tiefile'
    or die "Could not tie to tiefile: $!";
push(@tied_array, $_) for (1..3);
print "\@tied_array has " . scalar @tied_array . " elements before sorting\n";
@tied_array = sort {uc($a) cmp uc($b)} @tied_array;
print "\@tied_array has " . scalar @tied_array . " elements after sorting\n";
untie @tied_array;

prints:
~/dev/perl$ perl test_tie_file.pl
@tied_array has 3 elements before sorting
@tied_array has 4 elements after sorting
~/dev/perl$ perl test_tie_file.pl
@tied_array has 7 elements before sorting
@tied_array has 8 elements after sorting
~/dev/perl$ perl test_tie_file.pl
@tied_array has 11 elements before sorting
@tied_array has 12 elements after sorting
~/dev/perl$ cat tiefile
1
1
1
2
2
2
3
3
3
~/dev/perl$

Sure, I could get rid of those lines easily*, but I find it strange that they are added in the first place. Can anyone shed some light on this?

Regards,
svenXY

* Update: @existing_packages = grep {!/^$/} @existing_packages; does the trick

Replies are listed 'Best First'.
Re: Tie::File - sorting array adds empty lines
by wojtyk (Friar) on Sep 11, 2008 at 22:44 UTC
    I actually spent a few hours looking into it out of curiosity. It's an unusual bug. Like ikegami said, it only occurs in the optimized case of a tied array where the sort is of this format: @a = sort @a.

    From what I can tell of gdb stumblings through Perl_pp_sort(), in the particular event of a tied array in the above format, the code branch at line 1716 of pp_sort.c will be followed (code below):

    if (av && !sorting_av) {
        /* simulate pp aassign of tied AV */
        ...
        av_extend(av, max);
        ...
    }

    When the av_extend is called, max has the correct value that was returned from FETCHSIZE. However, the code that deals with tied arrays at the top of av_extend ends up pushing max+1 onto the stack prior to the EXTEND call:

    Perl_av_extend(pTHX_ AV *av, I32 key)
    {
        MAGIC * const mg = SvTIED_mg((SV*)av, PERL_MAGIC_tied);
        if (mg) {
            ...
            PUSHs(SvTIED_obj((SV*)av, mg));
            PUSHs(sv_2mortal(newSViv(key+1)));
            PUTBACK;
            call_method("EXTEND", G_SCALAR|G_DISCARD);

    I haven't the foggiest why this is, as I'm no Perl internals expert. But the result appears to be an off-by-one in the module's implementation of the EXTEND.

    I think the reason it doesn't affect many other modules is that the bulk of modules that use tied arrays (that I've tested at least) have EXTEND as a no-op function ({}). Tie::File, on the other hand, actually uses the EXTEND to determine the number of records in the file. Because of this, you always end up with an extra empty record (which in this case is a newline, since that is the default record separator) because of the off-by-one.

      Thanks. Submitting bug report for Tie::File. (Upd: CPAN RT bug #39196 )

      But the result appears to be an off-by-one in the module's implementation of the EXTEND.

      It's not an off-by-one error, at least not on the module's behalf.
      EXTEND is used to expand the internal buffer.
      STORESIZE is used to actually change the visible size of the array.
      Tie::File incorrectly treats EXTEND as STORESIZE.
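
      To make the EXTEND / STORESIZE distinction concrete, here is a minimal sketch. LoggingArray is a made-up class for illustration (it is not part of Tie::File or any other module), built on Tie::StdArray. It leaves EXTEND as a hint that changes nothing visible and lets only STORESIZE alter the element count, while logging both calls so you can see what your perl passes in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo class, not part of any module: a tied array that
# records which size-related methods perl calls on it.
package LoggingArray;
use Tie::Array ();
our @ISA = ('Tie::StdArray');

our @calls;    # log of "METHOD(arg)" strings

# EXTEND is only a hint that the array is about to grow.
# It must not change the visible size, so it only logs here.
sub EXTEND {
    my ($self, $count) = @_;
    push @calls, "EXTEND($count)";
    return;
}

# STORESIZE is the call that actually changes the element count.
sub STORESIZE {
    my ($self, $count) = @_;
    push @calls, "STORESIZE($count)";
    return $self->SUPER::STORESIZE($count);
}

package main;

tie my @array, 'LoggingArray';
push @array, qw(b c a);

@array = sort @array;    # the in-place form discussed in this thread

print "elements: ", scalar(@array), "\n";
print "calls:    @LoggingArray::calls\n";
```

      Because EXTEND does not touch the data here, the element count stays at 3 whatever argument perl passes. The logged calls and their arguments can differ between perl versions, so none are shown.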

        I concur. There was a long-standing bug in which the perl internals function av_extend() called the tied EXTEND method with a count one higher than it should have. It was complemented by another bug in pp_aassign(), which called av_extend() with one less than it should have, so the two cancelled each other out in a practical sense for most use cases. Since most Tie modules implement EXTEND as a no-op, this went unnoticed. Once the two fencepost errors were removed, this problem in Tie::File went away. I have pushed a fix, which is currently being smoke tested. Assuming the fix is correct, I will merge it to blead, and we might see it included in Perl 5.32. I would like to apologize on behalf of the perl5porters community for not getting to the bottom of this earlier.

        See https://github.com/Perl/perl5/issues/17496 for details.

        ---
        $world=~s/war/peace/g

Re: Tie::File - sorting array adds empty lines
by ikegami (Patriarch) on Sep 11, 2008 at 20:09 UTC

    I can replicate your results
    with Perl 5.8.8 and Tie::File 0.97 and
    with Perl 5.10.0 and Tie::File 0.97_02.

    sort is optimized to sort in place when the source and destination are the same.

    >perl -MO=Concise -e"@a = sort @a" 2>&1 | find "sort"
    7     <@> sort lK/INPLACE ->8

    >perl -MO=Concise -e"@b = sort @a" 2>&1 | find "sort"
    7     <@> sort lK ->8

    I don't know if it's a bug in Tie::File when dealing with sort's optimization or a bug in sort's optimization when dealing with tied arrays, but the bug can be avoided by avoiding the optimization:

    @tied_array = map $_, sort { uc($a) cmp uc($b) } @tied_array;

    Update: Better yet,

    @tied_array = ((), sort {uc($a) cmp uc($b)} @tied_array);
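
    A more explicit variant of the same idea, as a sketch only: sort into a separate untied array, then assign the result back. Since source and destination differ, the in-place optimization does not apply. The scratch file from File::Temp and the sample values are assumptions for the demo, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Scratch file so the sketch does not touch real data.
my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

tie my @lines, 'Tie::File', $filename
    or die "Could not tie to $filename: $!";

push @lines, $_ for qw(b C a);

# Sort into a separate, untied array first. Source and destination
# differ, so perl cannot use the in-place sort here.
my @sorted = sort { uc($a) cmp uc($b) } @lines;

# Plain list assignment back to the tied array.
@lines = @sorted;

my $count = @lines;
print "\@lines has $count elements after sorting\n";

untie @lines;
```

    This costs one extra in-memory copy of the records, which should only matter for very large files.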
Re: Tie::File - sorting array adds empty lines (followup)
by ikegami (Patriarch) on Sep 11, 2008 at 20:29 UTC

    For what it's worth, I can't replicate it with a different type of tied array.

    use strict;
    use warnings;

    use Tie::File  qw( );
    use Tie::Array qw( );

    for my $module ('Tie::File', 'Tie::StdArray') {
        print("$module:\n");
        tie my @array, $module, 'tiefile';
        push @array, "[$_]" for 0..2;
        print(scalar(@array), "\n");
        @array = sort @array;
        print(scalar(@array), "\n");
        print("\n");
    }

    Tie::File:
    3
    4

    Tie::StdArray:
    3
    3