arsoncupid has asked for the wisdom of the Perl Monks concerning the following question:

Hilo,

I'm not sure what's going on here. For some reason my array keeps growing as I look at each entry in it.

This script looks at one .TAB file and compares each line's second tab-stop text (probably very inefficiently) to every other line's second tab-stop text. I'm looking for duplicates.

This is on Mac OS X Panther, BTW. ... perl5 (revision 5.0 version 8 subversion 1 RC3)

#!/usr/bin/perl
$| = 1;

open ( ADDS, "./additions.tab" );
while ( $line = <ADDS> ) {
    chomp ( $line );
    @div = split ( /\t/, $line );
    $blip = 0; # increment integer, keeps track of tab stops
    foreach my $atom (@div) {
        @{$lines[$lino]}->[$blip++] = $atom;
    }
    $lino++; # increment integer, keeps track of lines
}
close ADDS;

foreach my $Line (@lines) {
    $dinky = 0; # toggle integer, for existence of duplicates
    $startat++; # dual use.
                # 1 Keeps track of already inspected lines, to avoid redoing that work
                # 2 Lets me see output by printing every few lines
    foreach my $tick ($startat..( scalar ( @lines ) )) {
        if ( $lines[$tick]->[1] eq $Line->[1] ) {
            print (join ( "\t", ( @{ $lines[$tick] } ) )) . "\n";
            $dinky++;
        }
    }
    print (join ( "\t", ( @{$Line} ) )) . "\n" if ( $dinky > 0 );
    print STDERR $startat . "\n" if ( !( $startat % 100 ) );
    print STDERR "(" . scalar ( @lines ) . ")\n" if ( !( $startat % 100) );
}

Sample Output:

100
(2506)
200
(2606)
300
(2706)
400
(2806)
500
(2906)

Replies are listed 'Best First'.
Re: Oddly growing array
by revdiablo (Prior) on Oct 05, 2005 at 18:04 UTC

    To expand on Roy Johnson's comment, you are iterating one past the end of your array each time, and autovivifying the new elements. Here's a quick demonstration:

    my @array = (
        [ 1, 2, 3 ],
        [ 4, 5, 6 ],
    );
    print scalar(@array), "\n"; # 2
    print $array[2][0], "\n";   # undef
    print scalar(@array), "\n"; # 3

    You can see the new element is automatically added by just accessing a subelement.
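
One way to avoid the side effect is to check the index before dereferencing. The following is only a sketch of that idea; the names @safe and $idx are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

my @array = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

# Reading through a missing element creates it (autovivification).
my $size_before = scalar @array;    # 2
my $missing     = $array[2][0];     # undef, but $array[2] now exists
my $size_after  = scalar @array;    # 3

# Guarding the index first leaves the array alone.
my @safe      = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );
my $idx       = 2;                  # hypothetical out-of-range index
my $value     = $idx <= $#safe ? $safe[$idx][0] : undef;
my $safe_size = scalar @safe;       # still 2

print "unguarded: $size_before -> $size_after\n";
print "guarded:   $safe_size\n";
```

The guard costs one comparison per lookup, which is usually cheaper than tracking down an array that silently grew.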

Re: Oddly growing array
by Roy Johnson (Monsignor) on Oct 05, 2005 at 17:55 UTC
    scalar(@lines) is one beyond the last entry in @lines. Use $#lines instead.
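
Applied to the inner loop of the original script, the change might look like the sketch below. The three sample rows are invented stand-ins for the parsed file, and @pairs is a hypothetical helper used here instead of printing each match.

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for the parsed .TAB file.
my @lines = (
    [ 'a', 'apple'  ],
    [ 'b', 'banana' ],
    [ 'c', 'apple'  ],
);

my $startat = 0;
my @pairs;    # [ earlier index, later index ] for each duplicate found
foreach my $Line (@lines) {
    $startat++;
    # $#lines is the last valid index, so $lines[$tick] always exists
    # and nothing gets autovivified past the end.
    foreach my $tick ( $startat .. $#lines ) {
        push @pairs, [ $startat - 1, $tick ]
            if $lines[$tick]->[1] eq $Line->[1];
    }
}

print "rows $_->[0] and $_->[1] match\n" for @pairs;
print scalar(@lines), " rows after the loop\n";
```

With this range the array still has three rows when the loop finishes, and the range is simply empty once $startat passes the last index.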

    Caution: Contents may have been coded under pressure.
Re: Oddly growing array
by sk (Curate) on Oct 05, 2005 at 16:52 UTC
    This script looks at one .TAB file and compares each line's second tab-stop text (probably very inefficiently) to every other line's second tab-stop text. I'm looking for duplicates.

    I haven't had a chance to read the code, but wouldn't something like this do what you want?

    Untested

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %dups;
    while (<DATA>) {
        my $col = (split /\t/)[1]; # get the second col
        $dups{$col}++;             # put the text into a hash and increment counter.
    }
    print +($_ , " : ", $dups{$_},$/) for (keys %dups); # print the counts
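
For anyone who wants only the duplicated values rather than every count, here is a small variation on the same hash idea. It is a sketch, not a drop-in replacement: the @input lines are made up, and in the real script they would come from the file handle.

```perl
use strict;
use warnings;

# Hypothetical input lines; the real script would read these from the file.
my @input = ( "1\tapple\tx\n", "2\tbanana\ty\n", "3\tapple\tz\n" );

my %count;
for my $line (@input) {
    chomp( my $copy = $line );          # strip the newline from a copy
    my $col = ( split /\t/, $copy )[1]; # second tab-separated field
    next unless defined $col;           # skip lines with no second field
    $count{$col}++;
}

# Keep only the values seen more than once.
my @dups = sort grep { $count{$_} > 1 } keys %count;
print "$_ : $count{$_}\n" for @dups;
```

This makes a single pass over the input, so the work grows with the number of lines rather than with the number of line pairs.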
Re: Oddly growing array
by Tanktalus (Canon) on Oct 05, 2005 at 18:19 UTC

    Let me guess ... you've programmed in other languages before. ;-)

    First - $lino. Looks like a duplicate of $. (see perlvar).

    The @{} isn't needed in your $atom assignment: $lines[$.][$blip++] = $atom works fine. Although $lines[$.] = \@div works even better than using the foreach at all, provided @div is declared with my inside the loop so that each line gets its own array.
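
One thing to watch when storing \@div: if @div is a single shared array, as in the original script, every stored reference points at the same data. The sketch below contrasts that with a fresh anonymous array per line. The @input lines and the @shared and @aliased names are invented for the demonstration. Note also that $. starts at 1, so indexing by it leaves $lines[0] unused; push sidesteps that.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for the file contents.
my @input = ( "1\tapple\n", "2\tbanana\n" );

# Pitfall: one shared array, so every reference sees the last line read.
my ( @shared, @aliased );
for my $line (@input) {
    chomp( my $copy = $line );
    @shared = split /\t/, $copy;
    push @aliased, \@shared;
}

# A fresh anonymous array per line keeps each row separate.
my @lines;
for my $line (@input) {
    chomp( my $copy = $line );
    push @lines, [ split /\t/, $copy ];
}

print "aliased: $aliased[0][1] $aliased[1][1]\n";
print "copied:  $lines[0][1] $lines[1][1]\n";
```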

    As pointed out by sk, you really want to use a hash to detect duplicates. Personally, I like as much info as possible, so I'd do something like this:

    my %whole_file;
    while (<ADDS>) {
        my %info = (
            line_number => $.,
            line_text   => $_,
            atoms       => [ split /\t/ ], # array ref, so $info{atoms}[1] works below
        );
        $info{key} = $info{atoms}[1];
        push @{$whole_file{$info{key}}}, \%info;
        # you can print out a tick every 100 or whatever here, using $.
    }
    my @dupes = grep { scalar @$_ > 1 } values %whole_file;
    # to see what data structure I just built, dump it via your favourite dumper. Mine is:
    use Data::Dumper;
    print Dumper(\@dupes);
    With all that info, I can do whatever I want no matter how flexible my requirements need to be, no matter what changes my boss sees fit to throw at me.

Re: Oddly growing array
by InfiniteSilence (Curate) on Oct 05, 2005 at 18:03 UTC
    Some points:
  • If you use strict you will notice interesting things about the code, namely
    Using an array as a reference is deprecated at lookingat.pl line 12.
    Which is
    ==12== @{$lines[$lino]}->[$blip++] = $atom;
  • You might want to use Data::Dumper in your original code to take a look at the array or if that seems like too much work just look at the last n items:
    print STDERR @lines[-5..-1]
    But I would take sk's advice and just REWRITE the entire thing.
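
Since the elements are array references, dumping the tail tends to be more readable than printing it directly. A rough sketch, with invented sample rows and a deliberate out-of-range read to mimic the problem in the original loop:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical rows, then an out-of-range read like the one in the original loop.
my @lines = ( [ 'a', 'apple' ], [ 'b', 'banana' ] );
my $peek  = $lines[5][1];    # autovivifies $lines[5] and extends the array

# Look at the last few entries (at most 3) to see what got added.
my $n    = @lines < 3 ? scalar @lines : 3;
my @tail = @lines[ -$n .. -1 ];
print STDERR Dumper( \@tail );
```

The dump should show undefined slots followed by an empty array reference, which is the signature of entries created by accident rather than read from the file.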

    Celebrate Intellectual Diversity

Re: Oddly growing array
by arsoncupid (Initiate) on Oct 05, 2005 at 18:43 UTC
    wow, you all got back to me quick! Thanks for the help, every comment was useful and I learned a few things in addition to getting the fix I needed.

    notes:

    contents were quickly coded under sleep deprivation, but the work here is nearly pressureless :)

    I haven't programmed in Perl since ... maybe 1999? Since then I have to admit I've fallen in love with C#. But C# is useless in an Apple environment.

    certainly a rewrite makes sense, but why reinvent the wheel? I just piecemealed what was here. thank you everyone! This is an active community :)