oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Given a file that looks like this:

[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],

I'm trying to collapse repeated lines that have a 0 in the 1st or the 2nd position into a single line that sums up the non-zero element while ignoring the 3rd one - i.e., 7 repeats of [ 0, -5, 1 ] should be replaced with [ 0, -35, 1 ]. All other lines should just be printed out. Unfortunately, my brain has decided to take a vacation today, and isn't being helpful - so I'm hoping for some Wisdom from the Monastery. :)

-- 
I hate storms, but calms undermine my spirits.
 -- Bernard Moitessier, "The Long Way"

Re: Summing up duplicate lines
by choroba (Cardinal) on May 08, 2024 at 19:56 UTC
    The specification is still vague. What should the third value be if it's not the same, e.g. [0, 5, 1] and [0, 5, 2]? Also, in what order should the triplets be printed?

    The following code uses the order of the first appearance of the first two columns, and uses the third value from the first occurrence.

    Update: The %seen hash contains the index into the @out array, i.e. the index of the first occurrence of the two values.

    #!/usr/bin/perl
    use strict;
    use feature qw{ say };
    use warnings;

    my %seen;
    my @out;
    while (<DATA>) {
        my @n = /-?[0-9]+/g;
        if ($n[0] == 0 || $n[1] == 0) {
            if (exists $seen{"@n[0, 1]"}) {
                $out[ $seen{"@n[0, 1]"} ][$_] += $n[$_] for 0, 1;
            } else {
                push @out, \@n;
                $seen{"@n[0, 1]"} = $#out;
            }
        } else {
            push @out, \@n;
        }
    }
    say "[@$_]" for @out;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    Output:

    [-1 5 1]
    [0 10 1]
    [1 5 1]
    [3 4 1]
    [5 1 1]
    [30 0 1]
    [0 -45 1]
    [-23 -64 0]
    [-5 0 1]
    [-5 1 1]

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      You're right - I didn't even think to specify this. There's no case in which the third value can be different among repeats, so it's not an issue. As to the order of the triplets, it would be the original one except for the "collapsed" ones.

      Your solution looks great - thank you, much appreciated!

Re: Summing up duplicate lines
by hv (Prior) on May 08, 2024 at 20:21 UTC

    The first rule for any problem like this is: define a signature such that elements which should be seen as the same have the same signature, then use a hash with signatures as keys to group things.

    I think by "repeated lines" you mean lines in which all three values are the same, so the signature should include all three values:

    sub signature {
        my $arrayref = shift;
        # the examples imply that the inputs are integers
        # if not, use a different character to join them
        return join '.', @$arrayref;
    }

    Once you have a signature, you can use a hash to accumulate any duplicates; in this case we just need a count:

    my %seen;
    for my $arrayref (@all_inputs) {
        if ($arrayref->[0] && $arrayref->[1]) {
            # not interesting, just pass it through
            print_result($arrayref);
        } else {
            # get a signature, increase the count for that signature by 1
            ++$seen{ signature($arrayref) };
        }
    }
    # now construct combined results for the saved inputs
    for my $sig (keys %seen) {
        my $count = $seen{$sig};
        my $arrayref = [ split /\./, $sig ];
        # keep it easy: multiplying the zero value is a noop
        $arrayref->[$_] *= $count for (0, 1);
        print_result($arrayref);
    }

    For your example data, this ends up with combined lines [ 0, 10, 1 ], [ -5, 0, 1 ], [ 30, 0, 1 ], [ 0, -45, 1 ].

      I'm afraid that didn't work - your output (once I created a 'sub print_result', that is :) ) looked like this:

      [ -1, 5, 1 ], [ 1, 5, 1 ], [ 3, 4, 1 ], [ 5, 1, 1 ], [ -23, -64, 0 ], [ -5, 1, 1 ], [ 30, 0, 1 ], [ -5, 0, 1 ], [ 0, -45, 1 ], [ 0, 10, 1 ]
      which skipped some lines. For comparison, choroba's was
      [-1 5 1] [0 10 1] [1 5 1] [3 4 1] [5 1 1] [30 0 1] [0 -45 1] [-23 -64 0] [-5 0 1] [-5 1 1]

      which was correct.


        Actually, in terms of output content both are identical. The difference is that the two solutions give a different line order.
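        One way to see this at a glance (an illustrative sketch I am adding here, not code from the thread; the two strings below are the combined lines quoted from the replies above): extract the triplets from each output regardless of separator style, canonicalize them, and compare the sorted sets.

```perl
#!/usr/bin/perl
# Illustrative check (not from the thread): the combined lines from the
# two solutions hold the same triplets; only order and separators differ.
use strict;
use warnings;

# the combined lines each solution reports, in each poster's own format
my $hv      = '[ 0, 10, 1 ], [ -5, 0, 1 ], [ 30, 0, 1 ], [ 0, -45, 1 ]';
my $choroba = '[0 10 1] [30 0 1] [0 -45 1] [-5 0 1]';

# pull out each bracketed triplet, normalize its separators, and sort
sub triplets {
    my ($s) = @_;
    return sort map { join ',', split /[,\s]+/ } $s =~ /\[\s*([^\]]+?)\s*\]/g;
}

my @a = triplets($hv);
my @b = triplets($choroba);
print "@a" eq "@b" ? "same content\n" : "different content\n";
```

        Run on the quoted lines, this prints "same content": the multisets of triplets match even though the line order does not.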

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 08, 2024 at 22:56 UTC

    TIMTOWTDI

    #!/usr/bin/perl
    use strict;    # https://perlmonks.org/?node_id=11159342
    use warnings;
    use List::Util qw( sum );

    local $_ = join '', <DATA>;
    s/(?:\[ 0, \S+(,.*\n)){2,}/
      my $last = $1;
      '[ 0, ' . (sum $& =~ m~, (\S+),~g) . $last/ge;
    s/(?:\[ .*(, 0, .*\n)){2,}/
      my $last = $1;
      '[ ' . (sum $& =~ m~\[ (\S+),~g) . $last/ge;
    print;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    which outputs:

    [ -1, 5, 1 ],
    [ 0, 10, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 30, 0, 1 ],
    [ 0, -45, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 10, 2024 at 15:13 UTC

    And more TIMTOWTDI. It's only three lines of code if you don't count the 'use's.

    #!/usr/bin/perl
    use strict;    # https://perlmonks.org/?node_id=11159342
    use warnings;

    my @collapsed = eval <DATA>;
    $_->[0] == 0 && $collapsed[-1][0] == 0 ? ( $collapsed[-1][1] += $_->[1] )
      : $_->[1] == 0 && $collapsed[-1][1] == 0 ? ( $collapsed[-1][0] += $_->[0] )
      : push @collapsed, $_
      for map eval, <DATA>;
    printf "[ %d, %d, %d ],\n", @$_ for @collapsed;

    __DATA__
    [ -1, 5, 1 ],
    [ 0, 5, 1 ],
    [ 0, 5, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 5, 0, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ 0, -5, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],

    which outputs:

    [ -1, 5, 1 ],
    [ 0, 10, 1 ],
    [ 1, 5, 1 ],
    [ 3, 4, 1 ],
    [ 5, 1, 1 ],
    [ 30, 0, 1 ],
    [ 0, -45, 1 ],
    [ -23, -64, 0 ],
    [ -5, 0, 1 ],
    [ -5, 1, 1 ],
Re: Summing up duplicate lines
by LanX (Saint) on May 09, 2024 at 10:35 UTC
    Your description leaves too much room for interpretation.

    I assume you just want to "collapse" consecutive lines and you don't care about "repetitions" which are separated.

    Hence no %seen hash is needed; just remember the $last line (an array) and compare it to the $current one.

    If the $current one fits your criteria, add it to $last; if it doesn't, print $last and set $last = $current for the next iteration.
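    That loop can be sketched like this (my illustration of the idea just described, not LanX's code; the sample @in is made up, and a separate $key/$acc pair is kept so that summing into the accumulator doesn't clobber the line being compared against):

```perl
#!/usr/bin/perl
# Minimal sketch of the "$last vs. $current" run-collapsing idea
# (an illustration, not LanX's implementation; @in is sample data).
use strict;
use warnings;

my @in = ( [-1, 5, 1], [0, 5, 1], [0, 5, 1], [1, 5, 1],
           [0, -5, 1], [0, -5, 1], [0, -5, 1] );

my @out;
my ($key, $acc);    # $key: first line of the current run, $acc: running sum
for my $current (@in) {
    # a line is collapsible only if column 0 or 1 is zero
    my ($zero) = grep { $current->[$_] == 0 } 0, 1;
    if ( defined $zero && $key && "@$current" eq "@$key" ) {
        # exact repeat of the run's line: sum the non-zero column
        $acc->[ 1 - $zero ] += $current->[ 1 - $zero ];
    }
    else {
        # run ended: emit the accumulated line, start a new run
        push @out, $acc if $acc;
        $key = $current;
        $acc = [ @$current ];    # copy, so summing leaves $key intact
    }
}
push @out, $acc if $acc;    # don't forget the final run

print "[ @$_ ]\n" for @out;
```

    For the sample @in this prints four lines: the two [0, 5, 1] repeats collapse to [ 0 10 1 ] and the three [0, -5, 1] repeats to [ 0 -15 1 ], while the other lines pass through unchanged.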

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Here we go. Implementing it revealed more ambiguities:

      use v5.12;
      use warnings;
      use Data::Dump;

      my @in = do { local $/; eval <DATA> };

      sub DBG { ddx @_ if 0 }    # debug

      DBG "INPUT" => @in;

      my @out;
      my $similarity = 3;    # number of equal elements

      my $last = shift @in;
      my $sum  = [ @$last ];    # init

      while ( my $cur = shift @in ) {
          my @zeros = grep { $cur->[$_] == 0 } 0 .. 1;
          my @same  = grep { $cur->[$_] == $last->[$_] } 0 .. $similarity - 1;

          if ( @zeros and @same == $similarity ) {
              my $non_zero = 1 - $zeros[0];
              DBG "SUM" => $last, $cur, \@zeros, $non_zero;
              $sum->[$non_zero] += $cur->[$non_zero];
          }
          else {
              DBG "OUT" => $last, $cur, \@zeros;
              push @out, $sum;
              $last = $cur;
              $sum  = [ @$last ];    # init
          }
      }
      push @out, $sum;

      dd @out;

      __DATA__
      [ -1, 5, 1 ],
      [ 0, 5, 1 ],
      [ 0, 5, 1 ],
      [ 1, 5, 1 ],
      # separated repetitions
      # [ 0, 5, 1 ],
      # [ 0, 5, 1 ],
      [ 3, 4, 1 ],
      [ 5, 1, 1 ],
      # Testcases for ambiguities
      #[ 1000, 0, 1 ],
      #[ 5, 0, 1000 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 5, 0, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ 0, -5, 1 ],
      [ -23, -64, 0 ],
      [ -5, 0, 1 ],
      [ -5, 1, 1 ],

      (
        [-1, 5, 1],
        [0, 10, 1],
        [1, 5, 1],
        [3, 4, 1],
        [5, 1, 1],
        [30, 0, 1],
        [0, -45, 1],
        [-23, -64, 0],
        [-5, 0, 1],
        [-5, 1, 1],
      )


      Update: fixed the final push @out, $sum;

      Heh, perhaps that's something else that was under-specified in my question. :) I understand the idea behind it - it's not a complex one - but my brain was just refusing to produce the code to match it. It was just a weird day when I was unable to focus (for Various But Definite Reasons); any programmer who's been around a while has seen this pattern many times.
      I certainly appreciate the help you folks provided, though!

        Actually, this collapsing/folding was harder to implement than I thought. Too many off-by-one edge cases.

        Still looking into simplifying the code.
