Re: Summing up duplicate lines
by choroba (Cardinal) on May 08, 2024 at 19:56 UTC
|
The specification is still vague. What should the third value be if it's not the same, e.g. [0, 5, 1] and [0, 5, 2]? Also, in what order should the triplets be printed?
The following code uses the order of the first appearance of the first two columns, and uses the third value from the first occurrence.
Update: The %seen hash contains the index into the @out array, i.e. the index of the first occurrence of the two values.
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
my %seen;
my @out;
while (<DATA>) {
my @n = /-?[0-9]+/g;
if ($n[0] == 0 || $n[1] == 0) {
if (exists $seen{"@n[0, 1]"}) {
$out[ $seen{"@n[0, 1]"} ][$_] += $n[$_] for 0, 1;
} else {
push @out, \@n;
$seen{"@n[0, 1]"} = $#out;
}
} else {
push @out, \@n;
}
}
say "[@$_]" for @out;
__DATA__
[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
Output: [-1 5 1]
[0 10 1]
[1 5 1]
[3 4 1]
[5 1 1]
[30 0 1]
[0 -45 1]
[-23 -64 0]
[-5 0 1]
[-5 1 1]
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] [select] |
|
You're right - I didn't even think to specify this. There's no case in which the third value can be different among repeats, so it's not an issue.
As to the order of the triplets, it would be the original one except for the "collapsed" ones.
Your solution looks great - thank you, much appreciated!
--
I hate storms, but calms undermine my spirits.
-- Bernard Moitessier, "The Long Way"
| [reply] |
Re: Summing up duplicate lines
by hv (Prior) on May 08, 2024 at 20:21 UTC
|
The first rule for any problem like this is: define a signature such that elements which should be seen as the same have the same signature, then use a hash with signatures as keys to group things.
I think by "repeated lines" you mean lines in which all three values are the same, so the signature should include all three values:
sub signature {
my $arrayref = shift;
# the examples imply that the inputs are integers
# if not, use a different character to join them
return join '.', @$arrayref;
}
Once you have a signature, you can use a hash to accumulate any duplicates; in this case we just need a count:
my %seen;
for my $arrayref (@all_inputs) {
if ($arrayref->[0] && $arrayref->[1]) {
# not interesting, just pass it through
print_result($arrayref);
} else {
# get a signature, increase the count for that signature by 1
++$seen{ signature($arrayref) };
}
}
# now construct combined results for the saved inputs
for my $sig (keys %seen) {
my $count = $seen{$sig};
my $arrayref = [ split /\./, $sig ];
# keep it easy: multiplying the zero value is a noop
$arrayref->[$_] *= $count for (0, 1);
print_result($arrayref);
}
For your example data, this ends up with combined lines [ 0, 10, 1 ], [ -5, 0, 1 ], [ 30, 0, 1 ], [ 0, -45, 1 ]. | [reply] [d/l] [select] |
|
[ -1, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ -23, -64, 0 ],
[ -5, 1, 1 ],
[ 30, 0, 1 ],
[ -5, 0, 1 ],
[ 0, -45, 1 ],
[ 0, 10, 1 ]
which skipped some lines. For comparison, choroba's was
[-1 5 1]
[0 10 1]
[1 5 1]
[3 4 1]
[5 1 1]
[30 0 1]
[0 -45 1]
[-23 -64 0]
[-5 0 1]
[-5 1 1]
which was correct.
--
I hate storms, but calms undermine my spirits.
-- Bernard Moitessier, "The Long Way"
| [reply] [d/l] [select] |
|
| [reply] |
|
|
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 08, 2024 at 22:56 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11159342
use warnings;
use List::Util qw( sum );
local $_ = join '', <DATA>;
s/(?:\[ 0, \S+(,.*\n)){2,}/ my $last = $1;
'[ 0, ' . (sum $& =~ m~, (\S+),~g) . $last/ge;
s/(?:\[ .*(, 0, .*\n)){2,}/ my $last = $1;
'[ ' . (sum $& =~ m~\[ (\S+),~g) . $last/ge;
print;
__DATA__
[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
which outputs:
[ -1, 5, 1 ],
[ 0, 10, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 30, 0, 1 ],
[ 0, -45, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
| [reply] [d/l] [select] |
Re: Summing up duplicate lines
by tybalt89 (Monsignor) on May 10, 2024 at 15:13 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11159342
use warnings;
my @collapsed = eval <DATA>;
$_->[0] == 0 && $collapsed[-1][0] == 0 ? ($collapsed[-1][1] += $_->[1]
+)
: $_->[1] == 0 && $collapsed[-1][1] == 0 ? ($collapsed[-1][0] += $_-
+>[0])
: push @collapsed, $_
for map eval, <DATA>;
printf "[ %d, %d, %d ],\n", @$_ for @collapsed;
__DATA__
[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
which outputs:
[ -1, 5, 1 ],
[ 0, 10, 1 ],
[ 1, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
[ 30, 0, 1 ],
[ 0, -45, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
| [reply] [d/l] [select] |
Re: Summing up duplicate lines
by LanX (Saint) on May 09, 2024 at 10:35 UTC
|
Your description leaves too much room for interpretation.
I assume you just want to "collapse" consecutive lines and you don't care about "repetitions" which are separated.
Hence no %seen hash is needed, just remember the $last line aka array and
compare it to the $current one.
If the $current one fits your criteria add it to $last, if it doesn't print $last and make $last=$current for the next iteration.
| [reply] |
|
Here we go, implementing it revealed more ambiguities
use v5.12;
use warnings;
use Data::Dump;
my @in = do { local $/; eval <DATA> };
sub DBG { ddx @_ if 0 } # debug
DBG "INPUT"=> @in;
my @out;
my $similarity = 3; # number equal elements
my $last = shift @in;
my $sum = [ @$last ]; # init
while ( my $cur = shift @in ) {
my @zeros = grep { $cur->[$_] == 0 } 0..1;
my @same = grep { $cur->[$_] == $last->[$_] } 0..$similarity-1;
if ( @zeros and @same == $similarity ) {
my $non_zero = 1 - $zeros[0];
DBG "SUM" => $last, $cur, \@zeros, $non_zero ;
$sum->[$non_zero] += $cur->[$non_zero];
}
else {
DBG "OUT" => $last, $cur, \@zeros;
push @out, $sum;
$last = $cur;
$sum = [ @$last ]; # init
}
}
push @out, $sum;
dd @out;
__DATA__
[ -1, 5, 1 ],
[ 0, 5, 1 ],
[ 0, 5, 1 ],
[ 1, 5, 1 ],
# separated repetitions
# [ 0, 5, 1 ],
# [ 0, 5, 1 ],
[ 3, 4, 1 ],
[ 5, 1, 1 ],
# Testcases for ambiguities
#[ 1000,0,1],
#[ 5, 0, 1000 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 5, 0, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ 0, -5, 1 ],
[ -23, -64, 0 ],
[ -5, 0, 1 ],
[ -5, 1, 1 ],
(
[-1, 5, 1],
[0, 10, 1],
[1, 5, 1],
[3, 4, 1],
[5, 1, 1],
[30, 0, 1],
[0, -45, 1],
[-23, -64, 0],
[-5, 0, 1],
[-5, 1, 1],
)
update
fixed final push @out, $sum;
| [reply] [d/l] [select] |
|
Heh, perhaps that's something else that was under-specified in my question. :) I understand the idea behind it - it's not a complex one - but my brain was just refusing to produce the code to match it. It was just a weird day when I was unable to focus (for Various But Definite Reasons); any programmer who's been around a while has seen this pattern many times.
I certainly appreciate the help you folks provided, though!
--
I hate storms, but calms undermine my spirits.
-- Bernard Moitessier, "The Long Way"
| [reply] |
|
| [reply] |