For example, suppose A001|B002 has already been seen, so the assigned numbers are 1 for A001 and 2 for B002. We then read B002|A001|C003. C003 is assigned the number 3, so, writing one digit per part in the order of assignment, the stored vector is 11 (it has no digit for C003 yet) and the new one is 111. If we use 2 instead of 1 in the stored vectors, we can just bitwise or the two strings and see what should be done: "22" | "111" = "331", where 1 means "present in the new only", 3 means "present in both", and 2 would mean "present in stored only" (a 0 means "present in neither"). If, apart from 0's, there are only 3's, we've already seen exactly the same combination of parts. If there's no 1, the new combination is contained in a stored one; if there's no 2, the new combination contains the stored one. (The script below also keeps an unused position 0 at the front of each string, which doesn't change the outcome.)
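To make the rule concrete, here is a minimal sketch of just the comparison step, separate from the full script. The name compare_vectors and its return values are hypothetical, chosen for illustration only; it assumes the `bitwise` feature is not enabled, so that | on two strings works character by character.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): compare a stored
# vector (2 = present, 0 = absent) with a new one (1 = present, 0 = absent).
sub compare_vectors {
    my ($stored, $new) = @_;
    my $result = "$stored" | "$new";          # string or: "2" | "1" gives "3"
    return 'same'      if $result !~ /[12]/;  # only 3's (and 0's) remain
    return 'contained' if $result !~ /1/;     # new adds nothing to stored
    return 'contains'  if $result !~ /2/;     # new covers all of stored
    return 'different';                       # each has parts the other lacks
}

print compare_vectors('22', '111'), "\n";     # contains
```

The example from the text, "22" against "111", comes out as "contains", so the stored combination would be replaced by the new one.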
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $last_part = 1;
my (%part, %store);
while (my $line = <DATA>) {
    chomp $line;
    my @parts = split /\|/, $line;
    ! exists $part{$_} and $part{$_} = $last_part++ for @parts;

    my $string = 0 x ($last_part - 1);
    substr $string, $part{$_}, 1, 1 for @parts;
    my %back = reverse %part;

    my $store = 1;
    for my $seen (keys %store) {
        my $result = "$seen" | "$string";
        if ($result =~ /^3+$/     # Same as old.
            || $result !~ /1/     # Contained in old.
           ) {
            $store = 0;
        } elsif ($result !~ /2/) {    # Contains old.
            delete $store{$seen};
        }
    }
    undef $store{$string =~ tr/1/2/r} if $store;
}

say 'Kept: ';
my %back = reverse %part;
for my $stored (keys %store) {
    say join '|', map substr($stored, $_, 1) ? $back{$_} : (),
                  1 .. length $stored;
}

__DATA__
A001|B002
C003|A001|B002
B002|A001
C003|D004|A001
E005|F006
D004|C003
Update: Switched to bitwise string or from +, so Math::BigInt is not needed.
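A small sketch of why the string form makes Math::BigInt unnecessary: when both operands are strings, | works character by character, so the vector can be as long as needed. The variable names here are illustrative only, and the snippet assumes the `bitwise` feature is not enabled (with it, | is always numeric and the string operator is |.).

```perl
use strict;
use warnings;

my $as_strings = "22" | "111";   # character-wise or on two strings
my $as_numbers = 22 | 111;       # ordinary integer or

print "$as_strings\n";           # 331
print "$as_numbers\n";           # 127

# 100 parts would not fit into a native integer, but strings don't care.
my $stored = '2' x 100;
my $new    = '1' x 100;
my $result = $stored | $new;
print length($result), "\n";     # 100
print $result =~ /^3+$/ ? "same combination\n" : "differs\n";
```

This is also why the script stringifies both operands ("$seen" | "$string") before combining them.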
In reply to Re: Best method to eliminate substrings from array
by choroba
in thread Best method to eliminate substrings from array
by catemp