
If the number of all possible parts isn't too high, you can assign numbers to the parts and use vectors where each position in the vector says whether the corresponding part is present or not. I used 8 bits (one character) to represent each part, but 2 bits should be enough if you need to reduce space (1 bit isn't enough: we need 4 different values, as will be explained shortly).
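As a minimal sketch of the numbering step alone, assuming parts are numbered from 1 in order of first appearance and that position 0 of the vector stays unused: the names `presence_vector`, `%part_number` and `$next_number` are made up for this illustration.

```perl
use strict;
use warnings;

my %part_number;        # part name => position in the vector (hypothetical name)
my $next_number = 1;    # next unassigned position; position 0 is never used

# Return a string of '0'/'1' characters, one per known part.
sub presence_vector {
    my @parts = @_;
    # Give each part a number the first time it shows up.
    $part_number{$_} //= $next_number++ for @parts;
    my $vector = '0' x $next_number;
    substr $vector, $part_number{$_}, 1, '1' for @parts;
    return $vector;
}

print presence_vector(qw( A001 B002 )), "\n";        # 011
print presence_vector(qw( B002 A001 C003 )), "\n";   # 0111
print presence_vector(qw( C003 )), "\n";             # 0001
```

The order of the parts on a line doesn't matter, only their assigned positions do.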

For example, suppose A001|B002 has already been seen, so the assigned numbers are 1 for A001 and 2 for B002. We then read B002|A001|C003, and C003 is assigned the number 3. Position 0 is unused in the code below, so the stored vector is 011 and the new one is 0111. If we use 2 instead of 1 in the stored vectors, we can just bitwise or the two strings and see what should be done: "022" | "0111" = "0331", where 1 means "present in the new only", 3 means "present in both", and 2 means "present in stored only". This works because the string or combines the character codes, and the digits 0 to 3 differ only in their two lowest bits. If there are no 1's and no 2's, we've already seen exactly the same combination of parts. If there's no 1, the new combination is contained in a stored one; if there's no 2, the new combination contains the old one.
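The comparison on its own can be sketched like this. The helper name `compare_vectors` and its return strings are invented for the illustration; the stored vector is assumed to use the digits 0/2 and the new one 0/1.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a new vector against a stored one.
sub compare_vectors {
    my ($stored, $new) = @_;
    # Quoting forces the string version of |. The shorter operand is
    # treated as padded with NUL bytes, so the extra characters of the
    # longer operand pass through unchanged.
    my $result = "$stored" | "$new";
    return 'same'          if $result !~ /[12]/;   # only 0's and 3's
    return 'new in stored' if $result !~ /1/;      # nothing only in new
    return 'stored in new' if $result !~ /2/;      # nothing only in stored
    return 'different';
}

print compare_vectors('022',  '0111'),  "\n";   # stored in new
print compare_vectors('0222', '011'),   "\n";   # new in stored
print compare_vectors('022',  '011'),   "\n";   # same
print compare_vectors('0222', '01011'), "\n";   # different
```

Vectors of different lengths need no explicit padding, which is why newly numbered parts don't invalidate the vectors stored earlier.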

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $last_part = 1;
my (%part, %store);
while (my $line = <DATA>) {
    chomp $line;
    my @parts = split /\|/, $line;
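    # Number each part the first time it is seen (numbering starts at 1).
    ! exists $part{$_} and $part{$_} = $last_part++ for @parts;
    # One character per part, 1 if present; position 0 stays unused.
    my $string = 0 x ($last_part - 1);
    substr $string, $part{$_}, 1, 1 for @parts;
    my $store = 1;  # Keep the new combination unless a stored one covers it.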
    for my $seen (keys %store) {
        my $result = "$seen" | "$string";

        if ($result !~ /[12]/  # Same as old.
            || $result !~ /1/  # Contained in old.
        ) {
            $store = 0;

        } elsif ($result !~ /2/) {  # Contains old.
            delete $store{$seen};
        }
    }
    undef $store{$string =~ tr/1/2/r} if $store;
}

say 'Kept: ';
my %back = reverse %part;
for my $stored (keys %store) {
    say join '|',
        map substr($stored, $_, 1) ? $back{$_} : (),
        1 .. length $stored;
}

__DATA__
A001|B002
C003|A001|B002
B002|A001
C003|D004|A001
E005|F006
D004|C003

Update: Switched from + to bitwise string or, so Math::BigInt is not needed.


In reply to Re: Best method to eliminate substrings from array by choroba
in thread Best method to eliminate substrings from array by catemp
