Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Finding largest common subset in lists?

by BrowserUk (Patriarch)
on Jun 05, 2003 at 09:49 UTC ( [id://263263]=note: print w/replies, xml ) Need Help??


in reply to Finding largest common subset in lists?

Update DO NOT USE! THIS IS A TERMINALLY BROKEN ALGORITHM THAT GET THE RIGHT ANSWER SOMETIMES ALMOST BY LUCK. I DOUBT IT CAN BE EASILY FIXED.

Must test more! Must test more!

Whether this is 'cleaner' is doubtful, given this is the first time ever I've felt the need to use redo, but if your big set is very big, is should result in less total memory use and less comparisons. Ie. It should be faster, I think.

The basic premise to first locate all the subsets of @a that contain only values that exist in @b, and then use join and index to verify if the subset actually exists in @b.

#! perl -slw use strict; # gen some test data (often generates warnings too) #my @a = map{ int rand 100 } 1 .. 100; print "@a"; #my @b; push @b, @a[ ($_ = rand(100)) .. ($_ + rand(10)) ] for 1 .. 10 +; print "@b"; my @a = qw(fred bob joe jim mary elaine); my @b = qw(frank joe jim mary bob); my %b; @b{ @b } = (); my $b = join ',', @b ; my @subsets; my $start = 0; OUTER: { for( $start .. $#a ) { next if exists $b{ $a[$_] }; if( $_ > $start+1 ) { my $a = join ',', @a[ $start .. ($_-1) ]; push @subsets, [ ($_ - 1 - $start), $start, $_ -1 ] if 1+i +ndex( $b, $a ); } $start++; redo OUTER; } } my $longest = [0]; $_->[0] > $longest->[0] and $longest = $_ for @subsets; print "@a[ $longest->[1] .. $longest->[2] ]";

It finds the apppropriate subset of your sample data, and works quite quickly for various datasets I generated randomly. Full testing is left to you.

It handles duplicates in either set (collection:), and could probably be more efficient if this isn't a requirement.

Caveats:

The usual one about elements that could contain your chosen seperator which you are probably aware of.

As is, it will only find the first, longest common subset. If two or more, equally long, subsets exist and you want them all, you have a little work left to do:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://263263]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2024-04-20 04:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found