Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Finding largest common subset in lists?

by gjb (Vicar)
on Jun 05, 2003 at 09:52 UTC ( [id://263265]=note: print w/replies, xml ) Need Help??


in reply to Finding largest common subset in lists?

A somewhat non-standard approach, just for the fun of it:

my @a = qw( fred bob joe jim mary elaine ); my @b = qw( frank joe jim mary bob ); my $str = join(" ", @a) . "&" . join(" ", @b); if ($str =~ /(?:\b\w+\b\s*)+? ((?:\b\w+\b\s*)+) (?:\s*\b\w+\b\s*)+? & (?:\b\w+\b\s*)+? \1 (?:\s*\b\w+\b)+?/x) { my $result = $1; $result =~ s/\s*$//; print "found '$result'\n"; }

Note: it works on the example given, I didn't test it extensively.

Just my 2 cents, -gjb-

Replies are listed 'Best First'.
Re: Re: Finding largest common subset in lists?
by zby (Vicar) on Jun 05, 2003 at 10:19 UTC
    It won't work on general data. It will find the first subsequence of @a that is in @b and be content with it. But it was an interesting attempt.

    Update: For @a = qw(a b c); @b = qw(a b c) it prints b. For two element lists it does not work at all.

    Update: Changed the code to:

    my $str = join(" ", @a) . "&" . join(" ", @b); if ($str =~ /(?:\b\w+\b\s*)*? ((?:\b\w+\b\s*)+) (?:\s*\b\w+\b\s*)*? & (?:\b\w+\b\s*)*? \1 (?:\s*\b\w+\b)*?/x) { my $result = $1; $result =~ s/\s*$//; print "found '$result'\n"; }
    It does not have the problem with cutting the first and last elements of the list, but still for @a = qw(a b c); @b = qw(a x b c) it prints a.
Re: Re: Finding largest common subset in lists?
by Corion (Patriarch) on Jun 05, 2003 at 15:13 UTC

    I also went the route of creating a regular expression, and here is what I came up with. This RE most likely has abysmal exponential performance, as it backtracks extensively. It also does not work with duplicate elements in the input stream, as a (Perl) regular expression will be content with the leftmost match - had Perl a POSIX RE engine, that engine should match the longest match from what I remember...

    Anyway, the code "simply" constructs a regular expression of one list that matches all possible subsequences in order, and then matches that list against the other list as a string. Spaces are used as the delimiters between the elements.

    #!/usr/bin/perl -w use strict; use Test::More tests => 3; sub lcos { my ($list1,$list2) = @_; my @list1 = @$list1; my @list2 = @$list2; # Assume that neither list1 nor list2 contain elements # that stringify to the same value, that is, neither # list1 nor list2 contain both "" and undef. # Also, a blank is chosen as a delimiting character which # shall appear in no element of list1 or list2 $list2 = " " . join(" ", @list2) . " "; my $matcher = lcos_match(@list1); #print $matcher,"\n"; my @result; @result = split " ", $1 if $list2 =~ $matcher; @result; }; sub lcos_match { my $match = '('; while (@_) { $match .= match_sequence(@_); shift; @_ and $match .= "|"; }; $match .= ')'; $match; }; sub match_sequence { my $result = ""; my $element; while (defined ($element = pop)) { $result = $result ? qr{ \Q$element\E$result?} : qr{ \Q$element\E } +; }; $result; }; while (<DATA>) { my ($l1,$l2,$expected) = split /\s*\|\s*/; my @l1 = split /\s*/, $l1; my @l2 = split /\s*/, $l2; my @expected = split /\s*/, $expected; $" = ","; $, = ","; is_deeply( [ lcos(\@l1,\@l2)], \@expected, "@l1 | @l2"); }; __DATA__ a b c d f g h j q z | a b c d e f g i j k r x y z | a b c d a b c | a b x c | a b a b c d | a b x b c d | b c d

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://263265]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2024-04-19 12:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found