comment on

I also went the route of creating a regular expression, and here is what I came up with. This RE most likely has abysmal exponential performance, as it backtracks extensively. It also does not work with duplicate elements in the input stream, as a (Perl) regular expression will be content with the leftmost match - had Perl a POSIX RE engine, that engine should match the longest match from what I remember...

Anyway, the code "simply" constructs a regular expression of one list that matches all possible subsequences in order, and then matches that list against the other list as a string. Spaces are used as the delimiters between the elements.

#!/usr/bin/perl -w
use strict;
use Test::More tests => 3;

sub lcos {
  my ($list1,$list2) = @_;
  my @list1 = @$list1;
  my @list2 = @$list2;
  
  # Assume that neither list1 nor list2 contain elements 
  # that stringify to the same value, that is, neither
  # list1 nor list2 contain both "" and undef.
  # Also, a blank is chosen as a delimiting character which
  # shall appear in no element of list1 or list2
  
  $list2 = " " . join(" ", @list2) . " ";
  my $matcher = lcos_match(@list1);
  #print $matcher,"\n";
  my @result;
  @result = split " ", $1 
    if $list2 =~ $matcher;
  @result;
};

sub lcos_match {
  my $match = '(';
  while (@_) {
    $match .= match_sequence(@_);
    shift;
    @_ and $match .= "|";
  };
  $match .= ')';
  $match;
};

sub match_sequence {
  my $result = "";
  my $element;
  while (defined ($element = pop)) {
    $result = $result ? qr{ \Q$element\E$result?} : qr{ \Q$element\E }
+;
  };
  $result;
};

while (<DATA>) {
  my ($l1,$l2,$expected) = split /\s*\|\s*/;
  my @l1 = split /\s*/, $l1;
  my @l2 = split /\s*/, $l2;
  my @expected = split /\s*/, $expected;
  
  $" = ",";
  $, = ",";
  is_deeply( [ lcos(\@l1,\@l2)], \@expected, "@l1 | @l2");
};

__DATA__
a b c d f g h j q z | a b c d e f g i j k r x y z | a b c d
a b c | a b x c | a b
a b c d | a b x b c d | b c d
[download]

In reply to Re: Re: Finding largest common subset in lists? by Corion
in thread Finding largest common subset in lists? by anjiro

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.