comment on

Here's my sixth try at it. It runs in two modes, fast & exhaustive. Fast mode only works with the two longest prefix/suffix at each stage. It is possible that fast mode will not produce the shortest solution. Notice that as set up it doesn't run an exhaustive test on the 24 word test case, but it completes all the rest in about 150 seconds on my machine.

It basically does a trie on both ends (prefix or suffix) until things are reduced to one word.

#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11131998
use warnings;
use List::Util qw( uniq first );
$| = 1;

my $braces = qr/(\{(?:(?1))\}|[^{}\n])/;     # {} must be balanced
my $leftduplicate = qr/^($braces+).*\n\1/m;
my $rightduplicate = qr/($braces+)\n(?:.*\n)*.*\1$/m;
my %cache;
my $fast;

sub list2glob                           # find the glob for a set of w
+ords
  {
  local $_ = join "\n", my @words = uniq sort @_;
  @words > 1 or return $words[0];
  $cache{$_} and return $cache{$_};
  my (@best, @found);

  /$leftduplicate(?{ push @{ $found[length $1] }, "l $1" })(*FAIL)/;
  /$rightduplicate(?{ push @{ $found[length $1] }, "r $1" })(*FAIL)/;

  @found or return $cache{$_} =  # join all words in brace group
      '{' . join(',', map s/^\{($braces*)\}$/$1/r, @words) . '}';

  my $count = 0;
  for ( reverse grep defined, @found )       # process duplicate ends
    {
    for ( uniq @$_ ) # a duplicate left or right end (trie it !)
      {
      my ($side, $same) = split;
      my ($regex, $left, $right) = $side eq 'l' ?
        (qr/^\Q$same\E/, $same, '') : (qr/\Q$same\E$/, '', $same);
      my $answer = list2glob( (grep !/$regex/, @words ),
        $left . list2glob( map { /$regex/ ? "$`$'" : () } @words ) . $
+right);
      $best[$answer =~ tr/{},//c] = $answer;
      }
    $fast and ++$count >= 2 and last; # do only two longest duplicates
+ in fast
    }
  return $cache{$_} = first {defined} @best; # shortest solution
  }

sub runtest          # run, show, and validate the glob
  {
  %cache = ();
  my @words = uniq sort @_;
  print $fast ? "(      fast)  " : "(exhaustive)  ";
  my $answer = list2glob(@words) // '*NONE*';
  print "$answer\n";
  my @glob = uniq sort glob $answer;
  "@glob" eq "@words" or die "\n** FAILED **\n got: @glob\nwant: @word
+s\n";
  }

while( <DATA> )
  {
  s/#.*//;                     # remove comments
  my @words = split or next;   # ignore blank lines
  print "@words  =>\n";
  $fast = 1;
  runtest( @words );
  $fast = 0;
  @words <= 20 and             # FIXME skip big exhaustive tests
  runtest( @words );
  print "\n";
  }
print "SUCCESS !!\n";

__DATA__

aVb aWb aXXb aYb aZb                             # debugging test case
+s
abcdXegh abcdYfgh
ac ad bc bd
ad ae af bd be bf cd ce cf
a1b3c a1b4c a2b3c a2b4c a1b5c a2b5c
fee fie foe fum
one two three four five six seven eight nine ten
a bx cy dx ey fx g hx i jy aa
a aa aaa aaaa aaaaa anestingtest

a b                       # tests from https://perlmonks.org/?node_id=
+11131998
ab ac
aXb aYb
a1b3c a1b4c a2b3c a2b4c
/ab/ef/ij/kl /ab/ef/ij/mn /ab/ef/ij /ab/gh/ij/kl /ab/gh/ij/mn /ab/gh/i
+j /cd/ef/ij/kl /cd/ef/ij/mn /cd/ef/ij /cd/gh/ij/kl /cd/gh/ij/mn /cd/g
+h/ij
abdel abdelmn abdelmo abdfgkl abdfgklmn abdfgklmo abdfghkl abdfghklmn 
+abdfghklmo abdfgijkl abdfgijklmn abdfgijklmo acdel acdelmn acdelmo ac
+dfgkl acdfgklmn acdfgklmo acdfghkl acdfghklmn acdfghklmo acdfgijkl ac
+dfgijklmn acdfgijklmo
[download]

Outputs:

aVb aWb aXXb aYb aZb  =>
(      fast)  a{V,W,XX,Y,Z}b
(exhaustive)  a{V,W,XX,Y,Z}b

abcdXegh abcdYfgh  =>
(      fast)  abcd{Xe,Yf}gh
(exhaustive)  abcd{Xe,Yf}gh

ac ad bc bd  =>
(      fast)  {a,b}{c,d}
(exhaustive)  {a,b}{c,d}

ad ae af bd be bf cd ce cf  =>
(      fast)  {a,b,c}{d,e,f}
(exhaustive)  {a,b,c}{d,e,f}

a1b3c a1b4c a2b3c a2b4c a1b5c a2b5c  =>
(      fast)  a{1,2}b{3,4,5}c
(exhaustive)  a{1,2}b{3,4,5}c

fee fie foe fum  =>
(      fast)  f{um,{e,i,o}e}
(exhaustive)  f{um,{e,i,o}e}

one two three four five six seven eight nine ten  =>
(      fast)  {eight,four,six,two,{fiv,thre,{ni,o}n}e,{sev,t}en}
(exhaustive)  {eight,four,six,two,{fiv,thre,{ni,o}n}e,{sev,t}en}

a bx cy dx ey fx g hx i jy aa  =>
(      fast)  {g,i,{,a}a,{b,d,f,h}x,{c,e,j}y}
(exhaustive)  {g,i,{,a}a,{b,d,f,h}x,{c,e,j}y}

a aa aaa aaaa aaaaa anestingtest  =>
(      fast)  a{,a{,a{,{,a}a}},nestingtest}
(exhaustive)  a{,nestingtest,{,{,{,a}a}a}a}

a b  =>
(      fast)  {a,b}
(exhaustive)  {a,b}

ab ac  =>
(      fast)  a{b,c}
(exhaustive)  a{b,c}

aXb aYb  =>
(      fast)  a{X,Y}b
(exhaustive)  a{X,Y}b

a1b3c a1b4c a2b3c a2b4c  =>
(      fast)  a{1,2}b{3,4}c
(exhaustive)  a{1,2}b{3,4}c

/ab/ef/ij/kl /ab/ef/ij/mn /ab/ef/ij /ab/gh/ij/kl /ab/gh/ij/mn /ab/gh/i
+j /cd/ef/ij/kl /cd/ef/ij/mn /cd/ef/ij /cd/gh/ij/kl /cd/gh/ij/mn /cd/g
+h/ij  =>
(      fast)  /{ab,cd}/{ef,gh}/ij{,/{kl,mn}}
(exhaustive)  /{ab,cd}/{ef,gh}/ij{,/{kl,mn}}

abdel abdelmn abdelmo abdfgkl abdfgklmn abdfgklmo abdfghkl abdfghklmn 
+abdfghklmo abdfgijkl abdfgijklmn abdfgijklmo acdel acdelmn acdelmo ac
+dfgkl acdfgklmn acdfgklmo acdfghkl acdfghklmn acdfghklmo acdfgijkl ac
+dfgijklmn acdfgijklmo  =>
(      fast)  a{b,c}d{e,fg{,h,ij}k}l{,m{n,o}}

SUCCESS !!
[download]

Note: I changed from your testing scheme because some tests generate different solutions but both are valid.

In reply to Re: Challenge: Generate a glob patterns from a word list by tybalt89
in thread Challenge: Generate a glob patterns from a word list by choroba

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.