in reply to Challenge: Generate a glob patterns from a word list

I'd start maybe looking at Regexp::Trie for inspiration; at least conceptually its similar just you'll be targetting glob semantics rather than regexp.

Edit: Looking at the original question domain (mkpath -p on directory names) I wonder if you might couldn't cheat and work path segment by path segment (split into sections on qr{/} then kind of do a trie-ish thing on each segment). Don't know if that's applicable for the more general question.

The cake is a lie.
The cake is a lie.
The cake is a lie.

Replies are listed 'Best First'.
Re^2: Challenge: Generate a glob patterns from a word list
by LanX (Saint) on May 04, 2021 at 11:42 UTC
    AFAIK are Trie algorithms effectively building a tree.

    But now we have a search graph with diamond structures.

    IOW or-clauses in a trie are nested and never chained.

    (Unless I'm missing something)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      No that's a good point. I started following my own suggestion and had just that problem. I'd not match the expected in that I'd get "too much" commonality duplicated and wound up with a{1b{3c,4c},2b{3c,4c}} rather than a{1,2}b{3,4}c. Now it "works" and produces the same names for 5/6 outputs (expanding those with glob before testing), but it's not what I'd call an optimal glob representation (in that I think there's too much duplicated).

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

      I think you are right, What we have here is a N-way diff (where N is the number of strings).
      My guess is that it is horribly explosive in N.

      A 2-way diff is bad enough that it takes caching to perform adequately. N-way would be much, much worse.

        In fact, my idea was to construct a finite state machine that accepted all the input words and minimise it. I was able to do that, but I was then unable to correctly interpret the states and transitions to create the output string.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        > My guess is that it is horribly explosive in N.

        I doubt it.

        Intuitively I'd say a two phase approach should work.

        • Build a tree like with a Trie
        • Parse the tree bottom-up and collapse it into a graph by using look-up hashes for duplicated sub-trees
        Since a Trie-tree isn't "exploding", this should be safe.

        Untested, of course.

        UPDATE

        Sorry, I should have read the examples in the OP more closely.

        I wasn't aware that

        • Or-groups were nested a{b,{c,d}}e (not even that it's possible)
        • empty alternatives were allowed a{b{,c}}d
        Without that my approach above should work.

        Now I'm rather pessimistic.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery