This last step in your proposed algorithm seems like too much work:

It should suffice to assign ranking scores to the rules in advance, based on the relative "wildness" allowed by wildcard operators, then apply the rules in their ranked order (most specific to least specific) and invoke the action for the first one that matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Glob 'match_glob';

# let's make up some dummy subroutines
my @actions;
for my $act ( 1 .. 7 ) {
    my $msg = " is handled by action$act\n";
    $actions[$act] = sub { print $_[0], $msg };
}

my %rule = (
    '*'               => 1,
    '*.txt'           => 2,
    '*.tx?'           => 7,
    'fred/*'          => 3,
    'fred/*.mac'      => 4,
    'george.txt'      => 5,
    'fred/george.txt' => 6,
);

# group the globs by ranking score (higher score = more specific)
my %rankings;
for ( keys %rule ) {
    my $rank = length() * 100 - ( tr/*// ) * 10 - ( tr/?// );
    push @{$rankings{$rank}}, $_;
}
my @rank_order = sort {$b<=>$a} keys %rankings;

while (<DATA>) {
    chomp;
    my $matched;
    # try the globs from most to least specific; stop at the first match
    for my $rank ( @rank_order ) {
        for my $glob ( @{$rankings{$rank}} ) {
            if ( match_glob( $glob, $_ )) {
                $matched = $rule{$glob};
                last;
            }
        }
        last if $matched;
    }
    if ( $matched ) {
        $actions[$matched]->( $_ );
    }
}
__DATA__
foo.bar
foo.bar.txt
foo.bar.txo
fred/foo.bar
fred/foo.bar.mac
george.txt
fred/george.txt
```

I suspect that someone else could come up with a more clever way to assign ranking scores, but the idea above is: longer patterns have higher scores (are more specific) than shorter ones; a pattern of a given length gets the most points if it has no wildcards; it loses 1 point for each "?" and 10 points for each "*". I think this captures the user's expectation.
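
To make the scoring concrete, here is a small sketch that applies the same formula to each of the sample rules and lists them from most to least specific. The sub name glob_rank is made up for illustration; it is not part of Text::Glob or of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is my own): the same formula as in the post,
# 100 points per character, minus 10 per "*" and minus 1 per "?".
sub glob_rank {
    my ($glob) = @_;
    my $stars     = ( $glob =~ tr/*// );    # tr with an empty replacement only counts
    my $questions = ( $glob =~ tr/?// );
    return length($glob) * 100 - $stars * 10 - $questions;
}

my @globs = (
    '*', '*.txt', '*.tx?', 'fred/*',
    'fred/*.mac', 'george.txt', 'fred/george.txt',
);

# Print the rules in the order they would be tried.
for my $glob ( sort { glob_rank($b) <=> glob_rank($a) } @globs ) {
    printf "%5d  %s\n", glob_rank($glob), $glob;
}
```

For the seven sample rules this runs from 1500 for fred/george.txt down to 90 for the bare "*", with *.txt (490) just ahead of *.tx? (489).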

(It would make sense to give all patterns with no wildcards a uniform maximum score, but I don't expect this would make a big difference in performance.)

Update: On second thought, depending on how many file names you're checking and the proportion of exact-match to wildcard rules, it might be worthwhile to test all the exact-match rules as a group first, using just eq instead of a call to match_glob. This would be easy if you assign a uniform, constant maximum score to all the exact-match rules, regardless of their length.
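
A rough sketch of the exact-match-first idea, under some assumptions of mine: the exact rules go into a hash, so the eq test becomes a hash lookup, and only the wildcard rules are ranked and scanned. To keep it runnable with core Perl only, simple_glob_match is a simplified stand-in for match_glob (it assumes "*" and "?" never match "/" and treats every other character literally); glob_rank and find_action are likewise names invented here. One more reason this may be worth doing: under the length-based formula, a rule like 'george*.txt' would score 1090 and outrank the exact rule 'george.txt' at 1000 whenever "*" is allowed to match an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for Text::Glob's match_glob (an assumption, not its real code):
# "*" and "?" do not match "/", everything else is literal.
sub simple_glob_match {
    my ( $glob, $name ) = @_;
    my $re = '';
    for my $ch ( split //, $glob ) {
        $re .= $ch eq '*' ? '[^/]*' : $ch eq '?' ? '[^/]' : quotemeta($ch);
    }
    return $name =~ /\A$re\z/ ? 1 : 0;
}

# Same scoring formula as in the post; only needed for wildcard rules here.
sub glob_rank {
    my ($glob) = @_;
    return length($glob) * 100 - ( $glob =~ tr/*// ) * 10 - ( $glob =~ tr/?// );
}

my %rule = (
    '*'               => 1,
    '*.txt'           => 2,
    '*.tx?'           => 7,
    'fred/*'          => 3,
    'fred/*.mac'      => 4,
    'george.txt'      => 5,
    'fred/george.txt' => 6,
);

# Split the rules: no wildcard characters means an exact-match rule.
my ( %exact, @wild );
for my $glob ( keys %rule ) {
    if ( $glob =~ /[*?]/ ) { push @wild, $glob }
    else                   { $exact{$glob} = $rule{$glob} }
}
@wild = sort { glob_rank($b) <=> glob_rank($a) } @wild;

# Returns the action number for a name, or nothing if no rule matches.
sub find_action {
    my ($name) = @_;
    return $exact{$name} if exists $exact{$name};    # exact rules always win
    for my $glob (@wild) {
        return $rule{$glob} if simple_glob_match( $glob, $name );
    }
    return;
}

for my $name (qw( foo.bar foo.bar.txt fred/foo.bar.mac george.txt fred/george.txt )) {
    my $act = find_action($name);
    print "$name => ", ( defined $act ? "action$act" : "no match" ), "\n";
}
```

In a real script you would presumably keep calling match_glob for the wildcard rules; the stand-in is only there so the sketch has no CPAN dependency.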


In reply to Re: Glob best match? by graff
in thread Glob best match? by tlhackque
