LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

We are using web-forms where users can configure filters for various lists of strings.

I've convinced my colleagues to use globs instead of far too powerful regexes.

For that we are using Text::Glob, which internally translates Perl's glob syntax to a regex.

But I stumbled over an incompatibility while testing character classes written as [...], because an unpaired [ is not escaped.

Demo:

  DB<84> x <[*>
0  '[KSR_3.pl'
  DB<85> $re = glob_to_regex( '[*' )
Unmatched [ in regex; marked by <-- HERE in m/^(?=[^\.])[ <-- HERE (?:(?!\/).)*$/ at c:/Strawberry/perl/vendor/lib/Text/Glob.pm line 18.
  DB<86> x $re_str = glob_to_regex_string( '[*' )
0  '(?=[^\\.])[(?:(?!\\/).)*'
  DB<87>

The point is that the file glob automatically treats the unpaired [ as a literal character, while Text::Glob emits it as an unescaped regex metacharacter, which causes a syntax error.
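One way to spot such input before it reaches the translator might look like the sketch below. The helper name has_unpaired_bracket is hypothetical (it is not part of Text::Glob), and the rule that a ] directly after [, [! or [^ counts as a literal member of the class is an assumption borrowed from POSIX-style globbing. It needs Perl 5.10+ for the possessive quantifiers.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Glob): returns 1 if $glob contains
# a '[' that never gets a closing ']', otherwise 0.
sub has_unpaired_bracket {
    my ($glob) = @_;

    # Replace backslash-escaped characters with a placeholder so that an
    # escaped bracket is not mistaken for the start or end of a class.
    (my $rest = $glob) =~ s/\\./x/gs;

    # Remove complete classes. Assumed POSIX-like rule: a ']' directly
    # after '[', '[!' or '[^' is a literal member, not the terminator.
    $rest =~ s/\[[!^]?+\]?+[^\]]*\]//g;

    # Any '[' still left has no partner.
    return $rest =~ /\[/ ? 1 : 0;
}

for my $glob ('[*', '[KSR]*', 'a\[b', '[]') {
    printf "%-8s %s\n", $glob,
        has_unpaired_bracket($glob) ? 'unpaired [' : 'ok';
}
```

Whether a flagged pattern should be rejected in the form or rewritten so the [ is literal depends on which glob dialect the users are supposed to get.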

I'm not sure how best to handle this. An eval { glob_to_regex('[*') } will catch this particular input, but maybe not other broken code...
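A minimal sketch of that eval guard, written so it runs without Text::Glob installed: the wrapper compile_glob_safely is a hypothetical name, and the translator is passed in as a code ref (in real use that could be \&Text::Glob::glob_to_regex_string). The stub below just returns the regex string from the debugger session above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $to_regex_string is any glob-to-regex-string
# translator. Returns ($regex, undef) on success, (undef, $error) on failure.
sub compile_glob_safely {
    my ($glob, $to_regex_string) = @_;
    my $re = eval {
        my $str = $to_regex_string->($glob);
        qr/^$str$/;    # dies at runtime if $str is not a valid regex
    };
    return defined $re ? ($re, undef) : (undef, $@ || 'unknown error');
}

# Stub standing in for the translator; it returns the string shown by
# glob_to_regex_string('[*') in the session above.
my $stub = sub { '(?=[^\.])[(?:(?!\/).)*' };

my ($re, $err) = compile_glob_safely('[*', $stub);
print defined $re ? "compiled\n" : "rejected: $err";
```

This only catches patterns that fail to compile. A glob that compiles to a valid regex with a different meaning than the file glob would still pass through unnoticed.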

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re: Edge case in Text::Glob
by salva (Canon) on Mar 01, 2022 at 16:19 UTC
    IMO, Text::Glob::glob_to_regex code is not too difficult to understand. If you had found an issue there you can try fixing it and submitting a patch.

    Also, I don't quite remember the details, but a long time ago I found out that Text::Glob was not good enough for my purposes and so I wrote my own version (Net::SFTP::Foreign::Helpers::_glob_to_regex).

    In retrospect, what I learned from that is that rolling your own glob-to-regex compiler supporting basic wildcards is not too difficult. But then, last year, I had to write a feature-rich glob-to-regex compiler (in Scala this time, scala-glob), and it got quite complex.

    IIRC, one of the main limitations of Text::Glob is that you cannot use it efficiently to match globs spanning more than one directory level (as in f*oo/b*ar/*.txt) against tree data structures like file systems.

      > If you had found an issue there you can try fixing it and submitting a patch.

      well sure, but you know the song "don't reinvent the wheel" and I doubt I'm the first one using it.

      I also had to realize that there are many "glob" dialects around, so I first had to analyze what Perl's file glob is doing.

      To my surprise (and horror) I noticed that <KSR*> matches case-insensitively on Windows (e.g. Ksr_1), while being case-sensitive on Linux. That's a portability issue IMHO.
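      A small sketch of how the case behaviour could be pinned down explicitly, assuming the core module File::Glob (Perl 5.16+ for the :bsd_glob tag): passing flags to bsd_glob replaces the platform-dependent defaults, so the same call should behave the same on both systems. The file names are made up for the demo.

      ```perl
      use strict;
      use warnings;
      use File::Glob qw(:bsd_glob);    # exports bsd_glob and the GLOB_* flags
      use File::Temp qw(tempdir);

      # Scratch directory with two files whose prefixes differ only in case.
      my $dir = tempdir(CLEANUP => 1);
      for my $name (qw(KSR_1.pl Ksr_2.pl)) {
          open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
          close $fh;
      }

      # Explicit flags override whatever the platform would use by default.
      my @sensitive   = bsd_glob("$dir/KSR*", 0);
      my @insensitive = bsd_glob("$dir/KSR*", GLOB_NOCASE);

      printf "case-sensitive:   %d match(es)\n", scalar @sensitive;
      printf "case-insensitive: %d match(es)\n", scalar @insensitive;
      ```

      This only settles the file-glob side. Whatever regex is generated for the web-form filters would need a matching decision about the /i flag.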

      And globbing in the file explorer won't have any idea of character classes.

      Hence I don't even have a clear picture what the requirements should be for a reimplementation, or if Text::Glob is even patchable.

      Cheers Rolf

        It's explicitly noted (a couple levels deep, granted) that glob without extra flags is implicitly case insensitive for VMS and Win32.

        The cake is a lie.