in reply to Regex partial/leading match

Hi raymorris, welcome to the Monastery, and Happy New Year!

I could be wrong here, but this looks like an XY Problem, and I suspect you may be looking at the issue backwards. Could you show us some code so we can see what you're trying to do in a real case?

Cheers,

-stevieb

Re^2: Regex partial/leading match
by raymorris (Novice) on Dec 31, 2015 at 21:54 UTC
    It is possible it's an XY problem, which is why I gave the example usage. From an external source, we have a large number of regular expressions. Two examples are:
        /^C:\\Windows\\Program Files\\blah[0-9]\\setup.exe/
        /^C:\\Windows(\\SYSTEM)?\\Foo\\Bar.dll/
    The second example is probably the most interesting. It matches either C:\Windows\Foo\Bar.dll or C:\Windows\SYSTEM\Foo\Bar.dll.
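    A quick way to see what each pattern accepts is to compile the pattern strings with qr// and try a few candidate paths. This is a minimal sketch that assumes the patterns arrive as plain text lines (simulated here with a <<'END' heredoc, where backslashes stay literal) rather than as Perl source; matches_any is a hypothetical helper name.

```perl
use strict;
use warnings;

# Patterns as they might arrive from the external source, one per line.
# A <<'END' heredoc keeps every backslash literal, so \\ stays \\.
my @patterns = map { qr/$_/ } split /\n/, <<'END';
^C:\\Windows\\Program Files\\blah[0-9]\\setup.exe
^C:\\Windows(\\SYSTEM)?\\Foo\\Bar.dll
END

# Hypothetical helper: true if any supplied pattern matches the path.
sub matches_any {
    my ($path) = @_;
    return scalar grep { $path =~ $_ } @patterns;
}

# Single quotes turn each \\ into one backslash: C:\Windows\Foo\Bar.dll
for my $path ('C:\\Windows\\Foo\\Bar.dll',
              'C:\\Windows\\SYSTEM\\Foo\\Bar.dll',
              'C:\\Temp\\Bar.dll') {
    printf "%-32s %s\n", $path, matches_any($path) ? 'match' : 'no match';
}
```

    Note that neither pattern ends in $ and the . before the extension is unescaped, so something like C:\Windows\Foo\BarXdll.old matches too.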

    The regular expressions are from an external source, so we can't change the fact that we get them in that format. We must recurse through a drive to find files matching them.

    Suppose we come across the directory C:\Temp\. Intuitively, we know we don't need to recurse into C:\Temp\, because nothing in that directory can match either regex: there isn't even a leading match, so the test should return false immediately. On the other hand, when we come to C:\Windows\, we SHOULD recurse, because it matches the leading part of both regexes and we may find a full match if we keep going. This is obvious to a human; the trick is how to tell Perl to skip anything that can't match, even as more characters are added to the END of the string.
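    One conservative approximation of that "leading match" test (a swapped-in heuristic, not a general partial-match engine) is to pull the fixed literal text off the front of each pattern and only descend into a directory whose path agrees with that text as far as both go. This sketch assumes the patterns are ^-anchored strings used without /x or /i, and compares case-sensitively just as the regexes do; literal_prefix and may_contain_match are hypothetical names. Under those assumptions it never skips a directory that could hold a match, because every matching path must start with the prefix; the price is that it still enters some dead ends such as C:\WindowsOld.

```perl
use strict;
use warnings;

# Hypothetical helper: return the literal text every match of a pattern
# must start with. Assumes a ^-anchored pattern string used without /x
# or /i; anything it can't reason about yields '' ("don't prune").
sub literal_prefix {
    my ($pattern) = @_;
    return '' unless $pattern =~ s/^\^//;    # unanchored: could match anywhere
    return '' if $pattern =~ /\|/;           # alternation: play it safe
    my $prefix = '';
    while (length $pattern) {
        if ($pattern =~ s/^\\([^A-Za-z0-9])//) {
            $prefix .= $1;                   # escaped punctuation, e.g. \\ or \.
        }
        elsif ($pattern =~ s/^([^\\.^\$|?*+()\[\]{}])//) {
            $prefix .= $1;                   # ordinary literal character
        }
        else {
            # A quantifier allowing zero repeats (? * {) makes the previous
            # character optional, so it can't be part of the prefix.
            chop $prefix if $pattern =~ /^[?*{]/;
            last;                            # any other metacharacter ends it
        }
    }
    return $prefix;
}

# A directory is worth entering if its path and the prefix agree on their
# common length: C:\ is on the way to C:\Windows, C:\Windows\Foo is past it.
sub may_contain_match {
    my ($dir, $prefix) = @_;
    my $n = length($dir) < length($prefix) ? length($dir) : length($prefix);
    return substr($dir, 0, $n) eq substr($prefix, 0, $n);
}

# Example: decide which directories to descend into for both patterns.
my @prefixes = map { literal_prefix($_) } split /\n/, <<'END';
^C:\\Windows\\Program Files\\blah[0-9]\\setup.exe
^C:\\Windows(\\SYSTEM)?\\Foo\\Bar.dll
END

for my $dir ('C:\\', 'C:\\Temp', 'C:\\Windows', 'C:\\Windows\\SYSTEM') {
    my $descend = grep { may_contain_match($dir, $_) } @prefixes;
    print "$dir: ", ($descend ? 'descend' : 'skip'), "\n";
}
```

    With these two patterns the prefixes come out as C:\Windows\Program Files\blah and C:\Windows, so of the four sample directories only C:\Temp is skipped.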

    I've tried to go "up another level" and look at the problem from a higher view, but I can't see anything there. The regular expressions are externally supplied and we must find files on a drive which match them. For efficiency, we want to avoid looking for files in directories that can't possibly match. I believe the regex engine does something like this internally, but I don't know whether it exposes the "matched length" of a failed match to Perl.
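    On that last point: as far as I know, Perl's own regex engine has no partial-match mode that reports how far a failed match got (PCRE, a separate library, does offer partial matching), so the "could anything below here match?" decision has to come from your own code. Once you have such a predicate, the core File::Find module can act on it: setting $File::Find::prune inside the wanted callback stops it descending into the directory just visited. The sketch below takes the predicate as a callback; find_pruned and its two callbacks are hypothetical names, not an existing API. One hedged caveat: on Windows, File::Find may build names with forward slashes, so a predicate written for C:\... style paths may need to normalize separators first.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: walk $root and return the files accepted by
# $is_match. Before File::Find enters any directory (the root included),
# $may_descend->($dir) is consulted; a false result sets
# $File::Find::prune so that whole subtree is skipped.
sub find_pruned {
    my ($root, $may_descend, $is_match) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # $File::Find::name stays a path we can test
            wanted   => sub {
                my $path = $File::Find::name;
                if (-d $path) {
                    $File::Find::prune = 1 unless $may_descend->($path);
                }
                elsif ($is_match->($path)) {
                    push @found, $path;
                }
            },
        },
        $root,
    );
    return sort @found;
}

# Example: skip any directory named "skip", collect *.txt files elsewhere.
# my @txt = find_pruned($root, sub { $_[0] !~ m{/skip\z} },
#                              sub { $_[0] =~ /\.txt\z/ });
```

    Keeping the predicate separate from the walker means the pruning rule can be as crude or as clever as the patterns allow without touching the traversal code.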

    PS - I wish I could still log into my account from 2003. :(

      Part of your problem here is that you have more information than you're giving the computer: these aren't just regexes, they're file system path expressions.

      You might take advantage of that knowledge and decompose each regex into a set of File::Find::Rule rules. Obviously you'll have to write the parser yourself, but hopefully the regexes will all be quite similar, with common patterns that you can spot and translate into rules.

      So you might end up with something like :-

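      use File::Find::Rule;

      # directories named blah0..blah9 directly under Program Files
      my @dirs  = File::Find::Rule->directory()->maxdepth(1)
                                  ->name(qr/^blah[0-9]$/)
                                  ->in('C:\\Windows\\Program Files');

      # setup.exe directly inside each of those directories
      my @files = File::Find::Rule->file()->maxdepth(1)
                                  ->name('setup.exe')->in(@dirs);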
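      The "write the parser yourself" step might start by cutting each pattern into per-directory pieces at the \\ separators that sit outside any group or character class. A piece with no embedded separator maps naturally onto a ->name(qr/.../) rule like the blah[0-9] one in the example above, while a piece such as Windows(\\SYSTEM)? spans more than one level and needs special handling. This is a rough sketch: split_components is a hypothetical name, and it ignores /x, alternation, and the many other regex features a real parser would have to cope with.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path pattern into per-directory pieces at
# each \\ that sits outside any group (...) or character class [...].
# A piece that still contains \\ (inside a group) spans several directory
# levels and would need hand translation.
sub split_components {
    my ($pattern) = @_;
    $pattern =~ s/^\^//;          # drop the start anchor
    my @parts;
    my $cur      = '';
    my $depth    = 0;             # nesting level of (...) groups
    my $in_class = 0;             # true while inside [...]
    # Tokenize: an escape pair (\x) or a single character.
    while ($pattern =~ /\G(\\.|.)/gs) {
        my $tok = $1;
        if ($tok eq '\\\\' && !$depth && !$in_class) {
            push @parts, $cur;    # top-level separator closes the piece
            $cur = '';
            next;
        }
        if    ($in_class)   { $in_class = 0 if $tok eq ']' }
        elsif ($tok eq '[') { $in_class = 1 }
        elsif ($tok eq '(') { $depth++ }
        elsif ($tok eq ')') { $depth-- }
        $cur .= $tok;
    }
    push @parts, $cur;
    return @parts;
}

# Example: classify the pieces of the second pattern.
chomp(my $pattern = <<'END');
^C:\\Windows(\\SYSTEM)?\\Foo\\Bar.dll
END
for my $piece (split_components($pattern)) {
    my $how = $piece =~ /\\\\/
        ? 'contains a separator: spans levels, translate by hand'
        : 'single level: candidate for ->name(qr/^(?:...)$/)';
    print "$piece  =>  $how\n";
}
```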

      It's an interesting problem and well worth spending some time on.