Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to use a subroutine call from inside a regex-codeblock?

This is about separating regexes from application code by storing the regexes in separate files. I have this working nicely and usefully, so far no problem.

But now I am looking to move some code out of the application. More specifically, I want to call (DBI) subroutines from within the codeblock and pass them a regex-captured value. Here I am running into trouble...

Doing database lookups straight after matching seems such a logical step that I expect others must have progressed further in this direction. Of course I will post complete code if anyone turns out to be interested, but because it needs several files I thought it better to start with a small, not-running example:

What follows is one such separate regex file, as mentioned above.

(?{
    local $genus   = '';
    local $dbfound = 0;
})
^
(?:
    ([A-Z][A-Z]+)
    (?{
        $genus   = ucfirst(lc($^N));
        $dbfound = dbilookup($genus);
    })
)
$
(?{
    #
    # further processing of values
    # (give back values to container program)
})

I'd be thankful for enlightenment

TIA, Eric

Replies are listed 'Best First'.
Re: regex-codeblock calling subroutines?
by dave_the_m (Monsignor) on Oct 15, 2004 at 15:01 UTC
    Beware of calling functions that use regular expressions. The regex engine isn't fully re-entrant and strange things may happen.

    Dave.

Re: regex-codeblock calling subroutines?
by dragonchild (Archbishop) on Oct 15, 2004 at 15:14 UTC
    Why do you want to do it as part of the regex? Why don't you split it out and do it within a while loop?

      I suppose I *could* split it out, and will do that if all else fails. But then the functionality lives in that one application. I think that if the DBI call can be made to work, the flexibility of the system would be greater. I could then, for a given text that a regex file is to be matched against, control the program flow depending on the outcome of certain found/not-found queries. Also, many programs could use that lookup functionality.
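A minimal sketch of the split-out approach suggested above, which may still keep most of the flexibility being asked for. Everything named here is an assumption for illustration: `dbilookup` is a stand-in backed by a hash rather than DBI, and `%rule` and `apply_rule` are hypothetical. The idea is that a regex file hands back a plain pattern with a named capture plus a callback, and the callback runs only after the match has finished, so the lookup never executes inside the regex engine.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the DBI-backed lookup from the question.
my %known_genera = map { $_ => 1 } qw(Homo Pan Canis);

sub dbilookup {
    my ($genus) = @_;
    return $known_genera{$genus} ? 1 : 0;
}

# What a separate regex file could provide: a compiled pattern with a
# named capture, plus an action to run once the match has completed.
my %rule = (
    pattern => qr/ ^ (?<genus> [A-Z][A-Z]+ ) $ /x,
    action  => sub {
        my (%cap) = @_;
        my $genus = ucfirst( lc( $cap{genus} ) );
        return { genus => $genus, dbfound => dbilookup($genus) };
    },
);

# Match first, then call the action outside the regex engine.
# Returns nothing when the text does not match.
sub apply_rule {
    my ( $rule, $text ) = @_;
    return unless $text =~ $rule->{pattern};
    return $rule->{action}->(%+);    # copy of the named captures
}
```

Any program that loads such a rule (for example with `do $file`, if the file returns the hash) could call `apply_rule(\%rule, $text)` and branch on the `dbfound` field of the result.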
Re: regex-codeblock calling subroutines?
by perlcapt (Pilgrim) on Oct 15, 2004 at 18:26 UTC
    My understanding of match-time code evaluation is that the code is not evaluated in an order that is easy to predict. First, regular expressions require that a regex "engine" be built prior to the evaluation of the string. Secondly, the evaluation is not strictly and directly from left to right. This is more an event-driven code evaluation than a procedural one. I.e., even if you can get this to run, it will be very difficult to maintain. We aren't byte/cycle miserly in Perl. There are other languages for that personality of programming.

    I like what you are trying to do, but I don't think this style will yield the best design. Rebuttal is welcome.
Re: regex-codeblock calling subroutines?
by erix (Prior) on Oct 15, 2004 at 18:44 UTC

    Hmm. Point(s) taken (to both david and capt). Thanks.

    I must admit that I am not sure if it will be a good idea, or even work at all. I was hoping to find monks who stumbled along the same path, and came out victorious or beaten, but wiser.

    It seems an attractive idea to have some db-backed 'directive' when crafting a regex.

    I'll just stumble on for a bit.

    (erix is 'Anonymous Monk' - lost cookie when I posted that initial question)

      No really, don't.

      Any code you call from inside a (?{ ... }) block absolutely cannot use regular expressions at all. Unless you control every part of the code being run inside the expression, the code isn't safe. I expect that you're using DBI and a driver module to work with your database. That's all code that you don't control and that probably uses regular expressions. None of that can be called from within a (?{ ... }) block.

      Examine the pre-forking used in "Regexp::Approx - Use fuzzy regular expressions" to see how I fork during INIT and then run the unsafe code in a separate process. This is the only general method which allows you to do this sort of thing.
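The helper-process idea described above might look roughly like the sketch below. It is not the Regexp::Approx code, only an illustration under stated assumptions: `dbilookup` is a hypothetical stand-in that deliberately uses a regex, `remote_lookup` and the pipe handle names are invented, and the fork happens at startup rather than in an INIT block. The parent forks before any matching starts, the child runs the regex-using lookup, and the code block in the pattern only does pipe I/O.

```perl
use strict;
use warnings;
use IO::Handle;    # autoflush on the pipe handles

# Hypothetical lookup that itself uses a regex, standing in for DBI code.
sub dbilookup {
    my ($genus) = @_;
    my %known = map { $_ => 1 } qw(Homo Pan Canis);
    return ( $genus =~ /\A[A-Z][a-z]+\z/ && $known{$genus} ) ? 1 : 0;
}

# One pipe for requests (parent to child), one for answers (child to parent).
pipe( my $req_r, my $req_w ) or die "pipe failed: $!";
pipe( my $res_r, my $res_w ) or die "pipe failed: $!";
$_->autoflush(1) for $req_w, $res_w;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: answer one lookup per line until the parent closes the pipe.
    close $req_w;
    close $res_r;
    while ( defined( my $key = <$req_r> ) ) {
        chomp $key;
        print {$res_w} dbilookup($key), "\n";
    }
    exit 0;
}
close $req_r;
close $res_w;

# Parent side: no regex use here, only a write and a read on the pipes.
sub remote_lookup {
    my ($key) = @_;
    print {$req_w} "$key\n";
    my $answer = <$res_r>;
    die "helper process went away" unless defined $answer;
    chomp $answer;
    return $answer;
}

our $dbfound = 0;
my $re = qr{
    ^ ([A-Z][A-Z]+) $
    (?{ $dbfound = remote_lookup( ucfirst( lc($^N) ) ) })
}x;

# Shut the helper down when the parent exits, without changing its exit code.
END {
    if ($pid) {
        local $?;
        close $req_w;
        waitpid $pid, 0;
    }
}
```

After `'HOMO' =~ $re`, `$dbfound` holds the child's answer. A real version would also need error handling for a helper that dies mid-request.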

        Right, that settles it then. I did not realise the regex-inside-block aspect. I'll scurry back then, instead of stumbling on. And I will have a look at that module.

        Thanks, Eric