erix has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow Monks,

May I once more lay before you the question of communication between a Perl program and external regexfiles. My objective is to find a mechanism, with as loose a coupling as possible, that makes it possible for a Perl program to 'query' a regex for its captured values.

The regex can have a prefix codeblock that initializes some local variables. These are populated with the captured values of the regex and then, in a postfix codeblock, stored in a hash. This result hash must then be made accessible to the container program.

I have this working, but only by using a hash variable that the container must know about and manipulate. Although I think it can be put to good use as is, I am looking for a way where the container program needs no prior knowledge and just 'queries' the regex, somehow. Maybe there are Perl globals that I overlooked?

Here is an example:

    (?{
        # prefix codeblock:
        local $number = "";
        local $name   = "";
        local $rest   = "";
    })
    ^
    [\ \t]*
    (?: ([0-9]+)\.[ ]        (?{ $number = $^N; }) )
    (?: ([A-Z][A-Z]+)[ ]*    (?{ $name = ucfirst(lc($^N)); }) )
    (?: (.*)                 (?{ $rest = $^N; }) )
    $
    (?{
        # postfix codeblock:
        ${$hashname}{ 'number' } = $number;
        ${$hashname}{ 'name'   } = $name;
        ${$hashname}{ 'rest'   } = $rest;
    })

The variable $hashname is translated by the container program to the proper variable name, with namespace. This is simply search and replace to prevent real hardcoding in the regexfile.
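To make the search-and-replace step concrete, here is a minimal sketch of a container applying a shortened version of such a regex. The names `$regex_text` and `$main::result_ref` are hypothetical, the regex file is replaced by an inline string, and strict is left off because the codeblocks use undeclared package variables with `local`. Hash keys are written without quotes only to keep the sketch simple.

```perl
use warnings;
use re 'eval';    # needed: the pattern is interpolated at run time and contains (?{ ... })

# Hypothetical stand-in for the contents of a regex file (shortened example).
# The single-quoted heredoc keeps $hashname and the backslashes literal.
my $regex_text = <<'END_REGEX';
(?{ local $number = ""; local $name = ""; })
^ [\ \t]*
(?: ([0-9]+)\.[ ]      (?{ $number = $^N; }) )
(?: ([A-Z][A-Z]+)[ ]*  (?{ $name = ucfirst(lc($^N)); }) )
$
(?{ ${$hashname}{number} = $number; ${$hashname}{name} = $name; })
END_REGEX

# The container owns the result hash (assumed name) and patches it into the text.
our $result_ref = {};
(my $pattern = $regex_text) =~ s/\$hashname/\$main::result_ref/g;

my $re = qr/$pattern/x;

if ("  12. SMITH" =~ $re) {
    # The localized variables are gone by now; only the hash keeps the values.
    my $r = $main::result_ref;
    print "number=$r->{number} name=$r->{name}\n";
}
```

The container still has to know the name it substituted, which is exactly the coupling described above.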

Any comment on the problem and my tentative solution is very welcome. Is there a better way?
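One possible direction, on Perl 5.10 or later only, is named capture groups: the container can then read the built-in `%+` hash after a match without agreeing on a variable name beforehand. The following is a sketch under that assumption, not a drop-in replacement. `$regex_text` is a hypothetical stand-in for the regex file, `%+` does not preserve the order of the groups, and post-processing such as `ucfirst(lc(...))` would have to move into the container.

```perl
use strict;
use warnings;

# Hypothetical regex-file contents using named groups (requires Perl 5.10+).
my $regex_text = <<'END_REGEX';
^ [\ \t]*
(?<number> [0-9]+ ) \. [ ]
(?<name>   [A-Z][A-Z]+ ) [ ]*
(?<rest>   .* )
$
END_REGEX

my $re = qr/$regex_text/x;

my %captured;
if ("12. SMITH and the rest" =~ $re) {
    %captured = %+;    # copy now: %+ changes with the next successful match
}

# Keys are sorted here only to get a stable listing; %+ itself is unordered.
print "$_ = $captured{$_}\n" for sort keys %captured;
```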

Update: fixed layout and typos, shortened a little.

Replies are listed 'Best First'.
Re: Communication of program(s) with regex codeblocks
by ikegami (Patriarch) on Oct 26, 2004 at 14:54 UTC
    What would be an example of an ideal container?

      The ideal container program would slurp the regexfile, then apply it to whatever text is to be searched. In the case of a match, it would somehow (and this is my problem) be able to retrieve the values captured by the regex, in the proper order. This does work in the above regex example, but that is ugly because it needs the program to know what to do with '$hashname'.

      My example regex above has this silly ${$hashname} stuff, which (in the test container program I have) is changed into a real variable name for the hash. I have copied the sub below. I've added some comments, variables contain what they are named after.

      And it may well be that it is just not feasible without taking recourse to globals.

      This sub is called before compiling the regexes. The substitution is its main function.

          use Tie::IxHash;    # needed for the tie below

          sub get_regexes_prepare {    # replaces all ${$hashname}{'abc'}
              no strict 'refs';
              my ($pckg, $rregexes) = @_;    # package name and array ref are passed
              my @regexes   = @{$rregexes};
              my @hashnames = ();
              $#hashnames = $#regexes;    # same size
              for (my $i = 0; $i < $#regexes + 1; $i++) {
                  my $hashname = "${pckg}::hashname" . $i;    # construct hashname
                  $hashnames[$i] = $hashname;
                  $regexes[$i] =~ s/\$hashname/\$$hashname/g;
                  tie %${$hashname}, "Tie::IxHash";    # keep order
              }
              return (\@regexes, \@hashnames);
          }
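      To show what the substitution in this sub does to the regex text, here is a small sketch that applies the same `s///` to a single postfix codeblock. The package name `main` and the index `0` are just assumed values for illustration.

```perl
use strict;
use warnings;

my $pckg     = 'main';                        # assumed package name
my $hashname = "${pckg}::hashname" . 0;       # same construction as in the sub

# One postfix codeblock as it would appear in the regex file.
my $regex = q/(?{ ${$hashname}{ 'number' } = $number; })/;

# Replace the literal text '$hashname' with '$' followed by the real name.
$regex =~ s/\$hashname/\$$hashname/g;

print "$regex\n";
```

      After the substitution the codeblock reads `${$main::hashname0}{ 'number' } = $number;`, so the values are stored through the scalar `$main::hashname0`, which holds a reference to the hash that the sub ties.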

      I'll later (tomorrow or so) post more complete code. But as you can probably guess from this sub code, it needs some cleaning up and removing of some experimental stuff :)

      Thanks

        The ideal container program would slurp the regexfile, then apply it to whatever text is to be searched. In the case of a match, it would somehow be able to retrieve the captured values in the proper order.

        Doesn't the following do just that?

        my $re = do {
           local *FILE;
           open(FILE, '<', $regexp_file_name)
              or die('...');
           local $/;
           qr/@{[<FILE>]}/   # Compile only once.
        };

        while (<DATA>) {
           if (@captures = $_ =~ $re) {
              print(join(', ', @captures), $/);
           }
        }

        __DATA__
        abd 123 sdafas 231
        gdabd 7364
        112 sdafas 785

        regexp file (Matches lines with two words of exactly 3 digits.)
        ===========
        \b(\d{3})\b.*?\b(\d{3})\b

        output
        ======
        123, 231
        112, 785