Beatnik has asked for the wisdom of the Perl Monks concerning the following question:

Hey,
I'm parsing some data files with function-like tags. These tags have parameters, which I need to pass along to a Perl function. I'm currently using the snippet listed below but I feel like it's kinda dirty and slow (altho it works fine). Are there any faster ways I can do this?

The Parameters would be something like <?Function("foo","bar")?>
while ($content =~ m/<\?Function\(\".*?\"\)\?>/g) {
    $_ = $content;
    my ($line) = /<\?Function\(\"(.*?)\"\)\?>/;
    my (@Params) = split(/\"\,\"/,$line);
    $line = some_param_stuff(@Params);
    $content =~ s/<\?Function\(\".*?\"\)\?>/$line/;
}

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.

Re: String parameter parsing
by chipmunk (Parson) on Jan 19, 2001 at 02:38 UTC
    Well, you're using the same regular expression three times, which is a good sign that you're making something harder than it needs to be. In particular, you don't need to remove the matched string at the end of the loop, because m//g will find the next occurrence anyway.
    while ($content =~ m/<\?Function\(([^\)]*)\)\?>/g) {
        my $params = $1;
        my @params = $params =~ m/"([^"]*)"/g;
        my $line = some_param_stuff(@params);
    }
    (tested)

    The regex I use to match the individual parameters assumes that there will be no escaped quotes within a parameter. (e.g. "my \"tricky\" parameter".)
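
    If escaped quotes do turn up, the parameter regex can be loosened. What follows is only a minimal sketch, assuming backslash is the sole escape character; parse_params is a hypothetical helper name that does not appear anywhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: extract double-quoted parameters, allowing
# backslash-escaped characters (such as \") inside each one.
sub parse_params {
    my ($params) = @_;
    # Inside the quotes, accept either a non-quote, non-backslash character
    # or a backslash followed by any single character.
    my @params = $params =~ m/"((?:[^"\\]|\\.)*)"/g;
    # Drop the escaping backslashes from each captured parameter.
    s/\\(.)/$1/g for @params;
    return @params;
}

my @got = parse_params('"foo","my \"tricky\" parameter"');
print "$_\n" for @got;
```

    Note that the outer tag regex still stops at the first ")", so a parameter containing a closing parenthesis would need further work.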

      I don't want the <?Function("foo","bar")?> in $content after the processing. It should be replaced by the output from some_param_stuff(@params);. So I think I need s///. I don't really check for escaped \" either (altho I'm sure MRE mentions it somewhere).

      Greetz
      Beatnik
        Ah, sorry, I overlooked that bit. In that case, you could just put the substitution back in at the end of my loop.

        Alternatively, the whole thing could be rewritten as a single substitution, with the /e modifier so the replacement will be eval'ed:

        $content =~ s/<\?Function\(([^\)]*)\)\?>/
            my $params = $1;
            my @params = $params =~ m,"([^"]*)",g;
            some_param_stuff(@params);
        /ge;
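
        For anyone who wants to try the single-substitution form in isolation, here is a small self-contained sketch. The body of some_param_stuff is a made-up stand-in (it just joins its arguments with "+"), and expand_tags is a hypothetical wrapper name; only the substitution itself follows the code above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real handler: joins its parameters with "+".
sub some_param_stuff {
    my @params = @_;
    return join '+', @params;
}

# Hypothetical wrapper: replace every <?Function(...)?> tag in one pass.
sub expand_tags {
    my ($content) = @_;
    $content =~ s{<\?Function\(([^\)]*)\)\?>}{
        my $params = $1;
        my @params = $params =~ m/"([^"]*)"/g;
        some_param_stuff(@params);
    }ge;
    return $content;
}

print expand_tags('a <?Function("foo","bar")?> b <?Function("x")?>'), "\n";
```

        Copying $1 into $params before the inner match keeps the captured text safe, since the inner match sets its own capture variables.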
Re: String parameter parsing
by tadman (Prior) on Jan 19, 2001 at 03:58 UTC
    This is similar to what Embperl, ePerl and other "embedded Perl" meta-languages do. They might be a better choice than rolling your own, so it is worth looking into them first.

    Back to the code you posted. I'm not sure that it will work outside of your carefully constructed and manicured test environment. I would suggest taking a "hands-off" approach to the whole parsing thing and letting Perl do that for you.

    If I understand your requirements correctly, you are saying that you have an arbitrary function "f()" which you want to use in your code by inserting something along the lines of "<?f(...)?>". This text would be replaced with the result of that function call.

    In your code you make reference to something called "Function()" but the actual Perl code refers to "some_param_stuff()" which is likely to be very confusing if you add more than one function. Keeping them the same would help improve readability substantially.

    If you have control over the input data from the file, meaning that no "unauthorized" users will be able to insert potentially malicious code, you can implement this very quickly using a small amount of code. The key is getting the regexp right, and although confidence is high in the utility of the one below, I am certain it could be improved.
    sub Reformat {
        local ($_) = shift;
        my ($out) = '';
        while (/<\?([A-Za-z_][A-Za-z0-9_]*\s*\(
                  (?:
                    (?: (?:"(\\"|[^"])*?")
                      | (?:'(\\'|[^'])*?')
                      | \)(?!\s*\?\>)
                      | (?:[^'"\)]+)
                    )*?
                  )
                \))\s*\?>/sx) {
            $out .= $`;
            $_ = $';
            $out .= eval $1;
        }
        $out .= $_;
        return $out;
    }
    You'll note that this passes any code you put into your document into the eval() directly, without any checks. Obviously you will not want just anyone putting code in these documents.

    Since this is using the Perl interpreter, you can put all sorts of stuff in the tag and it should work out fine. The regexp catches single and double quoted strings, which will prevent the parser from terminating prematurely on a quoted ")?>", for example. Support for "qq()" could be added as well, but this may be extraneous considering your application.
    Additionally, the result of the eval() may be undef, and the tag will disappear from your document without a trace. A simple modification could help you:
    $out .= eval $1;
    $out .= "<!-- $@ -->" if $@;
    This will, at least, put some error information in the output should something go horribly wrong during processing.
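
    To see the error reporting in action, the idea can be pulled out into a small helper. This is only a sketch: eval_tag and greet are hypothetical names invented for the illustration, and the same warning about trusted input applies, since the tag body still goes straight into a string eval.

```perl
use strict;
use warnings;

# Hypothetical function that a tag might call.
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

# Hypothetical helper: evaluate one tag body. On failure, return an HTML
# comment carrying the error text instead of silently dropping the tag.
sub eval_tag {
    my ($code) = @_;
    my $result = eval $code;    # string eval: trusted input only
    if ($@) {
        (my $err = $@) =~ s/\s+\z//;    # trim the trailing newline
        return "<!-- $err -->";
    }
    # An undef result becomes an empty string, which avoids a warning.
    return defined $result ? $result : '';
}

print eval_tag('greet("world")'), "\n";
print eval_tag('no_such_function("x")'), "\n";
```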

    To use this, you would define functions in your parse program that could be accessed by the script. If you wanted to limit the functions used by this to a package, at least in a casual sense, you could always do this:
    $out .= eval "MyPackage::$1";
    This would force all function calls into the desired package, however, it would not prevent something like this from appearing in your code:
    Something about <?account_name(uc("foo"))?>:
    "uc()" will be called in the main package space unless explicitly specified otherwise.

    I hope that's in line with what you were looking for.
      Actually, I'm handling the parameters as a list, and calling separate functions for each parameter. Function is not a function in my code; it is a marker indicating which functions to use with the parameters.

      function_one($param[0]);
      function_one($param[1]);
      # ...and so on

      As a side note, parameters and "function names" are not always alphanumeric.
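
      One way this kind of marker-to-function mapping could be written without eval is a dispatch table. This is a sketch under assumptions, not the code actually in use: the marker names, the handler bodies, %handler_for and expand_markers are all invented for the illustration, and the results for each parameter are simply joined with a space.

```perl
use strict;
use warnings;

# Hypothetical handlers, keyed by marker name. Each one receives a single
# parameter. Marker names need not be alphanumeric.
my %handler_for = (
    'Function' => sub { uc $_[0] },
    'Link-To'  => sub { "[$_[0]]" },
);

# Replace each <?Marker("a","b")?> tag by calling the marker's handler once
# per parameter. Tags with an unknown marker are left untouched.
sub expand_markers {
    my ($content) = @_;
    $content =~ s{(<\?([^(?]+)\(([^\)]*)\)\?>)}{
        my ($whole, $marker, $args) = ($1, $2, $3);
        my @params  = $args =~ m/"([^"]*)"/g;
        my $handler = $handler_for{$marker};
        $handler ? join(' ', map { $handler->($_) } @params) : $whole;
    }ge;
    return $content;
}

print expand_markers('<?Function("foo","bar")?> and <?Link-To("x")?>'), "\n";
```

      Because only names present in the table are ever called, untrusted input cannot reach arbitrary Perl code the way it can with a string eval.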
      The Benchmark results from Chipmunk's code
      Benchmark: timing 10000 iterations of beatnik1, chipmunk1, chipmunk2
      beatnik1: 536 wallclock secs (449.49 usr + 2.70 sys = 452.19 CPU)
      chipmunk1: 390 wallclock secs (360.76 usr + 2.25 sys = 363.01 CPU)
      chipmunk2: 313 wallclock secs (250.84 usr + 2.00 sys = 252.84 CPU)
      Again, muchos kudos to Chipmunk
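
      A comparison along these lines can be set up with timethese from the core Benchmark module. The script below is a rough sketch rather than the one that produced the numbers above: the sample input, the stand-in some_param_stuff, and the names loop_version and subst_version are all assumptions, and the iteration count is kept small.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical handler and sample input; the real data files are not shown.
sub some_param_stuff { return join '+', @_ }
my $sample = 'x <?Function("foo","bar")?> y <?Function("baz")?> z' x 20;

# Loop form: find one tag, compute its replacement, substitute, repeat.
sub loop_version {
    my ($content) = @_;
    while ($content =~ m/<\?Function\(([^\)]*)\)\?>/) {
        my $params = $1;
        my @params = $params =~ m/"([^"]*)"/g;
        my $line   = some_param_stuff(@params);
        $content =~ s/<\?Function\([^\)]*\)\?>/$line/;
    }
    return $content;
}

# Single substitution with /e, replacing every tag in one pass.
sub subst_version {
    my ($content) = @_;
    $content =~ s{<\?Function\(([^\)]*)\)\?>}{
        my $params = $1;
        my @params = $params =~ m/"([^"]*)"/g;
        some_param_stuff(@params);
    }ge;
    return $content;
}

# Time both variants on the same input.
timethese(1000, {
    loop  => sub { loop_version($sample) },
    subst => sub { subst_version($sample) },
});
```

      Checking that both variants return the same string before timing them is a cheap way to make sure the comparison is fair.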

      Greetz
      Beatnik