tod222 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I've written an application which implements a domain specific language (DSL). To keep the implementation simple, the DSL allows the use of Perl snippets which are fed to eval.

The Perl snippets can reference scalars (and in the future, arrays) which are declared in the DSL outside the snippet. (Note that the names of variables defined in the DSL may not begin with an underscore.)

I've come up with three different implementations.

Each variation is implemented as a subroutine (or method) that takes three arguments:
$_codetext
    A scalar containing a string with the text of the Perl snippet.
$_valhref
    A reference to a hash of key/value pairs that must be accessible to the snippet. Each key becomes the name of a scalar, which is assigned the corresponding value. Note: after the snippet has been evaled, the hash must be updated with any modifications to the scalars.
$_taskname
    A scalar containing an identifying string for use in error messages.

A scalar called $FAIL_MATCH is always predefined, and the subroutine returns its value.

I'm looking for comments and feedback on which is best, or suggestions for better alternatives. Note that there are really two separate issues here -- the main one of the best method for running the snippet, and the secondary one of the best method for converting the hash entries to scalars and back.

The Implementations

1. Quick and dirty

This proof-of-concept implementation makes use of Alias.pm and is implemented as a method to another object, thus the use of $self.

The 'no warnings' and 'no strict' statements are needed to hide error messages that are a side effect of the way Alias.pm is implemented.

Advantages: Easy to implement.

Disadvantages:

  1. The 'no warnings' and 'no strict' hide real errors triggered in the snippet in the case of a typo or other error, making this implementation unusable for my purposes.
  2. The snippet has full access to other variables defined in the package containing the eval.

The code:

sub _onmatch_fail {
    my ( $self, $_codetext, $_valhref, $_taskname ) = @_;
    my $FAIL_MATCH = 0;    # match defaults to succeed

    # Prefix a #line directive so errors are reported against the task.
    # (No second "my" here: $_codetext is already declared above.)
    $_codetext = "# line 1 task[$_taskname]\n" . $_codetext;
    {
        use Alias qw(attr);
        attr $_valhref;
        no Alias;
        no warnings qw(once);
        no strict qw(vars);
        eval $_codetext;
        die $@ if $@;
        return $FAIL_MATCH;
    }
}

2. Prolog/epilog wrapper in separate package

This implementation uses no modules, but as a result must surround the snippet with a prolog and epilog to perform the conversion between the hash entries and lexical scalars.

Advantages:

  1. Better isolation of the snippet through the use of a separate package.
  2. Good reporting of errors in the snippet.
  3. No modules required.

Disadvantages:

  1. The prolog/epilog stuff seems like a kludge.
  2. While isolation is better, mischief by a knowledgeable user is still possible.

The code:

package App::Tasker::Exec;

sub _onmatch_fail {
    my ( $_codetext, $_valhref, $_taskname ) = @_;
    @_ = ();
    {
        my $FAIL_MATCH = 0;    # match defaults to succeed
        my $evaltext   = '';
        my $epilog     = '';
        my $_newval    = {};
        while ( my ( $k, $v ) = each %$_valhref ) {
            $evaltext .= <<EOT;
my \$$k = '$v';
EOT
            $epilog .= <<EOT;
\$_newval->{'$k'} = \$$k;
EOT
        }
        $evaltext .= "# line 1 task[$_taskname]\n" . $_codetext;
        $evaltext .= $epilog;
        eval $evaltext;
        die $@ if $@;
        for my $k ( keys %$_valhref ) {
            $_valhref->{$k} = $_newval->{$k};
        }
        return $FAIL_MATCH;
    }
}

3. Separate package using Safe.pm

This is a variant of the previous implementation that makes use of Safe.pm.

Note that the use of a formerly undocumented argument to Safe.pm's reval method is needed to prevent it from using 'no strict'.
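A minimal sketch of what that second argument changes, using a fresh compartment on its own rather than the full implementation. The snippet text and variable names are made up for illustration; only Safe->new and reval(STRING, STRICT) come from Safe.pm.

```perl
use strict;
use warnings;
use Safe;

my $cpt = Safe->new;

# A snippet with a typo: it assigns to $count but reads $cuont.
my $snippet = '$count = 1; $cuont + 1';

# Second argument false: the snippet is compiled without strict,
# so the misspelled variable is quietly treated as an empty global.
my $loose_result = $cpt->reval( $snippet, 0 );
my $loose_error  = $@;

# Second argument true: the snippet is compiled under "use strict",
# so the undeclared variables become a compile-time error in $@.
my $strict_result = $cpt->reval( $snippet, 1 );
my $strict_error  = $@;

print "loose:  result=$loose_result error='$loose_error'\n";
print "strict: error=$strict_error";
```

With the flag set, the typo is reported instead of silently evaluating to a wrong value, which is the behaviour wanted for catching errors in a snippet.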

Advantages:

  1. Uses Safe.pm to execute code in a restricted compartment.
  2. Good reporting of errors in the snippet.

Disadvantages:

  1. Uses Safe.pm to execute code in a restricted compartment. This will give novices a false sense of security while possibly tripping up knowledgeable users trying to do legitimately complex things, depending on how restricted the compartment is at the default setting.
  2. Retains the prolog/epilog kludge from implementation 2, which gets more complex due to Safe.pm.

The code:

package App::Tasker::SafeExec;

sub _onmatch_fail {
    my ( $_codetext, $_valhref, $_taskname ) = @_;

    # Plain assignment: ".=" would append a second copy of the snippet.
    $_codetext = "# line 1 task[$_taskname]\n" . $_codetext;
    {
        use Safe;
        my $_compartment = Safe->new;
        our $FAIL_MATCH = 0;    # match defaults to succeed
        my $_epilog   = '';
        my $_newval   = {};
        my $_evaltext = <<'EOT';
$_compartment->share('$FAIL_MATCH');
EOT
        while ( my ( $k, $v ) = each %$_valhref ) {
            $_evaltext .= <<EOT;
our \$$k = '$v';
\$_compartment->share('\$$k');
EOT
            $_epilog .= <<EOT;
\$_newval->{'$k'} = \$$k;
EOT
        }
        $_evaltext .= <<'EOT';
$_compartment->reval($_codetext, 1);
die $@ if $@;
EOT
        $_evaltext .= $_epilog;
        eval $_evaltext;
        die $@ if $@;
        for my $k ( keys %$_valhref ) {
            $_valhref->{$k} = $_newval->{$k};
        }
        return $FAIL_MATCH;
    }
}

Re: Best of three methods for evaling Perl snippets?
by kyle (Abbot) on Sep 04, 2008 at 02:57 UTC

    The two issues you point to before all the code are (1) the best way to run the code, and (2) the best way to get your data in and out of it. However, the code itself suggests that you're also worried about some semblance of isolation. What's the point of that? If the code you're running is not trusted (i.e., you want to prevent access to system resources), that's going to be hard. If you just want to avoid someone accidentally stomping on the rest of your program, that's something else.

    I'd use some kind of prolog/epilog wrapper to get data in and out. I think this is fraught with peril, however:

    $evaltext .= <<EOT;
    my \$$k = '$v';
    EOT

    Even if you're not worried about malicious code, this would be pretty easy to trip up (if $v contains a single quote, for instance). I think "$v =~ s{(\\|\')}{\\$1}g" would be good enough protection (for $v but not for $k), but I'm not sure how much I'd stake on that. It might be safer to use Data::Dumper to serialize each $v and just put some strict limits on a pattern that $k must match.
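    A rough sketch of the Data::Dumper idea, not a tested drop-in. The helper name build_prolog and the exact pattern for $k are assumptions made for illustration; the Terse, Indent and Useqq settings are ordinary Data::Dumper configuration variables.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: build the "my $name = <literal>;" prolog lines.
sub build_prolog {
    my ($vals) = @_;
    local $Data::Dumper::Terse  = 1;    # emit the bare value, no "$VAR1 ="
    local $Data::Dumper::Indent = 0;    # keep each value on one line
    local $Data::Dumper::Useqq  = 1;    # double-quoted output with escapes
    my $prolog = '';
    for my $k ( sort keys %$vals ) {
        # Assumed rule: a letter followed by letters, digits or underscores.
        die "bad variable name '$k'\n"
            unless $k =~ /\A[A-Za-z][A-Za-z0-9_]*\z/;
        $prolog .= "my \$$k = " . Dumper( $vals->{$k} ) . ";\n";
    }
    return $prolog;
}

# Values that would break a naive '$v' interpolation.
my %vals   = ( name => "O'Reilly", path => 'C:\\temp\\' );
my $prolog = build_prolog( \%vals );

# Stand-in for a user snippet: read the generated lexicals back.
my $result = eval $prolog . 'join "|", $name, $path';
die $@ if $@;
print "$result\n";
```

    The quoting is left entirely to Data::Dumper, so a quote or backslash in $v ends up inside a valid Perl string literal instead of terminating it.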

    If you just want some encapsulation to keep the eval from meddling with code it has no business with, I'd recommend a fork into another process. The child won't be able to muck with the parent's data (but beware of open filehandles and sockets and such). If I were writing this, I'd use open with the '-|' mode as in Re^4: Forking problem UPDATED. The child would write out some serialization of the resulting variables, and the parent would read them and make the changes to its local data.
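    Roughly what that could look like, as a sketch under several assumptions: the helper run_in_child is hypothetical, the snippet works on a hash reference $v instead of individual scalars to keep the example short, and Storable is my choice of serialization (nothing above requires it).

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();
use Storable qw(store_fd fd_retrieve);

our $parent_state = 'intact';

# Hypothetical helper: run $code in a forked child against a copy of
# %$vals and return the child's version of the hash.
sub run_in_child {
    my ( $code, $vals ) = @_;

    # Forking open: the parent gets the child's pid and a read handle;
    # the child gets 0 and its STDOUT is connected to that handle.
    my $pid = open( my $from_child, '-|' );
    die "cannot fork: $!\n" unless defined $pid;

    if ( $pid == 0 ) {
        binmode STDOUT;
        my $v  = {%$vals};            # the snippet sees this copy as $v
        my $ok = eval "$code; 1";
        if ($ok) {
            store_fd( $v, \*STDOUT );    # serialize the result for the parent
            STDOUT->flush;
        }
        POSIX::_exit( $ok ? 0 : 1 );     # skip the parent's END blocks
    }

    binmode $from_child;
    my $new = eval { fd_retrieve($from_child) };
    close $from_child;                   # waits for the child and sets $?
    die "snippet failed in child\n" if $? != 0 || !$new;
    return $new;
}

my %vals = ( count => 1 );
my $new  = run_in_child( '$v->{count}++; $parent_state = "clobbered";', \%vals );
print "count=$new->{count} parent_state=$parent_state\n";
```

    The child's assignment to $parent_state never reaches the parent, and only the values read back from the pipe are applied. As noted, this does nothing about files, sockets or other system resources the child can still reach.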

    Just to reiterate, if you're trying to avoid some process getting to local files or something, I don't have any suggestion.

      Thanks for pointing out the possibility of a single quote in $v -- I hadn't thought of that, because most (but not all) of these scalar and array values $v will be limited to a restricted set of characters. All $k are already restricted by the grammar to a limited set of characters allowed in identifiers, so $k itself isn't an issue.

      The goal is not to foil malicious code, merely to prevent people from shooting themselves in the foot, unless they really intend to. An errant single quote is the exact type of error I'd like to handle well.

      Thanks.

      Update: If Alias.pm didn't throw error messages during normal operation, using it in a separate module would suffice (a combination of implementations 1 and 2).