grasbueschel has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I'm trying to speed up my little Perl script by finding a way to avoid eval(), or at least to use it more efficiently. The script processes CSV files and fills in two fields at the end of each line using regular expressions. There are a lot of these expressions (currently ~200), and they need to be editable outside the Perl code. Eventually, some fields of these CSV files become part of an SQL INSERT INTO statement.

So what I have is this:

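```perl
# $filter names a file of substitution statements, e.g.
#   s/(F01-861385.*);;/$1;Category1;SubCategory2/g
my $regex = '';
open(my $regex_fh, '<', $filter) or die "Cannot open $filter: $!";
while (<$regex_fh>) {
    next if m/^\s*#/;    # skip comment lines
    chomp;
    $regex .= "$_, ";    # join all statements into one Perl expression
}
close($regex_fh);

# pre-formatted csv content in $lines
for (split /^/, $lines) {
    chomp;
    eval $regex;         # recompiles every statement for every line
    die $@ if $@;
    # ...
    # extract fields and build up SQL INSERT INTO
}
```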

This is slow.... :)

Now I'm trying to find a way to improve things. Is there a way to compile the eval only once? Or do I need some more out-of-the-box thinking: is there a more efficient way to achieve the same goal (letting users set up their own regexes and including them dynamically in the code)?

Re: Avoid eval() / dynamic regular expressions
by BrowserUk (Patriarch) on Dec 15, 2009 at 02:20 UTC

    Whilst you'll get some gains from creating a sub that processes one line at a time, you'll have to pay for (some of) it with the overhead of calling a sub for every line.

    Since you're already dynamically generating a sub, why not wrap the entire processing, including the loop, into that sub and call it just once:

```perl
open(REGEX, "<$filter") or die "Cannot open $filter: $!";
while (<REGEX>) {
    next if ($_ =~ m/^\s*#.*$/);
    chomp;
    $regex .= "$_, ";
}
close(REGEX);

my $code = eval <<EOC or die $@;
sub {
    for ( \@_ ) {
        chomp;
        $regex;   ## Note: using \$_ implicitly is faster
        ## extract fields and do SQL INSERT ...
    }
}
EOC

$code->( split /^/, $lines );
```
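    A self-contained sketch of the same idea, for trying it out without a rules file. Assumptions that are not from the post: the rules file is simulated with an in-memory filehandle, each rule becomes its own statement instead of being comma-joined, and the "per-line work" is just collecting the rewritten lines. Because the heredoc is double-quoted, everything meant for the generated sub (@_, $_, @out) is backslash-escaped, while $body is deliberately interpolated.

```perl
use strict;
use warnings;

# Stand-in for the user-editable rules file: one s/// per line,
# '#' lines are comments. Read through an in-memory filehandle here.
my $rules_text = <<'RULES';
# tag known part numbers
s/^(F01-861385.*);;$/$1;Category1;SubCategory2/
s/^(F02-\d+.*);;$/$1;Category2;SubCategory1/
RULES

open my $rules_fh, '<', \$rules_text or die "Cannot open rules: $!";
my @rules;
while (my $rule = <$rules_fh>) {
    next if $rule =~ /^\s*#/ or $rule !~ /\S/;   # skip comments and blank lines
    chomp $rule;
    push @rules, $rule;
}
close $rules_fh;

# Generate the whole loop as source text and compile it exactly once.
my $body    = join "\n", map { "        $_;" } @rules;
my $process = eval <<"EOC" or die "Bad rule: $@";
sub {
    my \@out;
    for (\@_) {
        chomp;
$body
        push \@out, \$_;   # real code would extract fields / build the INSERT here
    }
    return \@out;
}
EOC

my $lines  = "F01-861385-A;x;;\nF02-42;y;;\nZ99;z;;\n";
my @result = $process->(split /^/, $lines);
print "$_\n" for @result;
```

    The generated sub is called once with all lines, so the per-line cost is just the substitutions themselves, with no sub call per line.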

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Avoid eval() / dynamic regular expressions
by ikegami (Patriarch) on Dec 15, 2009 at 01:06 UTC

    Is there a way to compile the eval only once?

    $regex contains something of the form s/.../.../?

    You can't have a reference to an op, but you can have a reference to a sub...

```perl
my $op  = 's/a/A/g';
my $sub = eval "sub { $op }";
$_ = "abbracadabra";
$sub->();
print "$_\n";
```
    AbbrAcAdAbrA
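    A rough way to measure what compiling once buys, using the core Benchmark module. The three rules and the generated input are made up for the comparison, and the rates cmpthese prints will vary with machine, Perl version and rule count. The first variant pays the string-eval compile cost on every line, as in the question; the second pays it once, as above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up rules standing in for the ~200-line rules file.
my @rules = (
    's/^(F01-.*);;$/$1;Category1;SubCategory1/',
    's/^(F02-.*);;$/$1;Category2;SubCategory2/',
    's/^(F03-.*);;$/$1;Category3;SubCategory3/',
);
my $code  = join '; ', @rules;
my @input = map { "F0" . ($_ % 4) . "-$_;x;;" } 1 .. 2_000;

# The question's approach: eval() recompiles the rule text for every line.
sub eval_each_line {
    my @l = @input;
    for (@l) { eval $code; die $@ if $@ }
    return \@l;
}

# Compile the rules into a sub once, then call it for every line.
my $compiled = eval "sub { $code }" or die $@;
sub compiled_once {
    my @l = @input;
    for (@l) { $compiled->() }
    return \@l;
}

# Both variants must produce identical output before timing means anything.
my ($per_line_out, $once_out) = (eval_each_line(), compiled_once());
die "results differ\n" unless "@$per_line_out" eq "@$once_out";

cmpthese(-1, { eval_each_line => \&eval_each_line, compiled_once => \&compiled_once });
```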
      This was exactly what I was looking for. Thanks!
Re: Avoid eval() / dynamic regular expressions
by crashtest (Curate) on Dec 15, 2009 at 01:49 UTC

    Just chiming in to point out that eval'ing user-supplied input is, of course, a security risk, even when that input is only regular expressions. You know your users and how far you trust them. If your users are "the web", however, things could get hairy:

```perl
# using the 'eval' feature of a substitution...
$regex = "s{foo}{system 'rm -rf /'}eg";
# ... or even...
$regex = "m{(?{ system 'rm -rf /' })}";
```
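    A harmless way to see the same effect: the hypothetical rule below looks like a substitution, but the /e flag turns its replacement into Perl code, which here only bumps a counter (the variable name is made up for the demo). Any other code could sit in its place.

```perl
use strict;
use warnings;

our $calls = 0;   # package variable the "rule" will modify

# Looks like a plain substitution, but /e makes the replacement executable Perl.
my $rule = 's{foo}{ $main::calls++; "bar" }eg';

my $sub = eval "sub { $rule }" or die $@;
$_ = 'foo foo';
$sub->();
print "$_ (side effect ran $calls times)\n";
```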

    If the input comes from the web, you should be running under taint mode (-T) anyway, and Perl will refuse to eval tainted data before you hurt yourself.

    As for further optimizations, I think it really depends on your requirements. If users are supposed to be able to supply any Perl regex they like, ikegami has given you a nice way to isolate the eval() and run it only once.
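    If users only need "a pattern plus two field values" rather than arbitrary Perl, string eval can be dropped entirely. The sketch below assumes a made-up tab-separated rule format (pattern, category, subcategory) and a helper named categorize(); neither comes from the thread. Patterns are compiled once with qr//, and because they are interpolated at runtime, a pattern containing (?{ ... }) is rejected unless use re 'eval' is in effect.

```perl
use strict;
use warnings;

# Hypothetical rule format: <pattern> TAB <category> TAB <subcategory>
my $rules_text = "^F01-861385\tCategory1\tSubCategory2\n"
               . "# comment lines are skipped\n"
               . "^F02-\\d+\tCategory2\tSubCategory1\n";

my @rules;
open my $fh, '<', \$rules_text or die "Cannot open rules: $!";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*#/ or $line !~ /\S/;
    my ($pattern, $cat, $subcat) = split /\t/, $line, 3;
    die "Malformed rule: $line\n" unless defined $subcat;
    # Compile once; a runtime pattern containing (?{ ... }) dies here
    # because 'use re "eval"' is not in effect.
    my $re = eval { qr/$pattern/ } or die "Bad pattern '$pattern': $@";
    push @rules, [ $re, $cat, $subcat ];
}
close $fh;

# Fill the two empty trailing fields from the first matching rule.
sub categorize {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $cat, $subcat) = @$rule;
        if ($line =~ $re) {
            $line =~ s/;;\z/;$cat;$subcat/;
            last;
        }
    }
    return $line;
}

print categorize($_), "\n" for 'F01-861385-A;x;;', 'F02-42;y;;', 'Z99;z;;';
```

    The trade-off is expressiveness: users can no longer write arbitrary replacements, only choose which values go into the two fields.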

      Well, the users will run this on their own workstations, so it's their choice which statements they put in the file :)

      But thanks for pointing it out!

Re: Avoid eval() / dynamic regular expressions
by Logicus (Initiate) on Aug 19, 2011 at 13:34 UTC