wimpie has asked for the wisdom of the Perl Monks concerning the following question:

I've created a perl script automatically based on an ini file (we want to replace the ini file, which holds a number of rules, by regular expressions in a perl script). The ini file is rather long, and the generated perl code is approximately 22000 lines long, holding about 8370 regexes. On my 1 GHz Pentium processor with 256 MB RAM, it takes about 90 seconds to parse this file (it is called from within a C++ program, but once it is parsed, the function call itself takes no time whatsoever). 90 seconds is (much much much) too long. Is there any possibility to save the parsed file (it never changes) in some format so that, when it is needed and loaded at program startup, it does not have to be parsed again, and thus loads faster? Thx already, Wim

Re: Saving parsed perl for faster loading?
by Abigail-II (Bishop) on Dec 01, 2003 at 14:57 UTC
    There's no official way of doing this. There have been many attempts, though. But since this opens the door to "binary distribution of source code", there has never been an enormous push to get it working. Ilya might have it working under OS/2, though.

    But nowadays there are hooks that can be called after compilation, and you can access the bytecode, so it might be possible, although there may be dragons to be slain before there's a stable way of doing so.

    So, yes, it may be possible, but there's no off-the-shelf solution that I am aware of.

    Abigail

      I think it's called ByteLoader (B::Bytecode).
        Looks like a good solution for me, so I've tried it out (on a Windows machine) with a simple code file (holding: print "hello compiled world";) and the command

            C:\> perl -MO=Bytecode compiled.pl > result.pl

        This results in a series of errors:

            1: no such instruction ""
            ...
            110: no such instruction ""
            There were 110 assembly errors

        Excuse me if I'm wrong, but the documentation about ByteLoader is rather skinny. Is there a good source of information about this way of working out there? All help is welcome.
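
        For what it's worth, a guess (untested, and assuming the perl 5.8 B::Bytecode interface): the bytecode is binary, and redirecting it with > on Windows runs it through CRLF translation, which can corrupt the stream and might explain the "no such instruction" errors. B::Bytecode's own -o option writes the file itself, and -H prepends a ByteLoader header so the result can be run directly:

            C:\> perl -MO=Bytecode,-H,-ocompiled.plc compiled.pl
            C:\> perl compiled.plc
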
Re: Saving parsed perl for faster loading?
by Corion (Patriarch) on Dec 01, 2003 at 15:08 UTC

    Are you sure that your numbers are correct and that the Perl parser is really the bottleneck? I created the following, very simple script that may reproduce your situation in a pure-Perl environment; it takes 10 seconds (total) to complete, spending about 4 seconds parsing the created file:

    #!/usr/bin/perl -w
    use strict;
    use File::Temp qw(:mktemp);

    my $REs = 9000;
    my ($filename) = mktemp('tmpfileXXXXX');
    my $lines = 0;
    my $template = q{
    sub is_%s {
        my $re = qr(^%s(\\1)$);
        return shift =~ $re;
    };
    };

    print "Generating $REs regular expressions in $filename\n";
    open FH, ">", $filename
        or die "Couldn't create $filename: $!";
    for my $i (1..$REs) {
        my $name = "re_${i}_";
        my $code = sprintf $template, $name, $name;
        $lines += () = ($code =~ /\n/msg);
        printf FH $code
            or die "Couldn't write template to $filename: $!";
    };
    close FH;

    my $start = time();
    system($^X, "-w", $filename) == 0
        or die "Couldn't spawn created file $filename: $!/$?";
    my $stop = time();
    my $duration = $stop - $start;
    print "It took me $duration seconds to parse $REs regular expressions ($lines lines)\n";

    Now, my regular expressions are quite simple and the rest is even simpler, but my machine is a P-II 400 with 256 MB RAM, so if anything my test should run slower than your program ...

    Maybe you can optimize/simplify the generated Perl code? Maybe the data structures are inefficient? Maybe file I/O or network I/O is the bottleneck?
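
    For instance (a purely hypothetical sketch, not your actual generated code): rather than emitting one named sub per rule, the generator could emit a single flat table with one dispatcher. Whether that helps depends on where the 90 seconds actually go, since compiling 8370 qr// patterns costs the same either way, but a flat list gives perl far fewer subs to parse:

        # Hypothetical alternative output of the generator: one data
        # table instead of thousands of separate subs.
        my @rules = (
            [ qr/^rule_one_/, 'rule_1' ],
            [ qr/^rule_two_/, 'rule_2' ],
            # ... one entry per ini rule ...
        );

        # A single dispatcher replaces the generated subs:
        sub match_rule {
            my ($input) = @_;
            for my $rule (@rules) {
                return $rule->[1] if $input =~ $rule->[0];
            }
            return;    # no rule matched
        }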

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      I've had similar problems on my 1.something GHz machine, so it's entirely plausible. In that case it was a really large chunk of data from Data::Dumper. I switched to Storable and that was much faster.
        Storable was what I was thinking of too when I read the question. I use it for most configuration-style things now, with auto-update if the configuration file changes. In some cases it has only a marginal benefit, but for 8000+ regexes it might be worth doing.

        --
        Barbie | Birmingham Perl Mongers | http://birmingham.pm.org/
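
        A minimal sketch of that auto-update idea (parse_ini and the {patterns} layout are made up for illustration; note that Storable cannot freeze compiled qr// objects, so cache the plain pattern strings and compile them after loading):

            use Storable qw(nstore retrieve);

            my $ini   = 'rules.ini';
            my $cache = 'rules.stored';

            my $rules;
            if (-e $cache && (stat $cache)[9] >= (stat $ini)[9]) {
                # Cache is at least as new as the ini file: skip the slow parse.
                $rules = retrieve($cache);
            }
            else {
                $rules = parse_ini($ini);    # your existing (slow) parser
                nstore($rules, $cache);      # refresh the cache for next time
            }

            # Compile the cached pattern strings once, after loading:
            my @compiled = map { qr/$_/ } @{ $rules->{patterns} };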

Re: Saving parsed perl for faster loading?
by tadman (Prior) on Dec 01, 2003 at 15:08 UTC
    In some instances you can use perlcc to create a compiled version of your program, although, according to the documentation, "Use for production purposes is strongly discouraged." Experimental code and all that. It's worth a try if you're really desperate, since it ends up being compiled code.
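
    Something along these lines (untested, and the option spelling is from memory of the perlcc docs, so treat it as a sketch):

        # Compile to a native executable via the C backend:
        perlcc -o myrules myrules.pl

        # Or byte-compile instead of going through C:
        perlcc -B -o myrules.plc myrules.pl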

    If this program is used on a semi-regular basis, you could always leave it running as a "daemon" process, allowing other programs to call it as required using a socket. This isn't as risky, but you should keep a good example of how to write a daemon process handy (the Perl Cookbook has one). A cheap way of doing this is to turn it into an Apache module, where it can be accessed using HTTP.
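
    A bare-bones sketch of the daemon idea (hypothetical names; a line-oriented protocol over a local socket, so the C++ side just connects, writes a line, and reads one line back):

        use strict;
        use IO::Socket::INET;

        # Pay the 90-second parse once, at daemon startup.
        my @rules = load_rules();    # hypothetical: your generated regexes

        my $server = IO::Socket::INET->new(
            LocalAddr => '127.0.0.1',
            LocalPort => 9000,
            Listen    => 5,
            Reuse     => 1,
        ) or die "Can't listen: $!";

        while (my $client = $server->accept) {
            while (my $line = <$client>) {
                chomp $line;
                my $matched = 0;
                for my $re (@rules) {
                    if ($line =~ $re) { $matched = 1; last }
                }
                print $client "$matched\n";    # one-line answer per query
            }
            close $client;
        }
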
      A simpler way would be to use pperl, short for "persistent perl", which, to my understanding, pretty much works in the way you describe.

      As for making it work on Windows: the only report of a successful build is under Cygwin, and even that is for a very old version (0.02). Build status reports for ActivePerl (5.6, 5.8) don't look good at all.
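
      Where it does build, using it is about as simple as it gets; per the PPerl docs you just point the shebang line at pperl and run the script as usual:

          #!/usr/bin/pperl
          # First invocation parses the 22000 generated lines and stays
          # resident; later invocations reuse the already-compiled copy.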

Re: Saving parsed perl for faster loading?
by cyocum (Curate) on Dec 01, 2003 at 15:43 UTC

    I should have looked before I typed. This will probably not help with your startup problem, but it might be a good suggestion once you have fixed that and are looking for further optimizations. Sorry for going a bit off-topic.

    Is your generated code looping over a constructed regex? This can cause the regex to be reparsed over and over again by the regex engine. You may want to look at the /o modifier so that the regex is only compiled once.

      The /o modifier does not really do anything to speed up most regexes; if you are using the same static regex over and over, try qr// instead. That actually compiles the regex and can produce speedups. As you said, however, this does not answer the OP's question. =)


      -Waswas
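
      To illustrate the difference (a sketch; get_pattern, do_something and @input are made up):

          my $pat = get_pattern();    # pattern only known at run time

          # Without /o, perl re-examines the interpolated pattern on
          # every iteration (and /o would silently ignore any later
          # change to $pat):
          for my $line (@input) {
              do_something($line) if $line =~ /$pat/;
          }

          # qr// compiles the pattern exactly once, up front, and a new
          # qr// can be built whenever $pat really does change:
          my $re = qr/$pat/;
          for my $line (@input) {
              do_something($line) if $line =~ $re;
          }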

      /o is dead, long live qr//!

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated