in reply to Re: Externally managed threads using embedded Perl
in thread Externally managed threads using embedded Perl

OK, thanks for that. I am happy to run with Perl 5.8 or later as a requirement.

So it seems I do this by having a single interpreter that I perl_clone(my_perl, CLONEf_CLONE_HOST) for each thread, throwing away the clone when the thread ends and/or pulling from a pool of pre-existing clones?

A good analogy for what this is doing is a basic web server (note: this is not a web server - I'm not that stupid), i.e. short-lived requests serviced over a network connection. Is perl_cloning going to be fast, or do I need to do some sort of thread pooling (which I may do anyway) to get it to perform? (Or should I stop being so damn lazy and just test the performance myself ;-))

Or, do I want to take the plunge and try to make it all happen with Perl threads (which will involve learning Perl - I didn't write the script we want to execute)?

Thanks again,

Phil


Re: Re: Re: Externally managed threads using embedded Perl
by BrowserUk (Patriarch) on Jan 08, 2004 at 04:39 UTC

    Reading between the lines of what you've told us, you have a pre-written perl script that you want to be able to run on behalf of networked users on a single machine, with concurrent access, but no sharing of data between the instances? And you aren't a perl programmer :)

    It really will depend on how the pre-existing perl script runs, but am I right in assuming that the script returns its results via stdout?

    If this is the case, cloning an interpreter for each request, or building a pool of clones, would probably work ok. I haven't done enough with embedding -- nothing beyond the simple examples in perlembed -- to be able to predict the performance. Pre-cloning a pool and returning a "busy...try again" message if the pool is fully utilised ought to be fast enough, if the loading isn't too extreme.
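    For illustration, the "pool with a busy response" idea can be sketched without any Perl at all -- the slot accounting is the same whatever the worker is. Everything below (class and function names included) is hypothetical, with the interpreter itself stubbed out:

```cpp
#include <atomic>
#include <string>

// Hypothetical sketch of a fixed-size pool with a non-blocking "busy"
// response. A request either claims a free slot or is told to try again;
// the interpreter clone that a slot would hold is omitted here.
class InterpreterPool {
public:
    explicit InterpreterPool(int size) : mFree(size) {}

    // Try to claim a slot; returns false immediately instead of blocking.
    bool acquire() {
        int free = mFree.load();
        while (free > 0) {
            // On failure compare_exchange_weak reloads `free`, so the
            // loop re-checks whether any slot remains.
            if (mFree.compare_exchange_weak(free, free - 1))
                return true;
        }
        return false;
    }

    void release() { mFree.fetch_add(1); }

private:
    std::atomic<int> mFree;
};

// A request handler built on the pool: refuse rather than queue.
std::string service(InterpreterPool& pool) {
    if (!pool.acquire())
        return "busy...try again";
    // ... run the request on the claimed interpreter clone ...
    pool.release();
    return "ok";
}
```

    The design choice being illustrated is "fail fast under load": a bounded pool plus an immediate busy response keeps the worst-case memory footprint fixed, at the cost of occasionally turning requests away.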

    Personally, I would probably use a thread-pool design using threads, or maybe a pre-forking design written in perl using perl's win32 pseudo-fork support, as I find perl so much more productive than C/C++.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      Yep, that's pretty much it.

      The existing script is SpamAssassin's spamd. It is designed to run on *nix and uses fork and signals and other stuff that doesn't work too well on Win32. It reads and writes from/to a socket.

      I find C++ much more productive than Perl - I guess it's just a question of what you know <g>.

      I'll do some work with cloning and see what happens.

      BTW, if I:
      my $spamtest = SpamAssassin::Mail::SpamAssassin->new(...) : shared;
      will $spamtest be available in the clone if I perl_clone after this Perl code is executed?

      You've been most helpful. Thanks.

        I find C++ much more productive than Perl - I guess it's just a question of what you know <g>.

        Well, I know (or perhaps, knew once) C++ also, and I find Perl infinitely more productive. But then again, I hated C++ from my first encounter. Time and experience only made things worse -- but I know I am in a minority in that view:)

        BTW, if I:
        my $spamtest = SpamAssassin::Mail::SpamAssassin->new(...) : shared;
        will $spamtest be available in the clone if I perl_clone after this Perl code is executed?

        I've never seen anyone use that syntax before, and a quick test seems to indicate that you cannot use :shared in that way.

        use threads;
        use threads::shared;
        use Benchmark::Timer;

        $T = Benchmark::Timer->new() : shared;

        Unquoted string "shared" may clash with future reserved word at (eval 3) line 1, <> line 2.
        syntax error at (eval 3) line 1, near ") :"

        Any attempt to share a blessed reference usually results in an error.

        use threads;
        use threads::shared;
        use Benchmark::Timer;

        {
            my $T : shared;
            $T = Benchmark::Timer->new();
        }

        Invalid value for shared scalar at (eval 5) line 1, <> line 5.

        So, I think the short answer is no.

        I assume you were hoping to share a single instance of spamassasin between the multiple interpreters. This will not work.

        Creating a unique instance in each interpreter would probably work, but I have no idea how much memory that would consume.

        That would require a piece of suck-it-and-see engineering, I am afraid. Unless anyone else knows better.


      I'll second the suggestion to drop C/C++ here unless you have a compelling reason.

      You may also want to drop the entire service wrapper, unless you specifically need this program to be controlled like a Windows service. If you just need it to start when the machine does, and you're using Windows 2000 or later, you can just make it a scheduled task that runs "on system startup". The program you launch at that point can use pthreads, pseudo-fork, or whatever else you'd like. Depending on load, it might be interesting to look at the inetd available in Cygwin also.

      --
      Spring: Forces, Coiled Again!

        Your suggestion is noted. I'm afraid my Perl skills aren't up to the task. Perhaps one of the Perl Monks who wants to do something wonderful for the anti-SPAM community (or at least the anti-SPAM Windows community looking at, or already using, SpamAssassin) might like to head over to SpamAssassin.org and make spamd run under Windows as well as on the platforms it currently works with.

        I think for this stuff to be accepted by the general Windows community it needs to be a service. That also gives it some visibility and increased manageability. I know about the scheduled task thing, but in this instance, when we're talking about an (ideally) 24x7 operation, you need a bit more infrastructure to help manage the process.

        wrt cygwin, this is the current solution for running spamd (SpamAssassin daemon) on Windows. Even my current experimentation is showing that the memory consumption will be way down doing it my way without cygwin, even if I can't clone interpreters. I think I read somewhere around here that Perl under cygwin is about half the speed of native Win32 Perl.

        Finally, I also think that a native Windows Service, obviously designed for Windows, will have more credibility (maybe the wrong word) than, "Oh, just install this cygwin emulation environment and do this and that and it'll just work after a day or so of mucking around". This is the current proposition and although some people have done it I think there are a lot more who won't. I'm one of those - I'd rather run a dedicated Linux box than try to emulate Linux on Windows (and that's exactly what we're doing at the moment).

        Feel free to tell me I'm wrong ;-P (I probably am).

        Phil

      I'm back having done a lot of experimental work. I have more questions I'd like answered. First, some code:
      EXTERN_C void xs_init (pTHX);

      using namespace std;

      CPerlEngine::CPerlEngine(char* pScriptFile)
          : mInterpreter(NULL), mScriptFile(pScriptFile)
      {
          this->mInterpreter = ::perl_alloc();
          assert(this->mInterpreter != NULL);
          PERL_SET_CONTEXT(this->mInterpreter);
          ::perl_construct(this->mInterpreter);

          char* theArguments[] = {"-x", "-S", "-s", pScriptFile};
          ::perl_parse(this->mInterpreter, &xs_init, 4, theArguments, NULL);
          PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
          ::perl_run(this->mInterpreter);
      }

      CPerlEngine::~CPerlEngine(void)
      {
          ::perl_destruct(this->mInterpreter);
          ::perl_free(this->mInterpreter);
      }

      void CPerlEngine::invoke(const char* pFunctionName, vector<string> pParameters)
      {
          assert(NULL != this->mInterpreter);
          PERL_SET_CONTEXT(this->mInterpreter);

          // Pick up all the stack info in *this* thread's local storage
          dTHX;

      #ifdef PERL_CLONE_WORKS
          PerlInterpreter* newInterpreter =
              ::perl_clone(this->mInterpreter,
                           CLONEf_COPY_STACKS | CLONEf_KEEP_PTR_TABLE | CLONEf_CLONE_HOST);
          // PerlInterpreter* newInterpreter = ::perl_clone(this->mInterpreter, CLONEf_CLONE_HOST);
          // PerlInterpreter* newInterpreter = ::perl_clone(this->mInterpreter, NULL);
      #else
          PerlInterpreter* newInterpreter = this->mInterpreter;
      #endif
          assert(NULL != newInterpreter);
          ::perl_run(newInterpreter);

          dSP;
          ENTER;
          SAVETMPS;
          PUSHMARK(SP);

          for (vector<string>::iterator theIterator = pParameters.begin();
               theIterator != pParameters.end();
               theIterator++)
          {
              if (theIterator->length() > 0)
              {
                  XPUSHs(::newSVpv(theIterator->c_str(), theIterator->length()));
              }
          }

          PUTBACK;
          ::call_pv(pFunctionName, G_DISCARD);
          FREETMPS;
          LEAVE;

      #ifdef PERL_CLONE_WORKS
          ::perl_free(newInterpreter);
      #endif
      }

      This is my C++ class wrapping the Perl interpreter(s). It's pretty simple. You instantiate it, it loads the specified script and runs all of the global bits (please excuse any incorrect terminology). So far so good. The global bits are essentially a whole heap of "use blah" type statements. I think there's value in this because it results in a Perl interpreter with the script and its modules loaded - ready to be cloned and executed at will.

      The idea, then, is to call invoke(...) passing the name of the Perl sub to call and some arbitrary number of string elements. This is all pretty cool and seems to work in multiple threads with a basic Perl script.

      If I don't define PERL_CLONE_WORKS, the whole thing works like a bought one (actually better than many) but only in a single thread (obviously).

      Now imagine I define PERL_CLONE_WORKS, instantiate CPerlEngine in one thread, and call invoke(...) on a separate thread. The new thread comes along, clones the existing interpreter, and then calls my sub. This all works perfectly with a basic script BUT with a more complex script (my cut-down spamd) I get a runtime crash out of the Perl Engine in VMem::Free(void* pMem) where it says

      Perl_warn(aTHX_ "Free to wrong pool %p not %p",this,ptr->owner);
      Note that this is not the global cleanup error and it's happening a long time before the end of the script and a reasonable distance into the script.

      I think you (BrowserUk) have seen this before in different circumstances.

      My fallback plan is to create a whole new PerlInterpreter using ::perl_alloc and load absolutely everything from scratch each time, or have my pool of interpreters (as previously discussed).

      The value in being able to clone on the fly like this is that it won't need to have 50 interpreters lying around consuming vast amounts of memory waiting for something to come in.

      Now to my questions:

      1. Is my class doing everything it should do? If not, what else should it be doing?
      2. I've tried all combinations of flags on ::perl_clone and none work. For future reference, what flags should I be using?
      3. What is "Free to wrong pool" telling me?

      We're getting closer at least. Another couple of weeks of this sort of questioning and I might get there <g>.

      Phil

        I'm sorry, but I am going to have to chicken out here. You are already way beyond anywhere I have ever been or am likely to go:)

        You may get lucky and have one of the internals guys pick up on this here, but I would suggest that you try the perl5porters list or comp.lang.perl.something, perhaps misc. You're more likely to encounter people who have been-there-&-done-that there than here.

        Maybe merlyn or Abigail-II or one of the other regulars here that also frequent those other haunts can recommend the best place for perl-embedding questions?

        About the only one of your questions that I can even attempt an answer to is the "free to wrong pool". Simplistically, this means that something was created by one thread and came up for deletion from another.

        This isn't necessarily a programmer error in the sense of the perl code being interpreted containing an error, but often a consequence of a sequence of perl-level program flow that causes an internal inconsistency that hasn't yet been uncovered. Ithreads are relatively new, and not all the bugs have popped their heads up yet.

        That you are doing things rarely done -- embedding -- means you are probably pioneering the use of ithreads in an embedded environment, and are likely to be the finder of such inconsistencies.
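        As a rough sketch of what that check is doing (this is an illustration of the idea only, not Perl's actual VMem code): the Win32 Perl allocator stamps each block with the pool that handed it out, and freeing the block through a different pool triggers the warning you saw. Something like:

```cpp
#include <cstddef>
#include <cstdio>

// Illustrative ownership-checked allocator. Each allocation carries a
// hidden header recording which Pool made it; Pool::free refuses blocks
// stamped by another pool, analogous to Perl's "Free to wrong pool".
struct Pool;

struct Header {
    Pool* owner;   // which pool handed this block out
};

struct Pool {
    void* alloc(std::size_t n) {
        Header* h = static_cast<Header*>(operator new(sizeof(Header) + n));
        h->owner = this;
        return h + 1;   // hand the caller the bytes after the header
    }

    // Returns false (where Perl would warn) when the block was allocated
    // by a different pool, e.g. an interpreter running in another thread.
    bool free(void* p) {
        Header* h = static_cast<Header*>(p) - 1;
        if (h->owner != this) {
            std::fprintf(stderr, "Free to wrong pool %p not %p\n",
                         static_cast<void*>(this),
                         static_cast<void*>(h->owner));
            return false;
        }
        operator delete(h);
        return true;
    }
};
```

        In the embedded case, the "different pool" is typically the parent interpreter: the clone ends up releasing something that still belongs to the parent's allocator (or vice versa), and the first such release trips this check.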

        About the best advice I can offer is that you go for extreme safety in the first instance. Slow, big and working is better than dying really fast:)

        Once you get something working, you can then try slimming it down and speeding it up in small increments, keeping those changes that work, and bypassing those that don't.

        That said, I think I would seriously look at using perl's internal pseudo-fork to do what you are trying to do.

        Instantiate a single interpreter, interpret a script that uses everything you need, establishes the connection to the server, and then forks as many copies as you want in your pool.

        This will give you the many clones of the environment you set up, each independent from the others, and running in its own kernel thread within the same process. It would then be up to your C++ code to interface to the clones (I don't know if this is possible!) and manage them. **THIS IS HIGHLY SPECULATIVE**.

        Good luck:)
