Reading between the lines of what you've told us, you have a pre-written perl script that you want to run on behalf of networked users on a single machine, with concurrent access but no sharing of data between the instances? And you aren't a perl programmer :)

It really will depend on how the pre-existing perl script runs, but I'm assuming that the script returns its results via stdout?

If this is the case, cloning an interpreter for each request, or building a pool of clones, would probably work OK. I haven't done enough with embedding -- nothing beyond the simple examples in perlembed -- to be able to predict the performance. Pre-cloning a pool, and returning a "busy...try again" message if the pool is fully utilised, ought to be fast enough if the loading isn't too extreme.
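To make the "pre-cloned pool with a busy reply" idea concrete, here is a minimal C++ sketch of just the pool bookkeeping. It is an illustration under stated assumptions, not tested embedding code: `Worker`, `WorkerPool` and `serve` are names invented here, and `Worker` is only a stand-in for whatever would wrap a pre-cloned interpreter.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pre-cloned interpreter. A real version would
// wrap a PerlInterpreter* that was set up in advance.
struct Worker {
    int id;
    std::string handle(const std::string& request) const {
        return "worker " + std::to_string(id) + " handled: " + request;
    }
};

class WorkerPool {
public:
    // Build the whole pool up front so no request pays the start-up cost.
    explicit WorkerPool(std::size_t size) {
        for (std::size_t i = 0; i < size; ++i)
            idle_.push_back(std::unique_ptr<Worker>(new Worker{static_cast<int>(i)}));
    }

    // Never blocks: returns nullptr when every worker is checked out.
    std::unique_ptr<Worker> tryAcquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (idle_.empty()) return nullptr;
        std::unique_ptr<Worker> worker = std::move(idle_.back());
        idle_.pop_back();
        return worker;
    }

    // Hand a worker back so another request can use it.
    void release(std::unique_ptr<Worker> worker) {
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push_back(std::move(worker));
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<Worker>> idle_;
};

// Serve one request, or refuse at once if the pool is exhausted.
std::string serve(WorkerPool& pool, const std::string& request) {
    std::unique_ptr<Worker> worker = pool.tryAcquire();
    if (!worker) return "busy...try again";
    std::string reply = worker->handle(request);
    pool.release(std::move(worker));
    return reply;
}
```

The point of the non-blocking `tryAcquire` is that an overloaded server answers immediately instead of queueing requests behind a slow one; whether that is the right trade-off depends on the load.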

Personally, I would probably use a thread-pool design using threads, or maybe a pre-forking design written in perl using perl's win32 pseudo-fork support, as I find perl so much more productive than C/C++.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: Re: Re: Re: Externally managed threads using embedded Perl
by Anonymous Monk on Jan 08, 2004 at 05:30 UTC

    Yep, that's pretty much it.

    The existing script is SpamAssassin's spamd. It is designed to run on *nix and uses fork and signals and other stuff that doesn't work too well on Win32. It reads from and writes to a socket.

    I find C++ much more productive than Perl - I guess it's just a question of what you know <g>.

    I'll do some work with cloning and see what happens.

    BTW, if I:
    my $spamtest = SpamAssassin::Mail::SpamAssassin->new(...) : shared;
    will $spamtest be available in the clone if I perl_clone after this Perl code is executed?

    You've been most helpful. Thanks.

      I find C++ much more productive than Perl - I guess it's just a question of what you know <g>.

      Well, I know (or perhaps, knew once) C++ also, and I find Perl infinitely more productive. But then again, I hated C++ from my first encounter. Time and experience only made things worse -- but I know I am in a minority in that view:)

      BTW, if I:
      my $spamtest = SpamAssassin::Mail::SpamAssassin->new(...) : shared;
      will $spamtest be available in the clone if I perl_clone after this Perl code is executed?

      I've never seen anyone use that syntax before, and a quick test seems to indicate that you cannot use :shared in that way.

      use threads;
      use threads::shared;
      use Benchmark::Timer;
      $T = Benchmark::Timer->new() : shared;

      Unquoted string "shared" may clash with future reserved word at (eval 3) line 1, <> line 2.
      syntax error at (eval 3) line 1, near ") :"

      Any attempt to share a blessed reference usually results in an error.

      use threads;
      use threads::shared;
      use Benchmark::Timer;
      {
          my $T : shared;
          $T = Benchmark::Timer->new();
      }

      Invalid value for shared scalar at (eval 5) line 1, <> line 5.

      So, I think the short answer is no.

      I assume you were hoping to share a single instance of SpamAssassin between the multiple interpreters. This will not work.

      Creating a unique instance in each interpreter would probably work, but I have no idea how much memory that would consume.

      That would require a piece of suck-and-see engineering I am afraid. Unless anyone else knows better.


        Sorry, I didn't mean to insult you (or anyone) with my C++ comment. I don't know Perl at all (as illustrated in my code example) so C++ is guaranteed to be faster for me. Plus I have good C++ tools (VS.NET) and no real Perl tools (again, editing scripts using VS.NET).

        Fortunately, despite my dismal Perl code example, you understood my question. I can deal with the answer - thanks once again.

        Ideally what I would like to do is prevent running the SA startup/initialisation code more than once (or so). This stuff takes quite a while to crank up and will kill performance. If I can't share a single instance, can I somehow "clone" a pre-initialised instance? (Last question, I promise, at least until I get to do some experimentation tonight).

        If I can't clone the object I'll definitely have to have a pool of interpreters lying around. I'm not overly concerned by the memory consumption, as ActivePerl already consumes a heap of memory anyway; I can't imagine a few more instances of SpamAssassin will make that much difference.

        Phil

Re: Re: Re: Re: Externally managed threads using embedded Perl
by paulbort (Hermit) on Jan 08, 2004 at 21:34 UTC
    I'll second the suggestion to drop C/C++ here unless you have a compelling reason.

    You may also want to drop the entire service wrapper, unless you specifically need this program to be controlled like a Windows service. If you just need it to start when the machine does, and you're using Windows 2000 or later, you can just make it a scheduled task that runs "on system startup". The program you launch at that point can use pthreads, pseudo-fork, or whatever else you'd like. Depending on load, it might be interesting to look at the inetd available in Cygwin also.

    --
    Spring: Forces, Coiled Again!

      Your suggestion is noted. I'm afraid my Perl skills aren't up to the task. Perhaps if one of the Perl Monks wanted to do something wonderful for the anti-SPAM community (or at least the anti-SPAM Windows community looking at, or already using, SpamAssassin) you might like to head over to SpamAssassin.org and make spamd run under Windows as well as the platforms it currently works with.

      I think for this stuff to be accepted by the general Windows community it needs to be a service. That also gives it some visibility and increased manageability. I know about the scheduled task thing, but in this instance, when we're talking about an (ideally) 24x7 operation, you need a bit more infrastructure to help manage the process.

      wrt cygwin, this is the current solution for running spamd (SpamAssassin daemon) on Windows. Even my current experimentation is showing that the memory consumption will be way down doing it my way without cygwin, even if I can't clone interpreters. I think I read somewhere around here that Perl under cygwin is about half the speed of native Win32 Perl.

      Finally, I also think that a native Windows Service, obviously designed for Windows, will have more credibility (maybe the wrong word) than, "Oh, just install this cygwin emulation environment and do this and that and it'll just work after a day or so of mucking around". This is the current proposition and although some people have done it I think there are a lot more who won't. I'm one of those - I'd rather run a dedicated Linux box than try to emulate Linux on Windows (and that's exactly what we're doing at the moment).

      Feel free to tell me I'm wrong ;-P (I probably am).

      Phil

Re: Re: Re: Re: Externally managed threads using embedded Perl
by Anonymous Monk on Jan 09, 2004 at 06:35 UTC
    I'm back, having done a lot of experimental work. I have more questions that I would like answered. First, some code:
    EXTERN_C void xs_init (pTHX);

    using namespace std;

    CPerlEngine::CPerlEngine(char* pScriptFile) :
        mInterpreter(NULL), mScriptFile(pScriptFile)
    {
        this->mInterpreter = ::perl_alloc();
        assert (this->mInterpreter != NULL);
        PERL_SET_CONTEXT(this->mInterpreter);
        ::perl_construct(this->mInterpreter);
        char* theArguments[] = {"-x", "-S", "-s", pScriptFile};
        ::perl_parse(this->mInterpreter, &xs_init, 4, theArguments, NULL);
        PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
        ::perl_run(this->mInterpreter);
    }

    CPerlEngine::~CPerlEngine(void)
    {
        ::perl_destruct(this->mInterpreter);
        ::perl_free(this->mInterpreter);
    }

    void CPerlEngine::invoke(const char* pFunctionName, vector<string> pParameters)
    {
        assert (NULL != this->mInterpreter);
        PERL_SET_CONTEXT(this->mInterpreter);

        // Pick up all the stack info in *this* threads local storage
        dTHX;

    #ifdef PERL_CLONE_WORKS
        PerlInterpreter* newInterpreter = ::perl_clone(this->mInterpreter,
            CLONEf_COPY_STACKS | CLONEf_KEEP_PTR_TABLE | CLONEf_CLONE_HOST);
        // PerlInterpreter* newInterpreter = ::perl_clone(this->mInterpreter, CLONEf_CLONE_HOST);
        // PerlInterpreter* newInterpreter = ::perl_clone(this->mInterpreter, NULL);
    #else
        PerlInterpreter* newInterpreter = this->mInterpreter;
    #endif
        assert (NULL != newInterpreter);
        ::perl_run(newInterpreter);

        dSP;
        ENTER;
        SAVETMPS;
        PUSHMARK(SP);
        for (vector<string>::iterator theIterator = pParameters.begin();
             theIterator != pParameters.end();
             theIterator++)
        {
            if (theIterator->length() > 0)
            {
                XPUSHs(::newSVpv(theIterator->c_str(), theIterator->length()));
            }
        }
        PUTBACK;
        ::call_pv(pFunctionName, G_DISCARD);
        FREETMPS;
        LEAVE;

    #ifdef PERL_CLONE_WORKS
        ::perl_free(newInterpreter);
    #endif
    }

    This is my C++ class wrapping the Perl interpreter(s). It's pretty simple. You instantiate it, it loads the specified script, and runs all of the global bits (please excuse any incorrect terminology). So far so good. The global bits are essentially a whole heap of "use blah" type statements. I think there's value in this because it results in a Perl interpreter with script loaded and references loaded - ready to be cloned and executed at will.

    The idea, then, is to call invoke(...) passing the name of the Perl sub to call and some arbitrary number of string elements. This is all pretty cool and seems to work in multiple threads with a basic Perl script.

    If I don't define PERL_CLONE_WORKS, the whole thing works like a bought one (actually better than many) but only in a single thread (obviously).

    Now imagine I define PERL_CLONE_WORKS, instantiate CPerlEngine in one thread, and call invoke(...) on a separate thread. The new thread comes along, clones the existing interpreter, and then calls my sub. This all works perfectly with a basic script BUT with a more complex script (my cut-down spamd) I get a runtime crash out of the Perl Engine in VMem::Free(void* pMem) where it says

    Perl_warn(aTHX_ "Free to wrong pool %p not %p",this,ptr->owner);
    Note that this is not the global cleanup error and it's happening a long time before the end of the script and a reasonable distance into the script.

    I think you (BrowserUk) have seen this before in different circumstances.

    My fallback plan is to create a whole new PerlInterpreter using ::perl_alloc and load absolutely everything from scratch each time, or have my pool of interpreters (as previously discussed).

    The value in being able to clone on the fly like this is that it won't need to have 50 interpreters lying around consuming vast amounts of memory waiting for something to come in.

    Now to my questions:

    1. Is my class doing everything it should do? If not, what else should it be doing?
    2. I've tried all combinations of flags on ::perl_clone and none work. For future reference, what flags should I be using?
    3. What is "Free to wrong pool" telling me?

    We're getting closer at least. Another couple of weeks of this sort of questioning and I might get there <g>.

    Phil

      I'm sorry, but I am going to have to chicken out here. You are already way beyond anywhere I have ever been or am likely to go:)

      You may get lucky and have one of the internals guys pick up on this here, but I would suggest that you try the perl5-porters list or comp.lang.perl.something, perhaps misc. You're more likely to encounter people who have been-there-&-done-that there than you are here.

      Maybe merlyn or Abigail-II or one of the other regulars here that also frequent those other haunts can recommend the best place for perl-embedding questions?

      About the only one of your questions that I can even attempt an answer at is the "free to wrong pool". Simplistically, this means that something was created by one thread and came up for deletion from another.

      This isn't necessarily a programmer error in the sense of the perl code being interpreted containing an error, but often the consequence of a sequence of perl-level program flow that causes an internal inconsistency that hasn't yet been uncovered. Ithreads are relatively new, and not all the bugs have popped their heads up yet.
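      For what a "free to wrong pool" check amounts to, here is a simplified C++ model. It is only a sketch of the general idea (each allocation remembers which pool made it, and freeing it through a different pool is refused); it is not the actual VMem code, and `Pool` and `BlockHeader` are names made up for the illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

class Pool;

// Header stored in front of every allocation, recording which pool made it.
// Aligned so that the payload following it is suitably aligned for any type.
struct alignas(std::max_align_t) BlockHeader {
    Pool* owner;
};

// One Pool per interpreter in this model.
class Pool {
public:
    void* allocate(std::size_t bytes) {
        void* raw = std::malloc(sizeof(BlockHeader) + bytes);
        if (!raw) throw std::bad_alloc();
        BlockHeader* header = static_cast<BlockHeader*>(raw);
        header->owner = this;
        return header + 1;  // the caller sees only the payload
    }

    // Returns false (and frees nothing) if the block belongs to another pool.
    bool release(void* payload) {
        BlockHeader* header = static_cast<BlockHeader*>(payload) - 1;
        if (header->owner != this) return false;  // "free to wrong pool"
        std::free(header);
        return true;
    }
};
```

      In this model, if interpreter A's pool allocates something and interpreter B's pool is later asked to free it, `release` notices the mismatch. That matches the symptom described: the value itself may be perfectly fine, it is just being cleaned up by the wrong owner.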

      That you are doing things rarely done -- embedding -- means you are probably pioneering the use of ithreads in an embedded environment, and are likely to be the finder of such inconsistencies.

      About the best advice I can offer is that you go for extreme safety in the first instance. Slow, big and working is better than dying really fast:)

      Once you get something working, you can then try slimming it down and speeding it up in small increments, keeping those that work, and bypassing those that don't.

      That said, I think I would seriously look at using perl's internal pseudo-fork to do what you are trying to do.

      Instantiate a single interpreter and interpret a script that uses everything you need, establishes the connection to the server, and then forks as many copies as you want in your pool.

      This will give you many clones of the environment you set up, each independent of the others and running in its own kernel thread within the same process. It would then be up to your C++ code to interface to the clones (I don't know if this is possible!) and manage them. **THIS IS HIGHLY SPECULATIVE**.

      Good luck:)


        Where's your sense of commitment! <g>

        Seriously, I'm surprised you stuck with me this long. Thanks very much for all the tips.

        The "make it work, then make it work fast" principle is what I always work with. Just sometimes you can see "obvious" performance hits that you might as well take out while you're doing it in the first place. ::perl_clone vs ::perl_alloc is just such an instance.

        Bad news (for you), I'm going to pick up on your last point. I can easily call a Perl routine always on the *same* thread. How would it then communicate to the thread pool? (See, now look what you've done. I'm starting to learn perl, diving in at the deep end. My wife's not going to be at all happy with you <g>.)
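        For reference, the "always on the *same* thread" arrangement could look roughly like this on the C++ side. This is a minimal sketch assuming C++11 threads are available; `SingleThreadExecutor` and its members are names invented for the example, and the queued job is only a placeholder for the real call into the interpreter.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Runs every submitted job on one dedicated thread, in submission order.
class SingleThreadExecutor {
public:
    SingleThreadExecutor()
        : stop_(false), worker_(&SingleThreadExecutor::run, this) {}

    // Lets already-queued jobs finish, then stops the worker thread.
    ~SingleThreadExecutor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Queue a job and get a future for its result; callable from any thread.
    std::future<std::string> submit(std::function<std::string()> job) {
        std::packaged_task<std::string()> task(std::move(job));
        std::future<std::string> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(task));
        }
        wake_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<std::string()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and queue drained
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // this is where the interpreter call would go
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::packaged_task<std::string()>> jobs_;
    bool stop_;
    std::thread worker_;  // declared last so the other members exist first
};
```

        Any service thread can call `submit` and wait on the returned future, while the job itself only ever runs on the one worker thread. What this does not answer is the question above: how the Perl code running on that thread would hand work on to the forked pool.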

        Phil