Re^3: Problem with using threads with modules.

I suspect this will be a case of "too many words" and perhaps, to some degree (as I have been accused of before), "the blind leading the blind".

I am not an 'expert' in threads (nor Perl). I am just someone who:

Has used threads outside of Perl and understands them at the OS level (to some degree).
Has followed the progress of iThreads fairly closely.
Hasn't written them off, as so many have done.
Chooses to use an OS where fork is not supported natively, and where Perl's pseudo-fork emulation of that idiom (implemented with ithreads) gives few if any of the benefits that forking has on OSes that support it natively and, in doing so, discards the main benefits of threads.
Is attempting to "do his bit" by exploring what is and is not achievable with iThreads.

As such, as with everything I post, read what I have written, but make up your own mind regarding its utility and accuracy.

The following is an attempt to answer this question:

When you say that I implicitly share objects across threads, how is this possible?

To try to explain this, I'll use the snippet of code from your original post.

#!/usr/bin/perl -w
use strict;
use threads;
use IO::String;
## Point A

my $var; 
my $io = IO::String->new($var);
## Point B

my $th=threads->new(\&Update); ## Statement C

$th->join(); 

sub Update{};

By the time you reach Point A in this code, perl has already loaded all the code from the modules 'strict', 'threads' and 'IO::String', plus their sub-dependencies. Because they are loaded with use, all of this code is loaded, and any package global variables created by those modules are allocated, during the BEGIN{} phase of running your code.

By the time you reach Point B, $var and $io have also been allocated. In the case of the former, this is a simple scalar. In the case of the latter, the new() method of the IO::String class has also been run, and any variables it creates have been allocated. Finally, the scalar $io has been blessed. What that means is that as well as $io being a reference to some storage that holds the state of this instance of the IO::String class, $io now also carries (behind the scenes) pointer(s) to the byte code that implements the class methods.

When your program does something with $io, its value tells perl which instance (storage; state) of IO::String you are operating on. The hidden pointer(s) (often called 'magic') tell it where to find the code for the methods that can operate upon that instance.
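As a rough sketch of what blessing makes visible from ordinary Perl code (it does not show the internals described above), the following uses a hypothetical Counter class, invented only for this illustration. The reference identifies the instance storage, and the package it was blessed into is where perl looks for the methods.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Counter is a hypothetical class, used only for illustration.
package Counter {
    sub new  { my ($class) = @_; return bless { count => 0 }, $class }
    sub incr { my ($self) = @_; return ++$self->{count} }
}

my $obj = Counter->new;

my $class  = blessed($obj);       # package the referent was blessed into
my $type   = reftype($obj);       # underlying storage holding the state
my $method = $obj->can('incr');   # code ref found by looking in that package
my $count  = $obj->$method();     # same effect as $obj->incr

print "$class $type $count\n";    # prints "Counter HASH 1"
```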

When Statement C is executed, what happens (simplistically stated) is that a new copy of the interpreter is created and everything that constitutes your program in memory up to this point--i.e. everything above--is duplicated into memory allocated by that new interpreter.

In effect, this is somewhat similar to if you had forked your program at that point or, stated another way, as if you had run a second copy of your program and stopped it at that same point. The difference is that unlike two separate processes which would not be able to access the memory of the other copy, the two copies created by spawning the thread can. That is the major advantage of threads--they can communicate with each other through direct memory access rather than serialisation through pipes etc.

As stated, this is a simplification. Some of the memory allocated by the first thread is not duplicated into the second thread. The non-duplicated elements are "process global". This includes such things as file handles, some of Perl's "Special Vars", and some internal state used by Perl itself.

This duplication is not an effect of threads per se. It is the implementation chosen by the iThreads implementers. The advantage is that your simple variable $var, is now two simple variables--one for each thread. Each thread can now manipulate its copy of $var without needing to concern itself with synchronisation.
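A minimal sketch of that duplication, assuming a perl built with ithreads support: the thread changes its own copy of an unshared $var, and the parent's copy is untouched. The variable name $seen_in_thread is mine, not from the original snippet.

```perl
use strict;
use warnings;
use threads;

my $var = 'parent';

# The new thread gets its own clone of $var when it is created.
my $thr = threads->create(sub {
    $var = 'child';    # modifies only this thread's copy
    return $var;
});

my $seen_in_thread = $thr->join;

print "thread saw: $seen_in_thread\n";   # prints "thread saw: child"
print "parent has: $var\n";              # prints "parent has: parent"
```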

Equally, the object $io, has also been duplicated. The problem is that not only has the instance storage been duplicated, so has the method code. When one thread uses $io to invoke a method, the hidden pointer (magic) tells it where to look to find the code, and the reference value itself tells it what state to manipulate, and each copy of $io not only points to a different copy of the state, but its associated magic also points to a different place.

Now, to share the copy of the simple variable $var, which is after all one of the main reasons for using threads, you must designate that it is to be shared using the my $var : shared; nomenclature.

What happens then (and please, don't take my description too literally!) is that the two copies of $var are tied. That is to say, each has a hidden pointer (magic) applied to it so that when your code modifies one copy on one thread, behind the covers, that update is also applied to the other copy. The exact mechanism by which this happens is irrelevant in that, as far as your program is concerned, you only have one copy which all threads that can see that copy can manipulate.
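For comparison, here is a small sketch of a shared scalar, again assuming a perl built with ithreads. The lock() call is my addition and is not discussed above; without it the increments from different threads could interleave and lose updates.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $total :shared = 0;

# Four threads all update the one shared variable.
my @workers = map {
    threads->create(sub {
        for (1 .. 1000) {
            lock($total);   # held until the end of this loop iteration
            $total++;
        }
    });
} 1 .. 4;

$_->join for @workers;

print "$total\n";   # prints "4000"
```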

The problem comes with trying to share objects. If you applied the shared attribute to $io (which you can't, because it won't let you), then not only would the state of $io have to be replicated to every other thread's copy each time it changed, but the value of the magic would also need to be replicated. And that's impossible.

Understanding why you can't share the code (methods) that implement a class between two threads is complex, but here is a simplified example. Say you had a class that had a settable separator or terminator (think $/ for an IO class or ',' for a CSV class). You create an instance on one thread and set the separator to ','. On another thread you set it to '|'. If you could share an instance of this class between the two threads, you would have a conflict.

This could notionally be alleviated by always storing a copy of the class data in every instance, but then each time you modified the class data, the class would have to search out each of its instances and update that value. If you store the class data with the class, then when you try to use an instance created on one thread with an instance created on another, you get the conflict: is this a comma-separated instance or a pipe-separated instance? The problems run much deeper than this.
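The separator conflict can be sketched without any threads at all. Joiner is a hypothetical class, made up for this illustration, whose separator is class data rather than instance data; two instances that each believe they own the setting end up disagreeing with it.

```perl
use strict;
use warnings;

# Joiner is a hypothetical class; $separator is class data shared by
# every instance, not stored per instance.
package Joiner {
    my $separator = ',';
    sub new           { my ($class) = @_; return bless {}, $class }
    sub set_separator { my ($self, $sep) = @_; $separator = $sep; return $self }
    sub join_fields   { my ($self, @fields) = @_; return join $separator, @fields }
}

my $csv  = Joiner->new->set_separator(',');
my $pipe = Joiner->new->set_separator('|');   # silently changes it for $csv too

my $line = $csv->join_fields(qw(a b c));

print "$line\n";   # prints "a|b|c", not the "a,b,c" that $csv asked for
```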

However, that doesn't mean that you can't use threads and objects. It just means that you must not share objects between threads.
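One way to follow that advice, sketched under the assumption of a perl built with ithreads: create each object inside the thread that uses it, and pass only plain data back through join. Tally and worker are hypothetical names used for illustration.

```perl
use strict;
use warnings;
use threads;

# Tally is a hypothetical class, used only for illustration.
package Tally {
    sub new   { my ($class) = @_; return bless { n => 0 }, $class }
    sub add   { my ($self, $x) = @_; $self->{n} += $x; return $self }
    sub total { my ($self) = @_; return $self->{n} }
}

sub worker {
    my ($limit) = @_;
    my $tally = Tally->new;            # created in, and private to, this thread
    $tally->add($_) for 1 .. $limit;
    return $tally->total;              # only a plain number is handed back
}

my @threads;
for my $limit (1 .. 3) {
    my $thr = threads->create(\&worker, $limit);
    push @threads, $thr;
}

my @totals = map { $_->join } @threads;

print "@totals\n";   # prints "1 3 6"
```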

It also means that using require to load modules only into those threads where the module will be used will save memory over use-ing them, as it will avoid them being duplicated into threads that don't need them. I'll try to offer a solution to your actual problem in a separate reply.
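A sketch of the require approach, assuming a perl built with ithreads and that nothing else has already loaded Text::ParseWords (a core module picked here only as an example). The module is loaded inside the worker thread, so the main thread's %INC never lists it.

```perl
use strict;
use warnings;
use threads;

sub worker {
    # Loaded at run time, inside this thread's interpreter only.
    require Text::ParseWords;
    my @fields = Text::ParseWords::quotewords(',', 0, 'a,"b,c",d');
    return scalar @fields;
}

my $thr         = threads->create(\&worker);
my $field_count = $thr->join;

# %INC is per interpreter, so the main thread has no entry for the module.
my $loaded_in_main = exists $INC{'Text/ParseWords.pm'} ? 1 : 0;

print "fields=$field_count loaded_in_main=$loaded_in_main\n";
```

It does not measure memory; it only shows that the module's code never reaches the threads that do not ask for it.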

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Re^4: Problem with using threads with modules. by tele2mag (Initiate) on Jun 23, 2004 at 19:02 UTC
To the statement about "too many words", I just want to say that I prefer that to the brief cryptic answer one might get. You gave me a chance to understand the mechanism around the issue, not only the coding part of it. I thank you for that. English is not my native language (it is my second), so I apologise for any grammar or typing errors in my posts.

I think I understand your post completely, but something is still bothering me. For example, to start a new thread, you said that almost all memory is copied to the new thread, just like starting a new instance of the program at that momentary interpreter position. So unless I explicitly share my variables (which is ok) or objects (not ok), the two threads would not know of each other. So why does the Perl debugger bother me with the message I presented in my first post? If I run the program with the thread starting the empty shell method `Update()`, nothing is shared and it should not be a problem.

If you think I'm a lost cause in this issue, you can supply me with some good URLs so I can expand my thread-wisdom further, as the monks would put it. I have some knowledge of assembly programming and context switching with processes, but not so much about thread implementations.

We can leave it at that; you have already surpassed my greatest expectations while getting help ;) I don't know how to repay you, but I can at least spread my newly gained knowledge to others.
Re^5: Problem with using threads with modules. by BrowserUk (Patriarch) on Jun 23, 2004 at 20:45 UTC
I'm not so great in my first language :) I understand what you write and that's enough--for me.

In purely causal terms, the error message `Attempt to free unreferenced scalar: SV xxxxxxxx during global destruction.` simply means that when your perl script is cleaning up memory when exiting, a reference was encountered that needed to be freed. Before the memory containing the reference can be freed, the reference count of the thing it pointed at has to be decremented, but when the reference count of the referent is checked, it is already at 0. Hence the warning is issued.

What this indicates is that somehow, a reference to an object has been created without the reference count of the referent having been incremented. This probably constitutes a bug. However, as the problem can be alleviated by not creating objects that are implicitly shared between threads--something that isn't useful and you aren't meant to do--the severity of the bug is fairly minor. It may also be that it isn't possible to correct the "bug", because it is a symptom of trying to do something that cannot be done--but that is supposition.

The bug arises during the duplication process invoked when you spawn a thread having previously created an object. This can be demonstrated by modifying your original snippet to create more than one thread:

#! perl -slw
use strict;
use threads;
use IO::String;
## Point A

my $var;
my $io = IO::String->new($var);
## Point B

my @th= map{ threads->new(\&Update); } 1..5;

$_->join() for @th;

sub Update{};

__END__
P:\test>368516
Attempt to free unreferenced scalar: SV 0x1aba0c4 during global destruction.
Attempt to free unreferenced scalar: SV 0x1a4e06c during global destruction.
Attempt to free unreferenced scalar: SV 0x19e1fdc during global destruction.
Attempt to free unreferenced scalar: SV 0x19749fc during global destruction.
Attempt to free unreferenced scalar: SV 0x2b26134 during global destruction.
As you can see, you now get one warning message for each thread created. This suggests, though I'm not exactly clear on the mechanisms involved, that during the replication process, each thread is getting its own copy of the reference to a single SV somewhere, but the reference count of that SV is not being incremented. Then at exit, when perl is trying to clean up those references, the first thread's reference (of the 6--the five I created plus the original) to the common SV gets cleaned up and the SV reference count gets decremented to 0. All is hunky dory and no error is issued. However, when the references in the other threads come to be cleaned up, the attempt is made to decrement the reference count of the common SV and it is found to already be 0. Hence the warnings.

The fact that this only happens when the program is exiting means that you could probably ignore the messages if the rest of the program was working as expected. But in the long term, it is a sure indication that something is wrong, and if you tried to actually make use of the implicitly shared object, then you would almost certainly not get the behaviour you are expecting.

With respect to further reading on threads: the last time I looked, I couldn't actually find much in the way of good documentation on iThreads. There is a little general description stuff around, but not much by way of practical "How to"s or Cookbook-style material. This isn't really surprising. As far as I am aware, the implementation mechanisms involved in iThreads are completely unique to Perl. They evolved to compensate for the problems encountered in trying to implement pThreads for perl. The underlying nature of Perl itself, and the non-reentrancy of perl's internal data structures and the C runtime that underlies it, mean that even sharing Perl's scalars between concurrent threads of execution, never mind arrays, hashes and others, is nearly impossible to do safely.
As such, iThreads are very new and, until very recently, not stable. Consequently, not many people have done very much with them, and so there is no great pool of knowledge from which to draw in order to write such documentation. Personally, I believe that iThreads are now stable enough that they are usable in production environments. We will slowly see them being utilised for real work, despite their inherent limitations, and the body of knowledge required to construct a cookbook will evolve.