in reply to Segmentation fault: problem with perl threads
I am using perl, v5.8.3
That's 5 years and many releases old. There have been a lot of fixes to threads in the meantime. You must upgrade to find a solution.
I would move up to 5.8.6 as that was the most stable version for threading, with subsequent changes making it less reliable. Hopefully, the imminent 5.8.9 will have resolved some of the new quirks, but only time will tell.
Re^2: Segmentation fault: problem with perl threads
by moritz (Cardinal) on Sep 15, 2008 at 12:55 UTC
"Hopefully, the imminent 5.8.9 will have resolved some of the new quirks, but only time will tell."
That sounds a bit fatalistic. One way you can help to actually make it better is to test it now. If you have an application with heavy threads usage, download 5.8-maint now and report any errors.
by BrowserUk (Patriarch) on Sep 15, 2008 at 13:25 UTC
Is "perl-5.8-maint" otherwise known as: perl-5.9.5.tar.gz?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by moritz (Cardinal) on Sep 15, 2008 at 13:31 UTC
You can download the sources like this: Or like this:
by BrowserUk (Patriarch) on Sep 15, 2008 at 13:38 UTC
by moritz (Cardinal) on Sep 15, 2008 at 13:49 UTC
by bingos (Vicar) on Sep 16, 2008 at 06:58 UTC
Re^2: Segmentation fault: problem with perl threads
by katharnakh (Novice) on Sep 18, 2008 at 05:52 UTC
_replicate() is passed a data structure that looks like this,
I checked this data structure carefully and it looks like what I intended, so there is no problem up to this point. Below is the same set of functions I posted earlier, because I think the problem lies here.
I tried running my script on perl 5.8.8 and I still get 'Segmentation fault', whether I actually execute the rsync command in the thread or just print the rsync command in the thread and return.
I strongly believe I might be calling join() on a thread object that has already died after finishing its job, so I end up dereferencing a reference that has been deallocated (maybe?). This happens because, when 10 threads are running in parallel and I wait for, say, the 2nd thread to join, the 3rd, 4th or 8th (any of them up to the 10th) might have finished running in the meantime. Once the 2nd joins, the main thread calls join() on the next thread object in the array (either the one returned by threads->list or the one where I keep the thread objects), which no longer exists, or I have no way of knowing whether that thread is joinable. I tried to make sure the thread is running, as you can see in the code below, in _replicate(),
I would like to ask: is there any way to make sure all threads have finished, or to call join() only on those threads that are joinable? Or do I have to go with the other solution I sent earlier, fork()ing processes instead of using threads? Thanks in advance,
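For the narrower question of joining only threads that have already finished, here is a minimal sketch. It assumes a threads module recent enough to provide threads->list(threads::joinable), which may be newer than the version bundled with perl 5.8.x; the worker sub and its inputs are hypothetical stand-ins, not the code from the original post.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker standing in for the real rsync job.
sub worker {
    my ($n) = @_;
    return $n * 2;
}

# Created in list context (inside map), so join() below also returns a list.
my @threads = map { threads->create( \&worker, $_ ) } 1 .. 4;

my @results;

# In scalar context threads->list() returns the number of threads
# that have been neither joined nor detached.
while ( scalar( threads->list() ) ) {

    # Only threads that have finished running are returned here,
    # so none of these join() calls should block.
    for my $thr ( threads->list(threads::joinable) ) {
        push @results, $thr->join;
    }

    # Brief pause instead of spinning in a busy loop.
    select( undef, undef, undef, 0.05 );
}

print "results: @{[ sort { $a <=> $b } @results ]}\n";
```

The results arrive in completion order rather than creation order, which is why they are sorted before printing.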
by BrowserUk (Patriarch) on Sep 18, 2008 at 08:35 UTC
"This happens because, when 10 threads are running in parallel and I wait for, say, the 2nd thread to join, the 3rd, 4th or 8th (any of them up to the 10th) might have finished running in the meantime. Once the 2nd joins, the main thread calls join() on the next thread object in the array (either the one returned by threads->list or the one where I keep the thread objects), which no longer exists, or I have no way of knowing whether that thread is joinable."
This is a red herring. When non-detached threads end, they wait until you call join on them before being cleaned up. You do not need to check anything before calling join. If the thread has ended before you call join, it will return immediately. If the thread is still running, it will block until the thread ends. This is how they are designed to work. Your problem lies elsewhere.
You keep posting these snippets of code, but they are so dependent upon the rest of the program that you are not posting, that it is impossible for anyone to run them in order to try and help. They are also full of lumps of commented-out code, rambling comments that wrap 3 times and, worst of all, all this insane "logger" crap which completely obscures the structure of the code. It is not surprising that you cannot get this to work, as you cannot see what it is that your own code is doing.
So, a lot of criticism which you may not like, so I'll try to show you that the criticism can help. Here is your code above, with all the crap stripped away, a few extra spaces and blank lines etc.
Now the structure and essentials of the code are clear and easy to follow, and it is easy to pick out several problems:
If you are going to be doing multi-processing, whether through threads or forks, the secret is to start simple. Write your worker subroutine in a standalone, single-threaded program, and make it work. Once you've made sure it is working that way, then try running two copies concurrently using threads or forks. Once you've got that working reliably, only then try to scale it up!
You asked whether you should move to using forks. If you have a native fork on the platform you are working on, then there is nothing obvious from the code you have posted that requires threads, so you probably could use forks. But, on the basis of the code you've posted, I think that you are likely to have just as many problems trying to work in that environment as you are having with threads.
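The join() behaviour described in this reply can be checked with a small standalone script, in the spirit of starting simple. This is only a sketch under stated assumptions: the worker sub and its sleep times are hypothetical and stand in for the real job, which is not shown in the thread.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: sleeps for the given number of seconds, then
# returns its own thread id so we can see which thread was joined.
sub worker {
    my ($seconds) = @_;
    sleep $seconds;
    return threads->tid;
}

# Start three threads. The second and third end well before the first.
my @threads = map { threads->create( \&worker, $_ ) } ( 2, 0, 0 );

# Join in creation order, with no checks beforehand. The first join()
# blocks for about two seconds; by then the other two threads have
# already ended, so their join() calls return immediately.
my @joined = map { $_->join } @threads;

print "joined tids: @joined\n";
```

No thread object needs to be tested before join() here: the threads that ended early simply wait to be joined.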
by katharnakh (Novice) on Sep 23, 2008 at 07:04 UTC
"When non-detached threads end, they wait until you call join on them before being cleaned up. You do not need to check anything before calling join. If the thread has ended before you call join, it will return immediately. If the thread is still running, it will block until the thread ends. This is how they are designed to work. Your problem lies elsewhere."
Thanks for letting me know the design; I had falsely assumed something.
"You keep posting these snippets of code, but they are so dependent upon the rest of the program that you are not posting, that it is impossible for anyone to run them in order to try and help. They are also full of lumps of commented-out code, rambling comments that wrap 3 times and, worst of all, all this insane "logger" crap which completely obscures the structure of the code. It is not surprising that you cannot get this to work, as you cannot see what it is that your own code is doing. So, a lot of criticism which you may not like, so I'll try to show you that the criticism can help."
My apologies for sending unstructured code with unwanted comments, which made it difficult for people who really want to help to look at the code. Yes, I know the code I sent depends on the rest of the program, but I cannot post the whole code because it is too big. Thanks for showing me 'how criticism can help'.
"Now the structure and essentials of the code are clear and easy to follow, and it is easy to pick out several problems:"
"You create your thread here: my $th = threads->create( \&worker, $robj ); but then you do push @thr_arr, $th->tid and then call join() on that object, which means that @thr_arr contains a list of thread ids, not thread objects! That means that when you come to try and join your threads, you are trying to call the method join() on a number, and that obviously isn't going to work."
No, you missed one line while reformatting the code to show how to post code that is clear and easy to follow: the line that actually gets the thread object associated with the thread id.
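The line in question is not shown in the thread. As an illustration only, one way to get from a stored thread id back to a thread object is threads->object($tid); whether the original code did exactly this is an assumption. The worker sub and the array name @tids are hypothetical.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker standing in for the real job.
sub worker {
    my ($n) = @_;
    return $n * $n;
}

# Store thread ids rather than thread objects, as in the code under discussion.
my @tids;
for my $n ( 1 .. 3 ) {
    my $th = threads->create( \&worker, $n );
    push @tids, $th->tid;
}

my @results;
for my $tid (@tids) {

    # Look the object up again from its id. object() returns undef when
    # no joinable thread with that id exists, so check before joining.
    my $th = threads->object($tid);
    next unless defined $th;

    my $result = $th->join;
    push @results, $result;
}

print "results: @results\n";
```

Storing the objects themselves in the array avoids the extra lookup and the undef check.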
"You are calling rsync using backticks: `$rsync_cmd`;, but you are doing nothing with any output produced. That means you are having the system build a pipe and collect the output, and then just throwing it all away."
That is because I am redirecting the command output to a file. If you wish, you can look at sub worker{ ... } and the data structure I sent.
"Have you heard of system? And now for the biggest problem, the design of your code in _replicate(). You have 2 nested loops. Within the outer loop you run the inner loop, which creates a bunch of threads all trying to contact the same server. And then block until that finishes, with several retries and 120 second waits, before starting another bunch of threads to contact the next server. This is fundamentally bad design. If one server is slow, or broken, with all your threads trying to talk to the same server, you will basically be doing a lot of nothing, when you could be talking to one or more of the other servers in parallel."
May I ask what made you think (from the code) that I contact a different (or the next) server when I create the next bunch of threads? For every set of threads I create inside the loop, I contact the same server. But during execution (in the thread block), if the command fails, I wait 120s before contacting the same server on a different port.
I appreciate your descriptive post.
katharnakh.
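On the backticks-versus-system point: when the command's output is already redirected to a file, system() runs it without a capture pipe and hands back the exit status. A minimal sketch, assuming a Unix-like shell; the echo command is a stand-in for the real rsync command line, which is not shown in the thread, and the log file is a temporary file created just for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary log file standing in for wherever the real output is redirected.
my ( $fh, $logfile ) = tempfile( UNLINK => 1 );
close $fh;

# Stand-in for the rsync command, with output redirected by the shell.
my $cmd = "echo transfer-ok > $logfile 2>&1";

# system() waits for the command and returns its wait status;
# -1 means the command could not be started at all.
my $status = system($cmd);
my $exit = $status == -1 ? -1 : $status >> 8;

# Read the log back to show where the output went.
open my $in, '<', $logfile or die "Cannot read $logfile: $!";
my $logged = do { local $/; <$in> };
close $in;

print "exit code $exit, log contains: $logged";
```

Checking the exit code this way also gives a direct signal for deciding whether a retry is needed.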