Re^3: Perl && threads && queues - how to make all this work together
by BrowserUk (Patriarch) on Feb 06, 2010 at 15:25 UTC
Your code is so full of errors that it is doubtful it would ever work reliably. You don't say why you wish to avoid using Thread::Queue, but your understanding of Perl, much less your understanding of threading, isn't sufficient to allow you to consider writing your own shared data handling. By way of encouragement, this code does pretty much exactly what your code attempts to do:
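The listing that followed here is not preserved in this copy of the thread. The block below is a minimal sketch of the pattern the later replies describe (a pool of worker threads fed through Thread::Queue, with one undef per thread as the stop signal), not the original code. The thread count, the in-memory sample input, and the $done counter are assumptions made so the sketch runs on its own.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Time::HiRes qw( sleep );   # the builtin sleep only takes whole seconds

my $NTHREADS = 4;              # assumed for the sketch; the thread used 30
my $Q        = Thread::Queue->new();
my $done :shared = 0;          # lines processed, used only as a sanity check

sub worker {
    # dequeue blocks until an item arrives; undef means "no more work".
    while ( defined( my $line = $Q->dequeue() ) ) {
        chomp $line;
        # ... real per-line processing would go here ...
        lock($done);
        ++$done;
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. $NTHREADS;

# Hypothetical input: an in-memory file stands in for the huge file.
my $text = join '', map { "line $_\n" } 1 .. 20;
open my $fh, '<', \$text or die "Cannot open input: $!";

while ( !eof($fh) ) {
    sleep 0.001 while $Q->pending();      # wait until the queue is empty
    for ( 1 .. $NTHREADS ) {              # then feed one batch of lines
        defined( my $line = <$fh> ) or last;
        $Q->enqueue($line);
    }
}
close $fh;

$Q->enqueue( (undef) x $NTHREADS );       # one stop signal per thread
$_->join() for @workers;

print "processed $done lines\n";
```

With a real file, the open call would take a path instead of a reference to $text; nothing else needs to change.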
It's clear, clean and simple. And it works. (Though it is of dubious value, but you wrote the spec!)

If the idea of threading your code is to allow you to process your huge file more quickly, that probably isn't going to work unless you spend an inordinate amount of time processing each line. And if that's the case, unless you're using hardware with 16 or more cores, using 30 threads is unlikely to be an optimum strategy.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by xaero123 (Novice) on Feb 12, 2010 at 17:13 UTC
by BrowserUk (Patriarch) on Feb 12, 2010 at 18:08 UTC
BrowserUk, why did you use "threads qw yield ;" and never used the yield() function?

I simply forgot to remove the reference. I had originally coded sleep 0.001 while $Q->pending; as yield while $Q->pending;, but switched it because yield() can degenerate into a very tight loop, which consumes a large amount of cpu needlessly.

yield equates to sleep 0, which basically relinquishes the rest of the current timeslice. But if no other thread (or process) is ready to run, it returns very quickly, making for a cpu-sapping tight loop. On my system, the above code with yield() consumes 25% cpu, i.e. 100% of one core. The same code with sleep 0.001 consumes so little cpu that it gets measured as 0.00%, but it makes no difference to the overall runtime of the program.
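To make the difference between the two wait loops concrete, here is a small sketch that counts how many times each loop spins while a deliberately slow consumer drains the queue. The consumer delay, the item count, and the wait_for_drain helper are assumptions for illustration; fractional sleep comes from Time::HiRes, since the builtin sleep only takes whole seconds. The absolute numbers will vary by system, so none are quoted here.

```perl
use strict;
use warnings;
use threads qw( yield );
use Thread::Queue;
use Time::HiRes qw( sleep );

my $Q = Thread::Queue->new();

# One slow consumer: each item takes about 10 ms, so the queue stays busy.
my $consumer = threads->create( sub {
    while ( defined( my $item = $Q->dequeue() ) ) {
        sleep 0.01;
    }
} );

# Hypothetical helper: enqueue 20 items, then poll until the queue is empty,
# counting how many times the polling loop body runs.
sub wait_for_drain {
    my ($pause) = @_;            # code ref: what to do between checks
    $Q->enqueue( 1 .. 20 );
    my $polls = 0;
    while ( $Q->pending() ) {
        $pause->();
        ++$polls;
    }
    return $polls;
}

my $polls_yield = wait_for_drain( sub { yield() } );
my $polls_sleep = wait_for_drain( sub { sleep 0.001 } );

$Q->enqueue(undef);              # stop signal for the consumer
$consumer->join();

print "polls with yield: $polls_yield\n";
print "polls with sleep: $polls_sleep\n";
```

The yield() version typically reports a far higher poll count for the same wall-clock wait, which is the cpu-sapping behaviour described above.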
by xaero123 (Novice) on Feb 13, 2010 at 18:55 UTC
by BrowserUk (Patriarch) on Feb 13, 2010 at 19:39 UTC
impossible to underestimate your help!

Is that like saying "No help at all!"? :)
Re^3: Perl && threads && queues - how to make all this work together
by jethro (Monsignor) on Feb 06, 2010 at 15:36 UTC
BrowserUk is the expert for threads here, so I would follow his advice and use Thread::Queue (advantage: lots of error possibilities go away if you use the tried and tested module for the queue). Then open a new question here with your new code if you have the same or other problems.

You could also try to get some more information about what is happening. For example: let every new thread open its own logfile and print to it the time it started and the time before and after it gets an item from the queue.

By the way, is your sub getnewsline unfinished code? Because it seems to have something missing:
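Returning to the per-thread logging suggestion above, a minimal sketch of it could look like the following. The log file names, their location in the system temp directory, and the sample work items are all assumptions for illustration; the sketch counts the "started" entries and then removes the logs again.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw( time );
use File::Spec;
use IO::Handle;

my $NTHREADS = 3;                           # assumed thread count
my $Q        = Thread::Queue->new();
my $logdir   = File::Spec->tmpdir();

sub worker {
    my $tid     = threads->tid();
    # Hypothetical naming scheme: one log file per thread.
    my $logname = File::Spec->catfile( $logdir, "worker-$$-$tid.log" );
    open my $log, '>', $logname or die "Cannot open $logname: $!";
    $log->autoflush(1);

    printf {$log} "%.6f thread %d started\n", time(), $tid;
    while (1) {
        printf {$log} "%.6f waiting for an item\n", time();
        my $item = $Q->dequeue();
        last unless defined $item;
        printf {$log} "%.6f got item: %s\n", time(), $item;
    }
    printf {$log} "%.6f thread %d exiting\n", time(), $tid;
    close $log;
    return $logname;
}

my @workers = map { threads->create( \&worker ) } 1 .. $NTHREADS;
$Q->enqueue( map { "item $_" } 1 .. 10 );
$Q->enqueue( (undef) x $NTHREADS );         # one stop signal per thread
my @logs = map { $_->join() } @workers;

# Count the "started" entries across all logs, then clean up.
my $started = 0;
for my $logname (@logs) {
    open my $in, '<', $logname or die "Cannot read $logname: $!";
    $started += grep { /started/ } <$in>;
    close $in;
    unlink $logname;
}
print "threads that logged a start: $started\n";
```

Comparing the "waiting" and "got item" timestamps in each log shows how long a thread sat idle before the queue handed it work.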
by xaero123 (Novice) on Feb 07, 2010 at 23:43 UTC
by BrowserUk (Patriarch) on Feb 08, 2010 at 01:18 UTC
So, can you explain - how code in block "while( !eof FILE ) {" interacts with code in thread sub in just the right way?

Just step through the main line code and see what happens. All the interaction is controlled entirely by the Thread::Queue module. And that's a well-tested core module, so we needn't concern ourselves with the details.

The only other cross-thread interaction is the value of $pos. And that is usually inaccurate, because the value it contains reflects the file position at the point the threads access it, which will mostly be entirely different from its value at the time the line that thread is processing was read, because the line spent some time sitting in the queue.

I assumed that this was only in your sample code as a mechanism for tracking progress. As such, it served the purpose of demonstrating that DIY locking often doesn't achieve the goal you set out to achieve.

If it is important for the threads to know the file position associated with each line, then you should pass the value with the line via the queue. E.g.
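The example that followed is not preserved in this copy. As a stand-in, here is a minimal sketch of passing the position along with each line, assuming a tab-joined "pos, line" string as the queue item (one simple encoding among several possible ones, not necessarily the one originally posted). The thread count and the in-memory sample input are also assumptions.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw( sleep );

my $NTHREADS = 4;                        # assumed for the sketch
my $Q        = Thread::Queue->new();

sub worker {
    my %seen;                            # thread-local: position => line
    while ( defined( my $item = $Q->dequeue() ) ) {
        # Split only on the first tab, so tabs inside the line survive.
        my ( $pos, $line ) = split /\t/, $item, 2;
        chomp $line;
        $seen{$pos} = $line;             # ... real processing goes here ...
    }
    return %seen;                        # handed back to the caller of join
}

my @workers = map { threads->create( \&worker ) } 1 .. $NTHREADS;

# Hypothetical input: an in-memory file stands in for the real one.
my $text = join '', map { "line $_\n" } 1 .. 20;
open my $fh, '<', \$text or die "Cannot open input: $!";

while ( !eof($fh) ) {
    sleep 0.001 while $Q->pending();
    for ( 1 .. $NTHREADS ) {
        my $pos = tell($fh);             # position *before* reading the line
        defined( my $line = <$fh> ) or last;
        $Q->enqueue("$pos\t$line");      # position travels with the line
    }
}
close $fh;

$Q->enqueue( (undef) x $NTHREADS );
my %seen = map { $_->join() } @workers;

# Every reported position should be the true start offset of its line.
my $mismatches = grep {
    substr( $text, $_, length $seen{$_} ) ne $seen{$_}
} keys %seen;
printf "%d lines, %d position mismatches\n", scalar( keys %seen ), $mismatches;
```

No variable is shared between threads here, so no lock is needed: each worker keeps its own results and returns them through join.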
If you run that, you'll see that the pos reflects the true position within the file from which the line was read. Note that the position is read before the line. Note also that because $pos is no longer shared, the need for locking it goes away, and so the code runs far more efficiently. The code got both simpler and more efficient.
by jethro (Monsignor) on Feb 08, 2010 at 01:22 UTC
== As long as there is something in the queue, do nothing.
== Read exactly NTHREADS (i.e. 30) lines from the file and put them in the queue. And then we are back to doing nothing as long as there is something in the queue.

As you can see, the call to 'pending' stops the loop from reading more than the NTHREADS lines it is allowed once the queue has emptied.

The join at the end just waits for the threads to exit after they have been signalled to do so. Before that, the 'undef's pushed onto the queue make sure that all the threads get such a signal: putting NTHREADS 'undef's into the queue ensures that every thread gets its STOP signal.
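The shutdown part of that walk-through can be sketched in isolation. In the block below, which uses an assumed thread count and dummy work items, each worker leaves its loop on the first undef it dequeues, so it consumes exactly one stop signal; NTHREADS undefs therefore stop NTHREADS workers, and join returns for every one of them.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $NTHREADS = 5;                       # assumed for the sketch
my $Q        = Thread::Queue->new();

sub worker {
    my $handled = 0;
    # The loop ends on the first undef, so each thread eats exactly one.
    while ( defined( my $item = $Q->dequeue() ) ) {
        ++$handled;
    }
    return $handled;
}

my @workers = map { threads->create( \&worker ) } 1 .. $NTHREADS;

$Q->enqueue( 1 .. 12 );                 # dummy work items
$Q->enqueue( (undef) x $NTHREADS );     # one STOP signal per thread

my $total  = 0;
my $joined = 0;
for my $thr (@workers) {
    my ($n) = $thr->join();             # blocks until that thread has exited
    $total += $n;
    ++$joined;
}
print "joined $joined threads, $total items handled\n";
```

If fewer than NTHREADS undefs were enqueued, the leftover workers would block in dequeue forever and the corresponding join calls would never return.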