in reply to Perl && threads && queues - how to make all this work together

You don't lock @block in getnewblock(). It seems to me that many getnewblock() calls could be running in parallel with each other, and in parallel with a read access.

Re^2: Perl && threads && queues - how to make all this work together
by xaero123 (Novice) on Feb 06, 2010 at 13:18 UTC
    I tried placing a lock in the async block and in the getnewblock() procedure, but the result was the same =( I also tried using Thread::Queue instead of an array, but that had problems too, as I described in the first post. Do you have any other ideas how this code can be made to work correctly?

      Your code is so full of

      • errors:

        eg. $pos is a shared variable accessed and modified by every thread, but you never lock it.

      • misconceptions:
        (I used binary access to be able to operate very large files)

        You do not need to use binary access in order to handle huge files in Perl (nor any other language I'm aware of).

        Which makes all the laborious (and mostly broken) effort you went to to write your own line handling:

        1. slow: reading huge files char by char will take a very long time.
        2. Completely unnecessary.
      • and other weirdness:

        Why abs($threadscnt-1)?

      it is doubtful that it would ever work reliably.
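On the first point, here is a minimal sketch of what locking a shared variable looks like. The names $counter and bump() are made up for illustration and are not taken from the original code; the only point is that every thread that modifies the shared scalar takes the lock first.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical shared counter standing in for a variable like $pos.
my $counter :shared = 0;

sub bump {
    my $times = shift;
    for ( 1 .. $times ) {
        lock $counter;    # released automatically at the end of this block
        ++$counter;
    }
}

# Four threads each increment the counter 1000 times.
my @workers = map { threads->create( \&bump, 1000 ) } 1 .. 4;
$_->join for @workers;

print "counter = $counter\n";
```

Because lock is scoped to the enclosing block, there is no explicit unlock. Without the lock line, the increments from different threads could interleave and some updates could be lost.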

      You don't say why you wish to avoid using Thread::Queue, but your understanding of Perl, much less your understanding of threading, isn't sufficient to allow you to consider writing your own shared data handling.

      By way of encouragement, this code does pretty much exactly what your code attempts to do:

      #! perl -slw
      use strict;
      use threads qw[ yield ];
      use threads::shared;
      use Thread::Queue;
      use Time::HiRes qw[ sleep ];

      use constant NTHREADS => 30;

      my $pos :shared = 0;

      open FILE, '<', $ARGV[ 0 ] or die $!;
      my $size = -s FILE;

      sub thread {
          my $Q = shift;
          my $tid = threads->tid;
          while( my $line = $Q->dequeue ) {
              printf "%3d: (%10d, %10d) :%s", $tid, $pos, $size, $line;
              sleep rand 5;
          }
      }

      my $Q = Thread::Queue->new;
      my @threads = map threads->create( \&thread, $Q ), 1 .. NTHREADS;

      while( !eof FILE ) {
          sleep 0.001 while $Q->pending;
          for( 1 .. NTHREADS ) {
              $Q->enqueue( scalar <FILE> );
              lock $pos;
              $pos = tell FILE;
          }
      }

      $Q->enqueue( (undef) x NTHREADS );
      $_->join for @threads;

      It's clear, clean and simple. And works. (Though it is of dubious value, but you wrote the spec!)

      If the idea of threading your code is to allow you to process your huge file more quickly, that probably isn't going to work unless you spend an inordinate amount of time processing each line. And if that's the case, unless you're using hardware with 16 or more cores, using 30 threads is unlikely to be an optimum strategy.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        OK, thank you very much jethro and BrowserUk! But I have one last question. BrowserUk, why did you write "use threads qw[ yield ];" and then never use the yield() function? What effect does this have?
        Ok, thanks, BrowserUk! It is impossible to overestimate your help! :))

      BrowserUk is the expert on threads here, so I would follow his advice and use Thread::Queue (advantage: a lot of potential errors go away if you use the tried and tested module for the queue). Then open a new question here with your new code if you have the same or other problems.

      You could also try to get some more information about what is happening. For example: Let every new thread open its own logfile and print to it the time it started and the time before and after it gets an item from the queue.
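A rough sketch of that kind of per-thread logging, under some assumptions: the worker() sub, the log file naming and the use of the system temp directory are all made up for illustration, and the queue handling follows the Thread::Queue pattern rather than the original array-based code.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw[ time ];
use File::Spec;

# Assumption: logs go to the system temp directory, one file per thread.
my $logdir = File::Spec->tmpdir;

sub worker {
    my $Q   = shift;
    my $tid = threads->tid;
    open my $log, '>', "$logdir/thread-$$-$tid.log" or die $!;
    printf {$log} "%.6f started\n", time;
    while (1) {
        printf {$log} "%.6f waiting\n", time;      # just before dequeue
        my $item = $Q->dequeue;
        last unless defined $item;                 # undef means "no more work"
        chomp $item;
        printf {$log} "%.6f got '%s'\n", time, $item;   # just after dequeue
    }
    printf {$log} "%.6f finished\n", time;
    close $log;
}

my $Q       = Thread::Queue->new;
my @threads = map { threads->create( \&worker, $Q ) } 1 .. 3;

$Q->enqueue("line $_\n") for 1 .. 6;
$Q->enqueue( (undef) x @threads );    # one terminator per thread
$_->join for @threads;

print "logs written to $logdir/thread-$$-*.log\n";
```

Comparing the timestamps across the log files should show which thread picked up which item and how long each one sat waiting on the queue.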

      By the way, is your sub getnewsline unfinished code? Because it seems to have something missing:
      You use binary access because of large files. But as long as the text lines are not too long, the size of the file won't matter, since you only read one line at a time. Only if you also read binary files, or files with really (really, really) long lines, would there be problems. But in both of those cases your code as it stands would run into the same problems as a simple line read, because you are just simulating a simple line read. You would have to add code to stop reading a line once it exceeds some specific length.
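One possible way to add such a cap, as a sketch only: the sub name read_capped_line() and the $max parameter are hypothetical, and this reads a chunk and seeks back rather than reading character by character. It assumes a seekable, byte-oriented filehandle.

```perl
use strict;
use warnings;

# Return the next line from $fh, but never more than $max bytes.
# An over-long line comes back in pieces of at most $max bytes each.
sub read_capped_line {
    my ( $fh, $max ) = @_;
    my $start = tell $fh;
    my $got   = read( $fh, my $buf, $max );
    return unless $got;                        # EOF or read error
    my $nl = index $buf, "\n";
    if ( $nl >= 0 ) {
        $buf = substr $buf, 0, $nl + 1;        # keep up to and including "\n"
        seek $fh, $start + length($buf), 0;    # rewind to just after the line
    }
    return $buf;
}

# Demo on an in-memory file so the example is self-contained.
my $data = "ab\ncdefgh\ni";
open my $fh, '<', \$data or die $!;
while ( defined( my $chunk = read_capped_line( $fh, 4 ) ) ) {
    printf "%d bytes: %s\n", length $chunk, $chunk =~ s/\n/\\n/r;
}
close $fh;
```

With a cap of 4, the short first line comes back whole, while the longer second line is returned in two pieces. For ordinary text files a plain `<$fh>` read remains simpler; the cap only matters when line length cannot be trusted.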

        jethro - thanks, I had thought about it, but not as deeply as needed! :)

        BrowserUk - I am very pleased that an expert in threading is trying to help me! But can you tell me more about some lines of your code? At first glance I thought your code was placing all the lines of the file in the queue at once. But when I changed

        printf "%3d: (%10d, %10d) :%s", $tid, $pos, $size, $line;

        to

        printf "%3d: (%10d, %10d, %10d) :%s", $tid, $pos, $size, $Q->pending, $line;

        I saw that my conclusion was wrong. So, can you explain how the code in the block "while( !eof FILE ) {" interacts with the code in the thread sub in just the right way? It is placed outside the sub. Or is the trick that you placed the join only after the while loop, not right after the 'create' call?

        Yes, I made a mistake when I wrote my own procedure for reading lines with binary access; I just forgot about the 'tell' function.

        Why 'abs($threadscnt-1)'? Because I was taught to give all my programs at least the rudiments of 'fool-proof' input handling.