Warning: This is a long and convoluted response and the real answer comes at the bottom. But please read it all, because otherwise you will not follow the logic by which I arrive at the final response, and will probably start arguing for things I've already covered.

  1. did i get it right that enqueuing a hash always does a deep copy?

    Yes.

  2. why is this deep copy needed? If the worker thread delivers the hash, it is not used any more by it.

    Typically, the feeder end of the queue will be populating a hash; enqueuing it; then populating it with new data and enqueuing that.

    If a copy were not made, then by the time the reader got hold of a reference to the hash, the feeder would already have overwritten it with the next record. Or worse, partially overwritten it.

  3. is there a better way of passing hashes between threads than thread::queue?

    You could have a shared array of shared hashes. The feeder populates one of the hashes in the array, and then queues its index in the shared array to the other thread.

    The reader then dequeues the index and knows which of the hashes it should process.

    Of course, once the reader has processed a particular hash, you will want to empty it or remove it from the shared array to prevent memory growing continuously. Ideally, you might have the feeder push the new hash on one end of the shared array, and the reader shift it off the other. You will no doubt recognise this as a queue. Similar to your use of Thread::Queue.

    The only distinction is that instead of pushing unshared hashes on one end, which then have to be copied into shared memory, and then copying the shared hash into an unshared hash in the reader, both ends deal directly with shared hashes and so avoid the copying.

    You could of course do exactly the same via Thread::Queue.

  4. is there a way to "convince" thread::queue to accept hashes without deep copy?

    Yes. Push references to already shared hashes.
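    A minimal sketch of what that might look like, assuming a single feeder and a single reader. The field names (id, value) and the use of undef as an end-of-data marker are assumptions made for illustration, not something taken from your code. Each record gets its own freshly shared hash, so the feeder never overwrites a record the reader has yet to see:

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $q = Thread::Queue->new;

# Reader: dequeues hash references until it sees the undef marker.
my $reader = threads->create(sub {
    my $sum = 0;
    while (defined(my $rec = $q->dequeue)) {
        # $rec refers to the shared hash itself; nothing is copied
        # into an unshared hash on this side.
        $sum += $rec->{value};
    }
    return $sum;
});

for my $n (1 .. 5) {
    # A new shared hash per record (hypothetical fields id and value).
    my $rec = &share({});
    $rec->{id}    = $n;
    $rec->{value} = $n * 10;
    $q->enqueue($rec);    # already shared, so it is queued as is
}
$q->enqueue(undef);       # end-of-data marker (an assumption of this sketch)

my $total = $reader->join;
print "total: $total\n";  # prints "total: 150"
```

    Note that populating a shared hash is itself slower than populating an ordinary one, so whether this wins anything over the deep copy is something to measure rather than assume.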

  5. any comments about my approach?

    Yes. Why do you want to queue hashes from one thread to the other in the first place?

    Going back to your original application rather than your wholly artificial test code, your description, paraphrased, is:

    1. Read a string.
    2. Convert the string to a thread-local hash.
    3. Then
      • Either: copy local hash to a shared hash;
      • Or: convert the local hash back to a string;
    4. queue the shared hash or string;
    5. Dequeue either: the shared hash and assign it to a thread-local unshared hash; or dequeue the string and convert it back to a thread-local hash.

    Maybe this is too obvious, but why not just: queue the string you read; and convert it to a hash at the reader?
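    As a sketch only: the record format ("key=value;key=value") and the parse_line() helper below are hypothetical stand-ins for whatever your real parsing does. The point is that only a plain string crosses the queue, and the hash is built as an ordinary thread-local hash at the reader:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new;

# Hypothetical parser for lines of the form "key=value;key=value".
sub parse_line {
    my ($line) = @_;
    my %rec;
    for my $field (split /;/, $line) {
        my ($k, $v) = split /=/, $field, 2;
        $rec{$k} = $v;
    }
    return \%rec;
}

my $reader = threads->create(sub {
    my @names;
    while (defined(my $line = $q->dequeue)) {
        my $rec = parse_line($line);    # unshared hash, local to this thread
        push @names, $rec->{name};
    }
    return join ',', @names;
});

# Feeder: queue the raw strings exactly as read.
$q->enqueue($_) for 'id=1;name=foo', 'id=2;name=bar';
$q->enqueue(undef);    # end-of-data marker (an assumption of this sketch)

my $names = $reader->join;
print "$names\n";      # prints "foo,bar"
```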

But my real comment is this.

The mistake you are making is right up front in your logic, which you describe as this:

  1. set up a couple of worker threads, which parse the line and deliver back one hash per record in an output-thread::queue
  2. read input file and put lines in an input-thread:queue
  3. main process dequeues from output-queue and produces formatted output file

You say that most of the time is spent parsing -- I'd like to see evidence of that as it is very unusual for parsing to take longer than reading; but I'll accept you at your word -- so you have one thread reading from the file and you fan the input out to multiple workers to do the parsing. So far, so good.

But then you queue the hashes they build back to a single thread for further processing. Why?

You also say in your preamble: "I have to parse large text files for information and produce output in an different format. One line - one data record." That's not quite definitive, but it strongly suggests that you are writing one line of output for each line of input.

It makes sense that you need to bring the flows back together to write the output file -- it avoids the problems of having multiple writers to a single output file -- but what you will be writing to the output file will be strings, not hashes!

So why ship hashes to the writer thread? Why not perform whatever processing is required to produce the output strings from the parsed hashes in the threads that created those hashes and then queue the resultant strings to the writer in the form required for output?
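A rough sketch of that arrangement, under stated assumptions: the comma-separated input, the parse_line() and format_rec() helpers, the worker count, and the undef end markers are all invented for illustration. For brevity the main thread feeds the input queue first and then acts as the writer; in a real program those two roles would run concurrently. With more than one worker, output order is not guaranteed to match input order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 2;                  # arbitrary choice for this sketch
my $in  = Thread::Queue->new;     # raw input lines
my $out = Thread::Queue->new;     # finished output lines

# Hypothetical stand-ins for the real parsing and formatting.
sub parse_line { my %r; @r{qw(id name)} = split /,/, $_[0]; return \%r }
sub format_rec { my ($r) = @_; return "$r->{id}: " . uc $r->{name} }

my @workers;
for (1 .. $WORKERS) {
    push @workers, threads->create(sub {
        while (defined(my $line = $in->dequeue)) {
            my $rec = parse_line($line);        # hash never leaves this thread
            $out->enqueue(format_rec($rec));    # only a string is queued
        }
        $out->enqueue(undef);                   # this worker is finished
    });
}

# Feeder: queue the lines, then one end marker per worker.
$in->enqueue($_) for '1,foo', '2,bar', '3,baz';
$in->enqueue(undef) for 1 .. $WORKERS;

# Writer: collect strings until every worker has signalled completion.
my @written;
my $done = 0;
while ($done < $WORKERS) {
    my $line = $out->dequeue;
    if (defined $line) { push @written, $line }
    else               { $done++ }
}
$_->join for @workers;

print "$_\n" for @written;
```

If output order matters, the workers could queue the input line number along with each string so the writer can reorder; that is left out here to keep the sketch short.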

Finally, please forget all the timings you have done in your wholly artificial (and I must say, way overcomplicated) benchmark scripts, because you are measuring the wrong things entirely.

If you have a real application for this, post the code of a single threaded script that performs all of the required processing, along with a few (say 10 or so typical) records of input data. Once we can see (and time) the real processing involved, it might be possible to suggest ways of using threading to reduce the time taken to do it. Or possibly, to suggest that threading is the wrong solution to the problem.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: passing hashes between threads by BrowserUk
in thread passing hashes between threads by bago
