BrowserUk, I am so glad you answered and took the time to look over my code. Without sounding sycophantic, I've long admired your posts, specifically about threads. They've helped me understand the concepts and avoid common pitfalls. So, Hmmmm...
The solution is to desynchronize your threads:
- The guard (feeder thread) only concerns itself with ensuring that the 'internal queue' (workQ) doesn't get overly full. It has some threshold -- say N where N is the number of worker threads -- and when the workQ falls below that number it allows another N people in to join the internal queue (workQ).
- The clerks (work threads) all get new customers (filenames) from that same single internal queue (workQ), which means that if they are capable of processing 2 (or N dozen files) in a single timeslice, they do not have to enter wait-states to do so.
...It's gonna take me a few minutes to wrap my head around how this would be implemented in code. I'm not sure how to do it and avoid the memory issues mentioned in the documentation for threads. Incidentally, the basis of my code was straight from the documentation for threads and threads::shared. Who knew I would be so off base?
Are you saying that the "guard" should start polling a single queue and stuffing things into it on demand? How long and how often would I have to usleep to avoid an underrun on one hand and excessive interrupts on the other? That in and of itself seems like it could vary wildly from one environment/server/workstation to another. I'm not sure I understand how to go about it correctly.
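For the record, here's my best guess at what you're describing -- a rough, untested sketch, not working code (process_file() is a stand-in for the real per-file work, and the file list is just taken from @ARGV). If I've understood you, the workers block in dequeue(), so there is no usleep interval to tune at all; only the feeder ever waits, and only when the queue is already comfortably full:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $THREADS = 8;                      # number of worker threads
    my $workQ   = Thread::Queue->new();   # the single shared queue

    # Workers: block in dequeue(), so no polling/usleep is needed.
    # An undef item is the "no more work" signal.
    my @workers = map {
        threads->create( sub {
            while ( defined( my $file = $workQ->dequeue() ) ) {
                process_file( $file );
            }
        } );
    } 1 .. $THREADS;

    # The "guard"/feeder: keep the single queue topped up, but bounded,
    # so memory can't run away if the workers fall behind.
    my @filenames = @ARGV;                # or however the file list is built
    for my $file ( @filenames ) {
        sleep 1 while $workQ->pending() > $THREADS * 2;   # crude back-pressure
        $workQ->enqueue( $file );
    }

    $workQ->enqueue( (undef) x $THREADS );   # one shutdown marker per worker
    $_->join() for @workers;

    sub process_file {
        my ( $file ) = @_;
        # ... the real per-file work (digesting, comparing, etc.) goes here
    }

The "sleep 1 while pending() > ..." line is only crude back-pressure to keep the queue (and memory) bounded; a proper high/low-water-mark check could replace it.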
I actually did consider the fact that the thread management/locking/queueing 1 at a time was killing performance. This is why I had the "guard" stuff $opts->{qsize} items into the workers' queues at a time (default: 30). I saw a noticeable improvement.
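If the single shared queue is the right shape, then I suppose that batching could move to the worker side instead: each worker pulls a handful of filenames per wakeup rather than the guard pre-stuffing per-worker queues. Something like this, assuming the feeder calls $workQ->end() after the last filename instead of queueing undef markers (end() and the COUNT argument to dequeue_nb() need a reasonably recent Thread::Queue, I believe):

    # Hypothetical batch-pulling worker loop: take one file (blocking),
    # then grab up to 29 more if they happen to be waiting.
    while ( defined( my $first = $workQ->dequeue() ) ) {
        my @batch = ( $first, grep { defined } $workQ->dequeue_nb( 29 ) );
        process_file( $_ ) for @batch;
    }

Whether pulling batches buys anything over plain one-at-a-time blocking dequeue() I honestly don't know; it just mirrors what my qsize batching was trying to do.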
What to do... Could you steer me toward an implementation like the one you suggest? Google seems more interested in "Coro vs Threads" wars and other silliness that doesn't help me.
UPDATE Re digesting:
This implies that you are reading the entire file and digesting it for every file -- regardless of whether there is another file already seen that has the same date/time/first/last/middle bytes.
That's essentially what the code does: every file gets read and digested, whether or not anything else matches it on the cheap attributes.
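If I'm reading you right, the fix is to make the cheap attributes a first pass and only digest files that collide on them. A rough, hypothetical sketch of that ordering (not my actual code; @files stands in for the gathered file list, and whether mtime belongs in the key depends on what should count as a duplicate):

    use strict;
    use warnings;
    use Digest::MD5;

    my @files = @ARGV;    # stand-in for the real file list

    # First pass: bucket files by a cheap key -- size, mtime and the
    # first/last 64 bytes.  No digesting happens here.
    my %bucket;
    for my $file ( @files ) {
        my ( $size, $mtime ) = ( stat $file )[ 7, 9 ];
        open my $fh, '<:raw', $file or next;
        read $fh, my $head, 64;
        my $tail = '';
        if ( $size > 64 ) {
            seek $fh, -64, 2;     # seek relative to end of file
            read $fh, $tail, 64;
        }
        close $fh;
        push @{ $bucket{ join '|', $size, $mtime, $head, $tail } }, $file;
    }

    # Second pass: only buckets holding more than one file need a digest;
    # a file alone in its bucket matched nothing on the cheap key.
    my %by_digest;
    for my $group ( grep { @$_ > 1 } values %bucket ) {
        for my $file ( @$group ) {
            open my $fh, '<:raw', $file or next;
            push @{ $by_digest{ Digest::MD5->new->addfile( $fh )->hexdigest } }, $file;
            close $fh;
        }
    }

    # Anything still sharing a digest is a real duplicate set.
    for my $set ( grep { @$_ > 1 } values %by_digest ) {
        print join( "\t", @$set ), "\n";
    }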