Why do you think threading would be a good approach?

I have a program that does something similar. I wanted to compare the version number of each rpm, as reported by RPM, against the version numbers of all other rpms that have the same base name.

In my case, the lists are many but small, i.e. I can have 5-10K different rpm names, but usually only 1-3 different versions of each (assuming the directory has been pruned regularly).

The thing that took time in my case was calling "rpm" with a query that tells me how it splits the Name from the Version and Release. Calling rpm, or its library, still involves opening the rpm package on disk to parse its header.
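As a rough sketch of that kind of query (not the actual script), rpm's `-qp --queryformat` options can print the three header fields for a package file. The helper names `query_nvr` and `parse_nvr` are made up for illustration, and the tab-separated format is just one convenient choice.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of query output into its fields.
sub parse_nvr {
    my ($line) = @_;
    return unless defined $line;
    chomp $line;
    my ($name, $version, $release) = split /\t/, $line;
    return unless defined $release;    # malformed or empty output
    return { name => $name, version => $version, release => $release };
}

# Hypothetical helper: ask rpm how it splits a package file into
# Name, Version and Release. rpm opens the file and parses its header,
# which is where the disk wait comes from.
sub query_nvr {
    my ($file) = @_;
    # Single quotes keep \t and \n literal so that rpm expands them itself.
    my @cmd = ('rpm', '-qp', '--queryformat',
               '%{NAME}\t%{VERSION}\t%{RELEASE}\n', $file);
    open(my $fh, '-|', @cmd) or die "cannot run rpm: $!";
    my $line = <$fh>;
    close $fh;
    return parse_nvr($line);
}

# Usage (needs rpm installed and a real package file):
#   my $info = query_nvr('/path/to/some.rpm');
#   print "$info->{name} $info->{version} $info->{release}\n" if $info;
```

Each call forks an rpm process and reads one file header, so the cost per package is dominated by process startup and disk access rather than by Perl.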

As a result, much of the time is spent in disk waits. A bit of experimentation on a 12-core machine over a RAID50 showed me that scheduling about 9 inspection threads/procs at a time yields the lowest overall time (though the speedup is only 3-6X, so it may not be the most efficient in terms of CPU usage, but I was looking for real-time benefits).

So I make sure my list is sorted by name and split it by the number of procs I allow. Each worker goes off and runs through its list and, when done, reports back to the master, sending back its reduced list and a second list of rpm names that are 'redundant' (to be removed).
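The split-and-report pattern can be sketched roughly as below. This is an illustration under assumptions, not the real script: `split_chunks` and `run_workers` are made-up names, the per-item work is a stand-in (upper-casing a name), and anonymous pipes are used for the report back to the parent.

```perl
use strict;
use warnings;
use POSIX ();

# Split a (sorted) list into at most $n contiguous chunks of similar size.
sub split_chunks {
    my ($items, $n) = @_;
    my $size = int((@$items + $n - 1) / $n) || 1;
    my @copy = @$items;
    my @chunks;
    push @chunks, [ splice(@copy, 0, $size) ] while @copy;
    return @chunks;
}

# Fork one child per chunk. Each child runs $work on its chunk and writes
# one result per line to its pipe; the parent collects them in chunk order.
sub run_workers {
    my ($chunks, $work) = @_;
    my @readers;
    for my $chunk (@$chunks) {
        pipe(my $rd, my $wr) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                      # child
            close $rd;
            close $_->{fh} for @readers;      # drop inherited read ends
            print {$wr} "$_\n" for $work->(@$chunk);
            close $wr;
            POSIX::_exit(0);                  # skip parent's END blocks
        }
        close $wr;                            # parent keeps the read end only
        push @readers, { pid => $pid, fh => $rd };
    }
    my @results;
    for my $r (@readers) {
        my $fh = $r->{fh};
        chomp(my @lines = <$fh>);
        close $fh;
        waitpid($r->{pid}, 0);
        push @results, @lines;
    }
    return @results;
}

# Stand-in workload: ten fake package names split across three workers.
my @names = sort map { "pkg$_" } 1 .. 10;
my @out = run_workers([ split_chunks(\@names, 3) ], sub { map { uc($_) } @_ });
print scalar(@out), " results collected\n";
```

Because the chunks are contiguous slices of a sorted list, versions of one base name mostly land in the same worker; a name that straddles a chunk boundary would still need a check by the parent after the merge.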

I also make sure that the minimum amount of work for each worker is at least "X" queries. If it isn't, I reduce the number of overall workers.
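One way to express that rule, as a sketch only: `plan_workers` and its parameters are hypothetical names, and the minimum of 500 queries is an arbitrary stand-in for "X".

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: choose how many workers to start so that each one
# gets at least $min_per_worker items, never exceeding $max_workers.
sub plan_workers {
    my ($n_items, $max_workers, $min_per_worker) = @_;
    return (0, 0) if $n_items <= 0;
    my $workers = int($n_items / $min_per_worker);
    $workers = 1            if $workers < 1;             # always at least one
    $workers = $max_workers if $workers > $max_workers;  # cap at the limit
    my $per_worker = ceil($n_items / $workers);
    return ($workers, $per_worker);
}

# A big list uses every allowed worker; a small one uses fewer.
for my $n (35841, 1000, 100) {
    my ($workers, $per) = plan_workers($n, 9, 500);
    print "$n items: $workers procs, up to $per items each\n";
}
```

The point of the floor is that forking a worker has a fixed cost, so a short list is not worth spreading across all nine procs.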

In development, I found it useful to write the results to temporary space before they were re-merged by the parent; once it was working, I switched to named pipes. Note: I DID use /dev/shm for the temporary space, so it was, in some respects, still IPC, but I could examine the results in the files created in /dev/shm during development if I wanted to.

I didn't see that threading offered any advantage over multiple procs, since perl threads each get their own copy of the interpreter's data and so behave much like separate procs anyway, and, at least for me, being able to send the child output to an intermediate space during development was really helpful. Since the child and parent both just used FDs, it was trivial to switch them to talking directly over pipes once development was done.
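A small sketch of why that switch is cheap: if the reporting code only ever sees a filehandle, it does not care whether the handle is a file under /dev/shm or a pipe. Everything here is illustrative; `report_results` is a made-up name, the "keep"/"drop" line format is invented, and an anonymous pipe stands in for the named pipes mentioned above.

```perl
use strict;
use warnings;
use POSIX ();
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical worker-side code: it only knows about a filehandle.
sub report_results {
    my ($out, @lines) = @_;
    print {$out} "$_\n" for @lines;
}

my @results = ("keep\tfoo-1.2-3", "drop\tfoo-1.1-1");

# Development mode: write to a file that can be inspected afterwards.
# Falls back to the system temp dir where /dev/shm is not available.
my $base = (-d '/dev/shm' && -w '/dev/shm') ? '/dev/shm' : File::Spec->tmpdir;
my $dir  = tempdir('rpmwork.XXXXXX', DIR => $base, CLEANUP => 1);
open(my $file_out, '>', "$dir/worker0.out") or die "open: $!";
report_results($file_out, @results);
close $file_out or die "close: $!";
open(my $file_in, '<', "$dir/worker0.out") or die "open: $!";
chomp(my @from_file = <$file_in>);
close $file_in;

# Finished mode: same call, but the handle is the write end of a pipe.
pipe(my $pipe_in, my $pipe_out) or die "pipe: $!";
my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {                 # child writes and leaves
    close $pipe_in;
    report_results($pipe_out, @results);
    close $pipe_out;
    POSIX::_exit(0);
}
close $pipe_out;                 # parent only reads
chomp(my @from_pipe = <$pipe_in>);
close $pipe_in;
waitpid($pid, 0);

print "file and pipe agree\n" if "@from_file" eq "@from_pipe";
```

Only the code that opens the handle changes between the two modes; the worker's reporting code and the parent's reading loop stay the same.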

A larger-than-normal run (I took two different distro releases, combined them, and ran them through) looks like:

> time remove-oldver-rpms-in-dir.pl
Read 35841 rpm names.
Use 9 procs w/3984 items/process
#pkgs=21111, #deletes=14730, total=35841
2 additional duplicates found in last pass
Recycling 14732 duplicates...Done
Cumulative  This Phase   ID                                   
 0.000s      0.000s      Init                                 
 0.000s      0.000s      start_program                        
 0.060s      0.060s      starting_children                    
 0.065s      0.005s      end_starting_children                
 118.643s    118.578s    endRdFrmChldrn_n_start_re_sort       
 123.202s    4.559s      afterFinalSort                       
202.70sec 18.16usr 64.72sys (40.89% cpu)

The final difference, from 123s to 202s, was spent moving each of the files to a per-disk recycle bin that I periodically empty via another script.

For me, the use of threads would have complicated things.

Does that give you any ideas?

linda

----
P.S. If speed were really important, I could likely benefit from using multiple cores on that final step, which takes roughly 80 secs, as it's all single-threaded. I could probably 'rename' at least 3-5 files in parallel ... but it's just a maintenance script, so not a real high priority...

In reply to Re: multi threading by perl-diddler
in thread multi threading by egunth
