I'll assume your general goal is to reduce wall clock time.
With a huge number of small files, you are most probably bound by disk seeks. Threading helps to do something with the CPU during these seeks, but does not attack the root cause. I've not seen an OS interface for optimized reading from many files.
In your single-threaded application I'd think about 'defragmenting' the file reads. In a preliminary phase you can 'stat' all input files, sort the list by inode and then read the files one by one in that order. You should hear far fewer disk head movements.
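A minimal sketch of that preliminary phase, assuming the file names arrive as command-line arguments. The sub name 'sort_by_inode' is made up for illustration. Note that inode order is only a proxy for on-disk layout, so how much it helps depends on the filesystem.

```perl
use strict;
use warnings;

# Return the given paths ordered by inode number.
# Paths that cannot be stat'ed are skipped with a warning.
sub sort_by_inode {
    my @paths = @_;
    my @pairs;
    for my $path (@paths) {
        my @st = stat($path);
        unless (@st) {
            warn "cannot stat $path: $!\n";
            next;
        }
        push @pairs, [ $st[1], $path ];    # field 1 of stat() is the inode number
    }
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @pairs;
}

# Print the reordered list, one name per line.
print "$_\n" for sort_by_inode(@ARGV);
```

The output can be consumed by the existing read loop, or piped into a separate reader process.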
'Threading for the lazy' would not require changing Perl versions. It involves splitting your application into three processes that communicate over OS pipes. The first process ('stater') does the 'stat' task and pipes the sorted file names to the dumper via STDOUT. Include a large output buffer to separate 'stat' seeks from 'read' seeks. The second process ('dumper') is designed to be waiting on I/O most of the time. After dumping a file's contents it sends some 'EOF' token to the third process, the mostly unchanged interpreter. Your envelope could be a shell script 'stater | dumper | interpreter' or a pipe 'open' variation in Perl.
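Here is a rough sketch of how the dumper and the interpreter could frame files on the pipe. Instead of an 'EOF' token, which could collide with file contents, this sketch swaps in a header line carrying the byte length and the file name. That header format and the sub names 'dump_file' and 'read_record' are my own assumptions, and file names containing newlines are not handled.

```perl
use strict;
use warnings;

# Dumper side: write one file to $out as "<byte length> <file name>\n"
# followed by the raw bytes. Returns 1 on success, 0 if the file was skipped.
sub dump_file {
    my ($out, $path) = @_;
    open(my $fh, '<:raw', $path) or do {
        warn "skipping $path: $!\n";
        return 0;
    };
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    $data = '' unless defined $data;
    print {$out} length($data), ' ', $path, "\n", $data;
    return 1;
}

# Interpreter side: read the next record from $in.
# Returns (name, contents), or an empty list at end of stream.
sub read_record {
    my ($in) = @_;
    local $/ = "\n";
    my $header = <$in>;
    return unless defined $header;
    chomp $header;
    my ($len, $name) = split / /, $header, 2;
    die "bad header: $header\n"
        unless defined $name && $len =~ /^\d+$/;
    my $data = '';
    if ($len > 0) {
        my $got = read($in, $data, $len);
        die "truncated record for $name\n"
            unless defined $got && $got == $len;
    }
    return ($name, $data);
}
```

In the dumper, the main loop would call dump_file(\*STDOUT, $name) for each name read from STDIN; the interpreter would loop on read_record(\*STDIN) and hand each file's contents to its existing parsing code. Both ends should use binmode on the pipe handles.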
On Linux with the CFQ I/O scheduler, 'ionice -c 3 <program>' puts the program into the idle I/O class, so other processes get priority for disk access.
Unless you now need to optimize away being bound by a single CPU core in the interpreter I'd not think further about threading.