I want to write an application that reads from a file and once it's read enough (filled up a buffer of some size) does some processing on what it's read. A simple example of the processing might be to count the frequency of words in a text file. To take advantage of multiple cores, I'd like to have several threads (each responsible for words starting with a different letter, say) processing the buffered data in parallel. Once they have all finished, reading from the input file would resume, refilling the buffer, etc. until the entire (large) file has been processed. Once the entire file has been read and words counted, I'd like to print out the frequency of each word found.
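To make the intended flow concrete, here is a rough sketch of the pattern, written in Python purely for illustration. The names `read_chunks`, `split_alphabet`, `count_bucket` and `count_words` are made up for this example, and so is the definition of a word (lowercase ASCII letters with optional internal apostrophes). It assumes each worker returns a private counter that the reader merges, which is only one of the possible designs, not necessarily the best one.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: a "word" is a run of ASCII letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def read_chunks(handle, chunk_size):
    """Yield pieces of roughly chunk_size characters without splitting a word."""
    carry = ""
    while True:
        block = handle.read(chunk_size)
        if not block:
            break
        block = carry + block
        # Walk back to the last whitespace; hold the trailing partial word
        # over until the next read.
        cut = len(block)
        while cut > 0 and not block[cut - 1].isspace():
            cut -= 1
        carry = block[cut:]
        if cut:
            yield block[:cut]
    if carry:
        yield carry


def split_alphabet(workers):
    """Deal the 26 letters out round-robin, one set of letters per worker."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return [frozenset(letters[i::workers]) for i in range(workers)]


def count_bucket(chunk, letters):
    """Scan the whole buffer, counting only words that start with this worker's letters."""
    words = WORD_RE.findall(chunk.lower())
    return Counter(w for w in words if w[0] in letters)


def count_words(path, chunk_size=1 << 16, workers=4, executor_cls=ThreadPoolExecutor):
    """Read a buffer, fan it out to the workers, merge, then read the next buffer."""
    totals = Counter()
    buckets = split_alphabet(workers)
    with open(path, encoding="utf-8") as handle, executor_cls(max_workers=workers) as pool:
        for chunk in read_chunks(handle, chunk_size):
            # Consuming map()'s results waits for every worker, so the next
            # read only happens once the whole buffer has been processed.
            for partial in pool.map(count_bucket, [chunk] * workers, buckets):
                totals.update(partial)
    return totals
```

Calling `count_words("big.txt").most_common()` would give the (word, count) pairs to print at the end. Two caveats on this sketch: every worker rescans the full buffer, which duplicates work, and threads in standard CPython do not run CPU-bound code on several cores at once, so passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) would be needed to actually use multiple cores.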
I've read the thread tutorial, and it seems like this should be pretty straightforward. What I'm not sure about is whether the pattern is best suited to a queueing model, whether the word-frequency hashes should be shared data, and exactly how to manage the flow from the single file-reader thread to the parallel processing threads, back to the file reader, and finally to a single output-writer thread.
Any suggestions (or pointers to previous examples) for this kind of pattern?
In reply to to thread or fork or ? by Anonymous Monk