Re: Simultaneously process the multiple files at the same time
by Corion (Patriarch) on Jun 15, 2011 at 09:22 UTC
As you have been told repeatedly already, first find out where your script is slow. In many cases, reading the data from disk is the slowest part. The best way to make reading from disk faster is to buy faster storage.
Using threads will usually not make reading from storage faster, as it increases the administrative overhead while still leaving the bottleneck unchanged.
Re: Simultaneously process the multiple files at the same time
by DrHyde (Prior) on Jun 15, 2011 at 09:57 UTC
You need to profile your code to see where it's spending the most time. Once you've done that, the solution will probably be obvious to you, but if it isn't, people here will be happy to help provided that you supply all the necessary information.
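DrHyde doesn't name a tool. For Perl, one commonly used option is the Devel::NYTProf profiler from CPAN. Below is a minimal sketch that assumes it is installed. `profile_script` is a hypothetical helper name, and treating the presence of the `nytprofhtml` command as a sign that the module is installed is also an assumption.

```shell
#!/usr/bin/env bash
# profile_script SCRIPT [ARGS...]  (hypothetical helper)
# Runs SCRIPT once under Devel::NYTProf, which writes ./nytprof.out, then
# asks nytprofhtml to turn that file into an HTML report under ./nytprof/.
profile_script() {
    # Assumption: if nytprofhtml is on PATH, Devel::NYTProf is installed too.
    if ! command -v nytprofhtml > /dev/null 2>&1; then
        echo "profile_script: Devel::NYTProf not found (try: cpan Devel::NYTProf)" >&2
        return 1
    fi
    if [ "$#" -eq 0 ]; then
        echo "usage: profile_script SCRIPT [ARGS...]" >&2
        return 2
    fi
    # -d:NYTProf loads the profiler; the report is built only if the run succeeds.
    perl -d:NYTProf "$@" && nytprofhtml
}
```

For example, run `profile_script convert.pl sample.csv` (both names hypothetical), then open `nytprof/index.html`. Profiling a single representative input file is usually enough to show whether the time goes into reading, parsing, or writing.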
However, in my experience, when it comes to crunching text files in Perl the bottleneck is reading from and writing to the disk, even when my Perl code is really crufty. In that case, the solution is to buy faster disks with bigger caches, and to arrange them so that they can be read in parallel without saturating the various buses. Your sysadmin will be happy to help with this. Note that you can read and write disks in parallel without having to parallelise your code; I leave figuring out how as an exercise for your sysadmin and operating system vendor.
Re: Simultaneously process the multiple files at the same time
by BrowserUk (Patriarch) on Jun 15, 2011 at 10:44 UTC
Does your conversion take substantially longer than simply reading the files, e.g.:
cat *.csv > /dev/null
If so, there may be some scope for improvement.
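The `cat *.csv > /dev/null` baseline is most useful when it is set side by side with the full run. The sketch below assumes bash, because it relies on bash's `time` keyword and the `TIMEFORMAT` variable. `elapsed` and `read_time` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Make bash's `time` keyword report only wall-clock seconds (e.g. "1.234").
TIMEFORMAT=%R

# elapsed CMD [ARGS...]: run CMD, discard its output, print the seconds taken.
elapsed() {
    # `time` writes its report to the group's stderr, which 2>&1 captures;
    # the command's own stdout and stderr are sent to /dev/null.
    { time "$@" > /dev/null 2>&1; } 2>&1
}

# read_time FILE...: seconds needed merely to read the files, like the
# `cat *.csv > /dev/null` baseline.
read_time() {
    elapsed cat -- "$@"
}
```

Compare `read_time *.csv` with `elapsed perl convert.pl *.csv`, where `convert.pl` stands in for whatever actually does the conversion. If the two figures are close, the job is dominated by I/O, and neither tighter code nor threads will buy much. Repeated reads of the same files may be served from the operating system's file cache, which can make whichever run comes second look faster than it would be cold.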
But, as everyone else has pointed out, you are going to have to supply more information, including: the size of the files; how long the conversion currently takes; how you are performing it (the code!); what hardware you are running on; whether you can put the output files on a different drive from the input files; etc.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Simultaneously process the multiple files at the same time
by GrandFather (Saint) on Jun 15, 2011 at 10:03 UTC
How big are the files, and how long does the processing take? How much faster does the processing need to be? How many lines of code are in your script?
So far you have told us that you have a small number of files (300-500 files is a small number) that you are converting either to or from CSV (it's not clear which way you are going), and that it's taking too long. But that is nothing like enough information for us to provide any useful advice. Perhaps you could tell us more about what you are trying to do and why the current processing time is a problem, so that we have a little more context.
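Standard tools can collect most of the numbers GrandFather asks for. Below is a small sketch that assumes bash and the usual `cat` and `wc` utilities. `describe_job` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# describe_job SCRIPT FILE...  (hypothetical helper)
# Prints the number of input files, their total size in lines and bytes,
# and the length of the processing script in lines.
describe_job() {
    if [ "$#" -lt 2 ]; then
        echo "usage: describe_job SCRIPT FILE..." >&2
        return 2
    fi
    local script=$1
    shift
    local nfiles=$#
    local lines bytes
    # wc always reports lines before bytes, whatever the option order.
    read -r lines bytes < <(cat -- "$@" | wc -lc)
    printf 'input files:  %d\n' "$nfiles"
    printf 'total lines:  %d\n' "$lines"
    printf 'total bytes:  %d\n' "$bytes"
    # Arithmetic expansion strips the padding some wc implementations add.
    printf 'script lines: %d\n' "$(( $(wc -l < "$script") ))"
}
```

Running `describe_job convert.pl *.csv` alongside `time perl convert.pl *.csv` (the script name is a placeholder) would answer most of the questions above.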
True laziness is hard work
Re: Simultaneously process the multiple files at the same time
by locked_user sundialsvc4 (Abbot) on Jun 15, 2011 at 12:03 UTC
Use a simple threading mechanism to spawn 1,000 threads. This act will instantly cause your one CPU to become 1,000 times more capable than it now is, because each thread will multiply, rather than divide, the available computing resources. In fact, once the 500-thread threshold is reached, the quantum field surrounding the device becomes so intense that a localized time-warp occurs, so the work is completed before it is begun.
This remarkably simple act will also cause the disk drives to magically sprout 1,000 new read/write heads for the duration of the job, thus enabling them to simultaneously read and/or write all of the data at the very same instant, without “seek time” or “rotational latency.”
Remember: “You never have to improve any algorithm. Just throw threads at it. Threads always make a program run faster.”
Reminds me of the anti-Pill campaigner who went round telling anyone who would listen, and everyone else within earshot, that the Pill was evil and useless. She claimed to base her evidence upon personal experience of falling pregnant whilst on the medication.
When pressed, she explained that "Every time I stood up, they fell out!".
Like anything, threads have to be used properly to be beneficial. And if the above is evidence of your level of understanding on this subject, you'd best just shut-the-f*** up.
Nice story, but don't forget to recharge the batteries in your sarcasm detector on a regular basis.