in reply to Large file processing
That aside, this is a reasonable way to process your file. If it is still taking too long, the obvious next step is to look at process_paragraph and see whether it is efficient.
One thing that strikes me is that you're duplicating lots of strings (1 GB x 2 at least). It may be more efficient to pass around references (i.e. \$_), as in the sketch below.
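A minimal sketch of what I mean; the file name and the body of process_paragraph here are stand-ins for your own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'big_file.txt' or die "open: $!";   # illustrative file name
local $/ = "";                          # paragraph mode

while (my $para = <$fh>) {
    process_paragraph(\$para);          # pass a reference: no copy of the string
}
close $fh;

sub process_paragraph {
    my ($ref) = @_;                     # a scalar reference
    my $words = () = $$ref =~ /\S+/g;   # work on $$ref directly, e.g. count words
}
```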
Also, if you are on a multi-core/CPU box you might want to split the file into a number of pieces and process it that way. A cheap and easy way would be to have one instance of your script process the odd paragraphs and another the even. Better would be to use seek to skip to halfway (or to offsets appropriate to your number of instances) and start at the next paragraph; see the sketch below. With the seek approach you will need to be careful not to process the same paragraph more than once (or to skip the boundary paragraphs).
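A hypothetical sketch of the seek approach: you would run one copy per core, e.g. `perl chunk.pl 0 4 big_file.txt`, `perl chunk.pl 1 4 big_file.txt`, and so on. The arguments, file name and process_paragraph are all illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($instance, $instances, $file) = @ARGV;

open my $fh, '<', $file or die "open $file: $!";
my $size  = -s $fh;
my $start = int($size *  $instance      / $instances);
my $end   = int($size * ($instance + 1) / $instances);

local $/ = "";                          # paragraph mode

if ($start > 0) {
    seek $fh, $start, 0 or die "seek: $!";
    scalar <$fh>;     # discard the paragraph we landed inside; the previous
}                     # instance owns it. If $start falls exactly on a
                      # paragraph boundary this can skip or double-process
                      # a paragraph, so test the boundary cases as noted above.

while (tell($fh) < $end) {              # stop once our slice is exhausted
    my $para = <$fh>;
    last unless defined $para;
    process_paragraph(\$para);
}
close $fh;

sub process_paragraph { my ($ref) = @_; }   # stand-in for the real work
```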
The key with all optimisations is to make sure you are actually speeding things up, by benchmarking your initial solution and re-benchmarking your proposed changes. See the Benchmark module.
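For example, a toy comparison of copying a large string versus passing a reference (the 10 MB size and sub names are illustrative):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $big = 'x' x 10_000_000;

sub by_copy { my ($s)   = @_; length $s   }   # copies the string on assignment
sub by_ref  { my ($ref) = @_; length $$ref }  # no copy made

cmpthese(-3, {                      # run each variant for ~3 CPU seconds
    copy => sub { by_copy($big)  },
    ref  => sub { by_ref(\$big)  },
});
```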