in reply to Changing data in alot of files

Is this the best way to do that?

There is no single answer to your question. There are many definitions of 'best'. Here are a few possibilities.

And the answer for each of these definitions of 'best' will depend on many other factors. Some examples:

If you provide answers to the appropriate subset of these questions for your application's needs, then you may get answers that are truly applicable to you.

Even if your goal is pure speed, the best solution for 1000 x 20k files is likely to be completely different from that for 100 x 200k files or 10 x 2MB files. Slurping to an array of lines is rarely, if ever, as quick as slurping to a scalar, but whether slurping to a scalar is a viable option depends very much on your search and replacement requirements.
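As a rough sketch of the difference between the two slurping styles (the file name and contents here are made up for illustration):

```perl
use strict;
use warnings;

# Write a small throw-away file so the example is self-contained.
my $path = 'slurp_demo.txt';
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "line one\nline two\n";
close $out;

# Slurp to a scalar: localising $/ (the input record separator) to undef
# makes a single read return the whole file as one string.
my $whole = do {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    <$fh>;
};

# Slurp to an array of lines: every line is split out and stored as a
# separate element, which costs extra work and memory per line.
my @lines = do {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    <$fh>;
};

unlink $path;
```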


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Replies are listed 'Best First'.
Re: Re: Changing data in alot of files
by Anonymous Monk on Jul 10, 2003 at 18:50 UTC
    How many files? Over 50,000 files.
    How big are the files? Web page files (.html and Cold Fusion), so they are not big.
    How many changes? About 20,000 changes.
    How often will the changes need to be made? Just one time, and the script takes 20 minutes to run.
    How reliable does the process need to be? Very reliable.
    If the process gets interrupted by system failure or other unforeseen eventuality, do you need to know which files were processed and which weren't? If some files were re-processed, would this be benign repetition? Files could be reprocessed, and this will be run during off hours.
    How fast does the process need to be? Speed shouldn't be too slow but doesn't have to be fast; the main thing is what I have now does the job and has SOME efficiency.
    Also what I have now is quick and easy to maintain.

      "20 minutes" for a "one-time run during off hours" of "20,000 changes" across "50,000 files", and "quick and easy to maintain".

      Seems to me that you have an existing, working solution that satisfies your needs, which raises the question: "Why are you asking your question?" :)

      The only criterion you mention that I don't see satisfied by your posted code is reliability. But the expedient of copying the files before modification and copying them back once the changes have been completed and verified is so simple and so effective that I would be reluctant to move to a more 'sophisticated' solution. If you aren't keeping your files in a source management system (CVS or similar), then I would strongly recommend you start doing so, especially with that number of sources.

      If you do use source control software, for this kind of change I would do a mass extraction, run the script, and then do a mass update, having suitably checkpointed, rather than trying to do the extractions and updates file by file as part of the script, but that's a personal preference.

      On another quick look at your code, and given the relatively small size of your files, I would probably slurp to a scalar rather than an array. Each regex can then process the whole file in one pass, which would possibly speed the process a little. You might then need to be slightly more careful with the construction of your regexes: investigate the /s and /m modifiers, and become familiar with the differences between ^ and \A, and between $ and \Z.
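      A minimal sketch of the whole-file approach; the file contents and the substitution are made-up examples, not the poster's actual edit:

```perl
use strict;
use warnings;

# A stand-in for a slurped file: several lines in one scalar.
my $text = "TITLE: old\nbody line\nTITLE: old\n";

# With /m, ^ and $ match at every embedded newline, so a line-anchored
# substitution still works across the whole slurped file in one pass.
$text =~ s/^TITLE: old$/TITLE: new/mg;

# \A always anchors to the absolute start of the string (unlike ^ under /m),
# which is what you want when you really mean "start of file".
my $starts_with_title = ($text =~ /\ATITLE:/) ? 1 : 0;
```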

      If I were really interested in wringing performance out of the process, I might consider using one thread to slurp the files to scalars, queue them to a second thread that runs the regexes, and a third thread that writes them back to the file system. But given the current state of play with memory leaks from threaded apps, and the not negligible increase in complexity this would add, I couldn't advise it unless the need for speed were desperate, which it clearly isn't in this case.




        I am amazed at all the time and effort you all took to help me with this. Thank you!!! I will print all your responses and reevaluate what I am doing. Thanks again to everyone for all the details!

        The main thing is that what I have now will not crash the system and does what I need it to do, but it can be made more efficient, and I learned better ways to do this.