"20 minutes" for a "one-time run during off hours" of "20,000 changes" across "50,000 files", and "quick and easy to maintain".

Seems to me that you have an existing, working solution that satisfies your needs, which begs the question: "Why are you asking your question?" :)

The only criterion you mention that I don't see being satisfied by your posted code is reliability. But the expedient of copying the files before modification, and copying them back once the changes have been completed and verified, is so simple and so effective that I would be reluctant to move to a more 'sophisticated' solution. If you aren't keeping your files in a source management DB (CVS or similar), then I would strongly recommend you start doing so, especially with that number of sources.
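For what it's worth, here is a minimal sketch of that copy, modify, verify, copy-back idea. The helper name edit_via_copy, the ".work" suffix and the two callbacks are my own inventions for illustration, not anything from your posted code, and what counts as "verified" is entirely up to you.

```perl
use strict;
use warnings;
use File::Copy qw(copy move);

# Hypothetical helper: $edit takes the file text and returns the new text;
# $verify takes a path and returns true if the edited copy looks right.
sub edit_via_copy {
    my ($path, $edit, $verify) = @_;
    my $work = "$path.work";    # assumed naming scheme for the working copy

    copy($path, $work) or die "Cannot copy $path to $work: $!";

    # Slurp the working copy, apply the changes, and write it back.
    open my $in, '<', $work or die "Cannot read $work: $!";
    my $text = do { local $/; <$in> };
    close $in;

    open my $out, '>', $work or die "Cannot write $work: $!";
    print {$out} $edit->($text);
    close $out or die "Cannot close $work: $!";

    # Only a verified copy replaces the original.
    if ($verify->($work)) {
        move($work, $path) or die "Cannot replace $path: $!";
        return 1;
    }
    unlink $work;    # discard the rejected copy; the original is untouched
    return 0;
}
```

The original is never opened for writing, so a failed or half-finished run leaves it exactly as it was.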

If you do use source control software, then for this kind of change I would do a mass extraction, run the script and then do a mass update, having suitably checkpointed first, rather than trying to do the extractions and updates file-by-file as part of the script, but that's a personal thing.

Taking another quick look at your code, and given the relatively small size of your files, I would probably slurp each file into a scalar rather than an array. Each regex can then process the whole file in one pass, which would possibly speed the process up a little. You might then need to be slightly more careful with the construction of your regexes: investigate the /s and /m modifiers, and become familiar with the differences between ^ and \A, and between $ and \Z.
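A small sketch of what slurping changes, using a three-line string as a stand-in for a slurped file. The sample data and variable names are made up for illustration; the commented-out slurp idiom assumes an already opened filehandle $fh.

```perl
use strict;
use warnings;

# The usual slurp idiom, given an open filehandle $fh:
#   my $text = do { local $/; <$fh> };
my $text = "alpha\nbeta\ngamma\n";    # stands in for a slurped file

# ^ without /m anchors only at the start of the string, like \A.
my $caret_plain = () = $text =~ /^\w+/g;      # 1 match
# With /m, ^ anchors at the start of every line.
my $caret_multi = () = $text =~ /^\w+/mg;     # 3 matches
# \A ignores /m and still means "start of string".
my $start_only  = () = $text =~ /\A\w+/mg;    # 1 match

# With /m, $ anchors before every newline; \Z only at the end of the
# string (or just before a final newline).
my $dollar_multi = () = $text =~ /\w+$/mg;    # 3 matches
my $end_only     = () = $text =~ /\w+\Z/mg;   # 1 match

# . does not match a newline unless /s is given.
my $dot_plain  = $text =~ /alpha.beta/  ? 1 : 0;    # 0
my $dot_single = $text =~ /alpha.beta/s ? 1 : 0;    # 1

# One substitution pass over the whole "file", anchored per line via /m.
(my $edited = $text) =~ s/^beta$/BETA/m;

print "$caret_plain $caret_multi $start_only $dollar_multi $end_only\n";
print "$dot_plain $dot_single\n";
print $edited;
```

The practical upshot: a pattern written for line-at-a-time processing that relies on ^ or $ will usually need /m (and often /g) once it is pointed at a whole file.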

If I were really interested in wringing performance out of the process, then I might consider using one thread to slurp the files to scalars and queue them to a second thread to run the regexes, with a third thread to write them back to the file system. But given the current state of play with memory leaks from threaded apps, and the not negligible increase in complexity that this would add, I couldn't advise it unless the need for speed was desperate, which it clearly isn't in this case.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to Re: Re: Re: Changing data in alot of files by BrowserUk
in thread Changing data in alot of files by Anonymous Monk
