in reply to Search and Replace.
Your while loop makes more sense, so I'm going to build from there. Look at your list of substitutions:

    s/^[ ]*//g;
    s/[ ]*$//g;
    s/\:000[A,P]M//g;
    s/99991231//g;
    s/Jan 1 1900 12:00:00//g;
    s/[ ]*\~\t\~[ ]*/~/g;

I see that all but the last one are removing text, so why not combine them into one pass? Also, your first two appear to be designed to trim leading and trailing spaces from lines: there's a pretty standard way to do this (s/^\s+|\s+$//g), so I took the liberty of using that structure below. My trailing 'x' causes whitespace in the pattern to be ignored, so I've replaced whitespace in matches with '\s'; you may want \x20 instead, I don't know.
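To make the point about the trailing 'x' concrete, here is a minimal sketch of the trim idiom and of the ways to write a literal space under /x. The sub names trim and strip_date are made up for illustration; they are not from your code.

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace with a single alternation.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Under /x a bare space in the pattern is ignored, so a literal
# space has to be written as '\ ', '[ ]', or '\x20'.
sub strip_date {
    my ($s) = @_;
    $s =~ s/ Jan\ 1 [ ] 1900 \x20 12:00:00 //gx;
    return $s;
}

print trim("   hello world \t"), "|\n";               # prints "hello world|"
print strip_date("aJan 1 1900 12:00:00b"), "\n";      # prints "ab"
```

Note that '\s' also matches tabs and newlines, while the three forms above match only a plain space, which is closer to what your original '[ ]' did.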
<update>Modified code with ikegami's suggestions from below. I chose '\ ' as the method to escape a space.</update>
    while ( <IN_FILE> ) {
        # Pass 1: delete all the unwanted text with one alternation.
        s{
            (?:^\ +|\ +$)                 # leading or trailing spaces
          | (?:\:000[AP]M)                # ":000AM" or ":000PM"
          | (?:99991231)
          | (?:Jan\ 1\ 1900\ 12:00:00)
        }{}gx;
        # Pass 2: collapse the padded ~<tab>~ separator to a single ~.
        s/[ ]*\~\t\~[ ]*/~/g;
        print OUT_FILE $_;
    }
    close IN_FILE;
    ## and unlink() the filename for IN_FILE
    ## then rename() outfile to infile.

This should reduce your exec time a little bit, as it makes two regex passes instead of six. However, I suspect the slowest thing going is really disk IO (it usually is, with file operations). Doing the "write to another file then rename" has typically been faster than in-place editing, for me.
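A rough sketch of the "write to another file then rename" approach, wrapped in a sub so the whole sequence is in one place. The name clean_file and the ".tmp" suffix are my own assumptions, not anything from your script, and I've used lexical filehandles rather than IN_FILE/OUT_FILE.

```perl
use strict;
use warnings;

# Hypothetical helper: clean $path by writing a scratch copy, then
# renaming the scratch copy over the original.
sub clean_file {
    my ($path) = @_;
    my $tmp = "$path.tmp";    # assumed scratch name, same directory

    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    while ( my $line = <$in> ) {
        # Same two passes as in the loop above.
        $line =~ s{
            (?:^\ +|\ +$)
          | :000[AP]M
          | 99991231
          | Jan\ 1\ 1900\ 12:00:00
        }{}gx;
        $line =~ s/[ ]*~\t~[ ]*/~/g;
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $tmp: $!";

    # Replace the original with the cleaned copy.
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

Call it as clean_file('data.txt'), where the filename is just a placeholder. If rename() refuses to overwrite an existing file on your platform, unlink() the original first, as in the comments at the end of the loop above.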
If you're reading your file over a network... well, don't -- make a local copy, process it, and pass it back to the network location. That will nearly always be much faster than streaming IO over a network.
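Something along these lines would do the copy, process, copy back dance. It is only a sketch: process_locally is a made-up name, the "remote" path is assumed to be an ordinary path on a mounted network share, and the processing step is passed in as a code ref. File::Copy, File::Temp, File::Spec and File::Basename all ship with Perl.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy     qw(copy);
use File::Spec;
use File::Temp     qw(tempdir);

# Hypothetical helper: $remote is a file on a network mount and
# $process is a code ref that edits the local copy it is given.
sub process_locally {
    my ($remote, $process) = @_;

    # Scratch directory on local disk, removed when the program exits.
    my $dir   = tempdir( CLEANUP => 1 );
    my $local = File::Spec->catfile( $dir, basename($remote) );

    copy( $remote, $local ) or die "Copy down failed: $!";
    $process->($local);       # all line-by-line IO happens on local disk
    copy( $local, $remote ) or die "Copy back failed: $!";
    return;
}
```

That way the network sees two bulk transfers instead of one small read and write per line.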
Yoda would agree with Perl design: there is no try{}
Re^2: Search and Replace.
by ikegami (Patriarch) on Jun 08, 2005 at 14:31 UTC