Re: Search and Replace.

Your while loop makes more sense, so I'm going to build from there. Look at your list of substitutions:

s/^[ ]*//g;
s/[ ]*$//g;
s/\:000[A,P]M//g;
s/99991231//g;
s/Jan 1 1900 12:00:00//g;
s/[ ]*\~\t\~[ ]*/~/g;

I see that all but the last one remove text, so why not combine them into one pass? Also, your first two appear to be designed to trim leading and trailing spaces from lines: there's a pretty standard way to do this (s/^\s+|\s+$//g), so I took the liberty of using that structure below. My trailing 'x' modifier causes literal whitespace in the pattern to be ignored, so I've replaced whitespace in matches with '\s'; you may want \x20 instead, I don't know.

<update>Modified code with ikegami's suggestions from below. I chose '\ ' as the method to escape a space.</update>

while ( <IN_FILE> ) {
   s{
     (?:^\ +|\ +$)
    |(?:\:000[AP]M)    # [AP]: a comma inside [A,P] would also match literally
    |(?:99991231)
    |(?:Jan\ 1\ 1900\ 12:00:00)
   }{}gx;
   s/[ ]*\~\t\~[ ]*/~/g;
   print OUT_FILE $_;
}
close IN_FILE;
close OUT_FILE;
## and unlink() the filename for IN_FILE
## then rename() outfile to infile.

This should reduce your exec time a little bit, as it makes two regex passes instead of six. However, I suspect the slowest thing going is really disk IO (it usually is, with file operations). Doing the "write to another file then rename" has typically been faster than in-place editing, for me.
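For what it's worth, here is a minimal sketch of the whole "write to another file, then rename" cycle. The names clean_line and clean_file and the ".tmp" suffix are my own inventions for illustration, and I'm assuming [AP] is what was meant by [A,P]; adjust to taste.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the two substitution passes to one line.
sub clean_line {
    my ($line) = @_;
    $line =~ s{
        (?:^\ +|\ +$)                   # leading / trailing spaces
       |(?:\:000[AP]M)                  # assumed intent of [A,P]
       |(?:99991231)
       |(?:Jan\ 1\ 1900\ 12:00:00)
    }{}gx;
    $line =~ s/[ ]*~\t~[ ]*/~/g;        # collapse the ~<tab>~ delimiter
    return $line;
}

# Hypothetical driver: write cleaned lines to "$path.tmp",
# then rename that file over the original.
sub clean_file {
    my ($path) = @_;
    my $tmp = "$path.tmp";

    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    while ( my $line = <$in> ) {
        print {$out} clean_line($line);
    }

    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

Calling clean_file('somefile.txt') (a made-up filename) would then replace the file with its cleaned version. Closing the output handle before the rename matters on platforms that refuse to rename open files.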

If you're reading your file over a network... well, don't -- make a local copy, process it, and pass it back to the network location. That will nearly always be much faster than streaming IO over a network.
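A rough sketch of that "copy local, process, copy back" idea, using the core File::Copy and File::Temp modules. process_locally and its $transform callback are hypothetical names of mine; it assumes the network location is reachable as an ordinary path.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper: pull the file to local temp storage, run
# $transform on every line, then push the result back.
sub process_locally {
    my ( $remote_path, $transform ) = @_;

    my ( $in_tmp,  $local_in )  = tempfile( UNLINK => 1 );
    my ( $out_tmp, $local_out ) = tempfile( UNLINK => 1 );
    close $in_tmp;    # copy() below writes to the file by name

    copy( $remote_path, $local_in ) or die "Copy to local failed: $!";

    open my $in, '<', $local_in or die "Cannot read $local_in: $!";
    while ( my $line = <$in> ) {
        print {$out_tmp} $transform->($line);
    }
    close $in;
    close $out_tmp or die "Cannot close $local_out: $!";

    copy( $local_out, $remote_path ) or die "Copy back failed: $!";
    return;
}
```

The callback receives one line and returns the cleaned line, so the substitutions from the loop above could be dropped into it unchanged.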

Yoda would agree with Perl design: there is no try{}

Re^2: Search and Replace. by ikegami (Patriarch) on Jun 08, 2005 at 14:31 UTC
Don't replace spaces with \s. \s would be slower since it matches spaces, tabs, carriage returns (I think) and line feeds, and even more characters if it's a Unicode string. "`\x20`", "`\040`", "`\ `", and "`[ ]`" work. (I wonder if the last is slower than the others. I'll Benchmark later.) You've also added useless captures. Don't use `(...)` (which incurs a speed penalty), use `(?:...)`.
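A sketch of how that comparison could be run with the core Benchmark module. The sample string and the iteration count are arbitrary assumptions, and I'm not claiming any particular ranking; the numbers will depend on your perl and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary sample: a field padded with spaces on both sides.
my $sample = ( ' ' x 20 ) . 'some field data' . ( ' ' x 20 );

# Each candidate trims a fresh copy of $sample and returns it.
my %trim = (
    backslash_s   => sub { ( my $s = $sample ) =~ s/^\s+|\s+$//g;     $s },
    hex_escape    => sub { ( my $s = $sample ) =~ s/^\x20+|\x20+$//g; $s },
    octal_escape  => sub { ( my $s = $sample ) =~ s/^\040+|\040+$//g; $s },
    escaped_space => sub { ( my $s = $sample ) =~ s/^\ +|\ +$//gx;    $s },
    char_class    => sub { ( my $s = $sample ) =~ s/^[ ]+|[ ]+$//g;   $s },
);

# Run each candidate a fixed number of times and print a comparison table.
cmpthese( 200_000, \%trim );
```

Keep in mind that \s is not just a slower spelling of a space: it also strips tabs and newlines, so the candidates are only interchangeable on data padded with plain spaces.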