how to remove a specific character from a file efficiently..

Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

I have two Perl scripts that, after processing, produce an output file. These files contain records whose ends look like this:

\"version\":\"2.0\"}")]]]

For the past few days we have been receiving a newline sequence in the records, as shown below, which is causing failures.

\"version\":\"2.0\"}\n")]]]

To handle this, I added the extra statement below (the fourth line of the code), which works perfectly:

 
while(<AH>) {
  chomp;
  my $line=$_;
  $line =~ s/\\n\"\)]]]$/")]]]/g;

The problem is that these files contain millions of records, and with this extra statement processing takes exactly twice as long (previously 1.5 hours, now 3 hours). Is there a more efficient way to remove it? Please help.


Replies are listed 'Best First'.
Re: how to remove a specific character from a file efficiently.. by ikegami (Patriarch) on Jul 02, 2014 at 05:03 UTC
The `/g` is silly, since the pattern can't possibly match more than once. The regex engine doesn't realize the match has to be at the end of the string, so if the lines are really long it might help to look only there: `substr($line, -7, 7, "\")]]]") if substr($line, -7) eq "\\n\")]]]";` (seven characters, because the unwanted sequence matched by your regex is a literal backslash followed by `n`). I'm not sure how much it will help you. But if your lines are so long that the regex match is slowing you down, perhaps you shouldn't make a needless copy of each one. Get rid of `my $line=$_;`. Either work with `$_` directly, or use `while (my $line = <AH>)` to assign to `$line` instead of `$_`.
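A minimal sketch of the tail-only check suggested in this reply. It assumes the unwanted sequence is the two literal characters backslash and `n` (as the regex in the question implies), which makes the bad tail seven characters long. The helper name `fix_tail` and the in-memory sample data are hypothetical, used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: repair the record tail in place.
# $_[0] is an alias to the caller's variable, so the line is not copied.
sub fix_tail {
    my $bad = '\\n")]]]';    # backslash, n, then ")]]]  (7 characters)
    if (length($_[0]) >= 7 && substr($_[0], -7) eq $bad) {
        # 4-argument substr replaces the last 7 characters with ")]]]
        substr($_[0], -7, 7, '")]]]');
    }
    return;
}

# Sample input (an assumption): one record with the stray \n, one without.
my $data = "a}\\n\")]]]\nb}\")]]]\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# Read straight into $line instead of copying from $_.
while (my $line = <$fh>) {
    chomp $line;
    fix_tail($line);
    print "$line\n";
}
close $fh;
```

Only the last seven characters are inspected, so the cost per record should not grow with the record length. Whether that is a measurable win on the real files would need to be timed.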
Re: how to remove a specific character from a file efficiently.. by jellisii2 (Hermit) on Jul 02, 2014 at 11:34 UTC
This smells like an XY problem. The admittedly tiny piece of data looks like it may be JSON or something similar. Is there a library you can use to parse the files so you don't have to mangle them manually? If so, while a rewrite may be required, you'll probably come out ahead in maintainability.
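As a rough illustration of the parser idea, the sketch below assumes (and this is only a guess from the fragment in the question) that each record carries a JSON document embedded inside a JSON string literal. Under that assumption, decoding the outer string turns the escaped `\n` into a real newline, which a JSON parser then ignores as trailing whitespace. `decode_embedded` and the sample value are hypothetical; `JSON::PP` is a core Perl module.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: $outer is a JSON string literal whose content is
# itself JSON text (an assumption about the record format).
sub decode_embedded {
    my ($outer) = @_;
    # First pass: decode the string literal (allow_nonref permits a bare string).
    my $inner = JSON::PP->new->allow_nonref->decode($outer);
    # Second pass: decode the embedded JSON; trailing whitespace is ignored.
    return decode_json($inner);
}

# Sample value mirroring the fragment from the question, stray \n included.
my $outer = '"{\"version\":\"2.0\"}\n"';
my $rec   = decode_embedded($outer);
print $rec->{version}, "\n";
```

This trades speed for robustness: full parsing will likely be slower than a tail check, but it does not depend on where the stray sequence appears.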
Re: how to remove a specific character from a file efficiently.. by gurpreetsingh13 (Scribe) on Jul 02, 2014 at 10:04 UTC
Why not do that in a separate process instead of adding it to your script? `perl -pi -e 's/\\n\"\)]]]$/")]]]/g' <filename>` You could run it from a batch script, a shell script, the command line itself, or a cron job.
Re^2: how to remove a specific character from a file efficiently.. by ikegami (Patriarch) on Jul 02, 2014 at 22:33 UTC
That will not make it any faster. It's very likely to make it slower.
Re^3: how to remove a specific character from a file efficiently.. by gurpreetsingh13 (Scribe) on Jul 03, 2014 at 03:50 UTC
Obviously, calling an external command will make it slower. My suggestion was to do it separately as a pre-processing step and then start the script afterwards, so that the script itself takes the same time as before.
Re: how to remove a specific character from a file efficiently.. by RonW (Parson) on Jul 02, 2014 at 20:30 UTC
Are you saying that the input files now have newline characters in the middle of each record? Assuming that's the case, why not just: `$line =~ s/\n//g;` And, as previously mentioned, if you can operate directly on `$_`, you will save the overhead of copying each line.
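The approaches discussed in this thread can be timed against each other with the core `Benchmark` module. The sketch below is only a starting point: the 2000-character sample record is an assumption, the names `via_regex` and `via_substr` are hypothetical, and results on the real files may differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample record: a long body ending in the unwanted tail
# (literal backslash, n, then ")]]] ).
my $record = ('x' x 2000) . '\\n")]]]';

# Regex approach from the question, without the unnecessary /g.
sub via_regex {
    my $line = $record;
    $line =~ s/\\n"\)\]\]\]$/")]]]/;
    return $line;
}

# Tail-only approach: compare and replace just the last 7 characters.
sub via_substr {
    my $line = $record;
    substr($line, -7, 7, '")]]]') if substr($line, -7) eq '\\n")]]]';
    return $line;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, { regex => \&via_regex, substr => \&via_substr });
```

Both subs copy the record once so that the comparison isolates the cost of the substitution itself.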