Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

I have two Perl files which, after processing, produce an output file. These files contain records, the end of which looks like this:

 \"version\":\"2.0\"}")]]]

For the past few days we have been receiving a newline character in the records, as shown below, which is causing failures.

 \"version\":\"2.0\"}\n")]]]

To handle this I added the extra statement below (the 4th one in the code), which works perfectly:

while(<AH>) { chomp; my $line=$_; $line =~ s/\\n\"\)]]]$/")]]]/g;

The problem is that these files contain millions of records, and because of this extra statement the run takes exactly twice as long (previously 1.5 hrs, now 3 hrs). Is there a more efficient way to remove it? Please help.

Re: how to remove a specific character from a file efficiently..
by ikegami (Patriarch) on Jul 02, 2014 at 05:03 UTC

    The /g is silly since it can't possibly find more than one match.

    The regex engine doesn't realize the match has to be at the end of the string, so it might help to only look there if the lines are really long.

    substr($line, -6, 6, "\")]]]") if substr($line, -6) eq "\n\")]]]";

    Not sure how it will affect you.
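
    One caveat worth checking against the real data: inside double quotes, "\n" is a single newline character, whereas the regex in the question (\\n) matches a literal backslash followed by n, which makes the tail seven characters long rather than six. A minimal sketch of the same end-of-string check for that literal case, with strip_escaped_newline as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: drop a literal backslash-n that sits right before
# the closing `")]]]` at the end of an already-chomped line.
sub strip_escaped_newline {
    my ($line) = @_;
    my $tail = '\n")]]]';    # single quotes: backslash, n, then `")]]]` (7 chars)
    if (length($line) >= length($tail) && substr($line, -length($tail)) eq $tail) {
        substr($line, -length($tail), 2, '');    # delete only the two-character escape
    }
    return $line;
}

print strip_escaped_newline('\"version\":\"2.0\"}\n")]]]'), "\n";
```

    Lines that do not end in that exact tail are returned unchanged, so the check costs one short string comparison per record.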

    But if your lines are so long that the regex match is slowing you down, perhaps you shouldn't make a needless copy of it. Get rid of my $line=$_;. Either work with $_, or use while (my $line = <AH>) to assign to $line instead of $_.
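
    A sketch of the loop without the extra copy, reading straight into $line. The lexical filehandle and the in-memory sample data are stand-ins for the real AH handle, and the sample records are made up:

```perl
use strict;
use warnings;

# Stand-in input: two made-up records, the second with the stray escape.
my $sample = <<'END';
\"version\":\"2.0\"}")]]]
\"version\":\"2.0\"}\n")]]]
END

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";

my @cleaned;
while (my $line = <$fh>) {        # assign directly; no second copy via $_
    chomp $line;
    $line =~ s/\\n"\)\]\]\]$/")]]]/;    # no /g: at most one match at the end
    push @cleaned, $line;
}
close $fh;

print "$_\n" for @cleaned;
```

    Collecting into @cleaned is only for the demonstration; with millions of records each line would be processed or written out inside the loop instead.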

Re: how to remove a specific character from a file efficiently..
by jellisii2 (Hermit) on Jul 02, 2014 at 11:34 UTC
    This smells like an XY problem: The admittedly tiny piece of data looks like it may be JSON or something of the like. Is there a library you can use to parse the files so you don't have to manually mangle it? If so, while a rewrite may be required, you'll probably come out ahead in maintainability.
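
    If the payload really is JSON wrapped in a quoted string, a parser would treat the trailing \n as insignificant whitespace, so no cleanup would be needed. A minimal sketch using the core JSON::PP module; the fragment below is a guess at the structure, not the actual record format:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical fragment: a JSON string literal whose content is itself JSON,
# with the stray \n escape just before the closing quote.
my $quoted = '"{\"version\":\"2.0\"}\n"';

my $codec = JSON::PP->new->allow_nonref;    # allow_nonref: accept a bare string

my $inner_text = $codec->decode($quoted);      # unescapes the quotes and the \n
my $record     = $codec->decode($inner_text);  # trailing newline is just whitespace

print $record->{version}, "\n";
```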
Re: how to remove a specific character from a file efficiently..
by gurpreetsingh13 (Scribe) on Jul 02, 2014 at 10:04 UTC
    Why not do that in a separate process instead of adding it to your script?
    perl -pi -e 's/\\n\"\)]]]$/")]]]/g' <filename>
    Either via a batch script, a shell script, the command line itself, or a cron job.
      That will not make it any faster. It's very likely to make it slower.
        Obviously, if you call an external command it will make it slower.
        My suggestion was to do it separately as a pre-processing step and only then start the script, so that the script itself takes the same time as before.
Re: how to remove a specific character from a file efficiently..
by RonW (Parson) on Jul 02, 2014 at 20:30 UTC

    Are you saying that the input files now have new line characters in the middle of each record?

    Assuming that's the case, why not just:

    $line =~ s/\n//g;

    And, as previously mentioned, if you can operate directly on $_ you will save the overhead of copying each line.
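
    If the goal is only to delete real newline characters, tr/// with the /d modifier is a common alternative that avoids the regex engine and is often cheaper for single-character deletions. A small sketch on made-up data, working on $_ directly:

```perl
use strict;
use warnings;

# Made-up records containing real newline characters.
my @records = ("{\"version\":\n\"2.0\"}", "no newline here");

for (@records) {
    tr/\n//d;    # delete every newline in $_ in place; returns the count removed
}

print "$_\n" for @records;
```

    Note that this only helps when the records contain actual newline characters; a literal backslash followed by n, as matched by the regex in the question, is left untouched.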