in reply to Re: RE on lines read from in-memory scalar is very slow (OOM variant)
in thread RE on lines read from in-memory scalar is very slow

Very interesting. I was able to reproduce the OOM error with my Cygwin perl, and the two modifications you mentioned for avoiding the OOM also worked for me. Strangely, when I watch the perl process in Task Manager, or just watch total memory usage, neither figure changes: the process reaches about 230 MB and total memory stays at about 15.3 GB for the whole run. It seems the process might suddenly hit a memory leak that grows so fast that Task Manager doesn't register it before the process dies. On my system, about 22 seconds elapse between the start of the loop that produces the OOM and the exception.

Re^3: RE on lines read from in-memory scalar is very slow (OOM variant)
by Danny (Chaplain) on Jan 24, 2024 at 03:33 UTC
    Another observation: when I add a "print;" before the push @arr2, the printed output is 352602493 / 35048455 = 10.06 times the size of the input file. The first 10% seems to match the original input, and then there are additional lines. I'm looking into what these lines correspond to.
      Strangely, if I strip the carriage returns from my input file (s/\r//g), the file of printed lines is only 1.8% of the size of the input file. In this case the OOM also occurs about 22 seconds after the start of the push @arr2 loop. Too weird.

      EDIT: I no longer think the carriage returns had anything to do with the file size. I've rerun the script on the original file, and the number of printed lines varies from run to run and is usually smaller than in the input file. The first time I ran it, the output happened to be 10x the size of the input, but I haven't been able to reproduce this.

        See my comments regarding LF vs. CRLF.

        — Ken