in reply to Re^2: RE on lines read from in-memory scalar is very slow (OOM variant)
in thread RE on lines read from in-memory scalar is very slow

Another observation. When I add a "print;" before the push @arr2, I get a file that is 352602493 / 35048455 = 10.06 times the size of the input file. The first 10% seems to match the original, and then there are more lines. I'm looking into what these extra lines correspond to.
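The check described above (print each line just before the push, then compare the printed output with the input) can be sketched as follows. This is a minimal sketch, not the original script: the sample data, the regex `/\d+/` and the names `$data`, `$printed` and `$out` are hypothetical stand-ins, and the output is captured in a second in-memory scalar instead of a file so the sizes can be compared directly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real input: the original script read a
# ~35 MB file into a scalar; here we build a small one in memory.
my $data = join '', map { "line $_\r\n" } 1 .. 1000;

# Open a read-only filehandle on the scalar (an "in-memory" file).
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

# Capture what the diagnostic print emits into another scalar so its
# size can be compared with the input.
my $printed = '';
open my $out, '>', \$printed or die "Cannot open output handle: $!";

my @arr2;
while (<$in>) {
    print {$out} $_;          # the diagnostic print added before the push
    push @arr2, $_ if /\d+/;  # hypothetical regex; the real one is not shown here
}
close $out;
close $in;

# For a correctly behaving read loop the printed output is a byte-for-byte
# copy of the input, so the ratio should be exactly 1.
my $ratio = length($printed) / length($data);
printf "input %d bytes, printed %d bytes, ratio %.2f\n",
    length($data), length($printed), $ratio;
```

A ratio far from 1 (such as the 10.06 reported above) would mean the loop is delivering lines that are not in the input.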

Re^4: RE on lines read from in-memory scalar is very slow (OOM variant)
by Danny (Chaplain) on Jan 24, 2024 at 03:47 UTC
    Strangely, if I strip the carriage returns from my input file (s/\r//g), the file of printed lines is only 1.8% of the size of the input file. In this case it also takes about 22 seconds to run out of memory after the push @arr2 loop starts. Too weird.

    EDIT: I no longer think the carriage returns had anything to do with the file size. I've rerun the original file, and the number of printed lines varies from run to run and is usually smaller than the number of lines in the input file. The first time I ran it, the output happened to be 10x larger than the input, but I haven't been able to reproduce this.
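    Comparing line counts rather than file sizes makes it easier to see whether a run delivered the right number of lines. The sketch below strips carriage returns with s/\r//g, as in the experiment above, and then compares the number of newlines in the data with the number of lines readline actually returns from an in-memory handle. The sample data and the variable names are hypothetical; this is a diagnostic sketch, not the code from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input with Windows (CRLF) line endings.
my $data = join '', map { "record $_\r\n" } 1 .. 500;

# Strip carriage returns, as in the s/\r//g experiment.
(my $lf_only = $data) =~ s/\r//g;

# Count lines by counting newline characters; tr/// with an empty
# replacement list returns the count without modifying the string.
my $in_lines = ($lf_only =~ tr/\n//);

# Re-read the stripped data through an in-memory handle and count the
# lines actually delivered by readline.
open my $fh, '<', \$lf_only or die "Cannot open in-memory handle: $!";
my $read_lines = 0;
$read_lines++ while <$fh>;
close $fh;

printf "lines in input: %d, lines read: %d\n", $in_lines, $read_lines;
```

    If the two counts differ, or change between runs on the same input, the problem lies in how the lines are being read rather than in the line endings.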

      See my comments regarding LF vs. CRLF.

      — Ken

        Just to clarify: the Windows CRLF line endings have no relevance to this issue.