Another observation: when I add a "print;" before the push @arr2, the output file is 352602493 / 35048455 = 10.06 times larger than the input file. The first 10% seems to match the original, and everything after that is additional lines. I'm now looking at what these extra lines correspond to.