in reply to Re: Compare2Files LinebyLine
in thread Compare2Files LinebyLine

Thanks a ton. Can't imagine the feeling of having a reply from the great Merlyn. As always, your code is not only faster but neat and nice as well. However, my code doesn't go the O(n^2) way: since the files are sorted first, I make at most n comparisons, and I keep emptying the @temp array, if you noticed. Thanks again, though.
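
For anyone following along, the sorted-files idea can be sketched roughly as below. This is a minimal sketch and not the code from the earlier post: the sub names `next_line` and `diff_sorted` and the return shape are assumptions for illustration, and both files are assumed to be already sorted in plain string order. Each step does one comparison and advances at least one file, so the work grows with the total number of lines rather than their product.

```perl
use strict;
use warnings;

# Read one line from a filehandle and strip its newline; undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Walk two already-sorted files in step and collect the lines unique to each.
# Assumption: both files use the same string sort order that `cmp` uses.
# Duplicate lines are paired off one-to-one.
sub diff_sorted {
    my ($file_a, $file_b) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";

    my (@only_a, @only_b);
    my $line_a = next_line($fh_a);
    my $line_b = next_line($fh_b);

    while (defined $line_a and defined $line_b) {
        my $cmp = $line_a cmp $line_b;
        if ($cmp == 0) {               # present in both: advance both files
            $line_a = next_line($fh_a);
            $line_b = next_line($fh_b);
        }
        elsif ($cmp < 0) {             # sorts earlier, so it is only in A
            push @only_a, $line_a;
            $line_a = next_line($fh_a);
        }
        else {                         # sorts earlier, so it is only in B
            push @only_b, $line_b;
            $line_b = next_line($fh_b);
        }
    }

    # Whatever is left in either file has no partner in the other.
    while (defined $line_a) { push @only_a, $line_a; $line_a = next_line($fh_a); }
    while (defined $line_b) { push @only_b, $line_b; $line_b = next_line($fh_b); }

    close $fh_a;
    close $fh_b;
    return (\@only_a, \@only_b);
}
```

Only one line per file is held in memory during the walk. The two result arrays are kept here for convenience; for very large inputs they could be printed as they are found instead.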

Re: Re: Re: Compare2Files LinebyLine
by merlyn (Sage) on Sep 27, 2001 at 00:59 UTC
      Right as always :-)
      Hi Folks. Do you guys happen to have any suggestions for comparing 2 files line by line that don't involve loading all the lines into memory? I'm trying to compare two files that are each over 300MB in size. My system doesn't have enough memory to handle loading all the file lines into a hash. I've tried the readline approach but it takes forever to run. Unfortunately, I'm not able to load the data into a database either - even a Berkeley DB. Any ideas would be appreciated.

        There are ways of approaching the problem, but you need to state what it is that you are looking for in the comparison.

        Do you want to know which lines matched or which ones didn't?

        Are the files in a similar sequence with just additional lines or deleted or changed lines? Or do you need to know if any line in one file appears anywhere in the other?

        Depending on your answers, an appropriate algorithm may be forthcoming.
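
To make one of those cases concrete: if the two files are expected to be in the same sequence and only position-by-position differences matter, a lockstep read keeps just two lines in memory at a time. The following is a minimal sketch under that assumption; the sub name `compare_lockstep` and its callback interface are hypothetical, and it does not handle inserted or deleted lines the way a real diff would (one extra line shifts every later position).

```perl
use strict;
use warnings;

# Compare two files position by position. For every line number where they
# differ, call $on_diff->($line_no, $line_a, $line_b); a missing line (one
# file is shorter) is passed as undef. Returns the number of differing positions.
sub compare_lockstep {
    my ($file_a, $file_b, $on_diff) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";

    my ($line_no, $diffs) = (0, 0);
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last unless defined($line_a) or defined($line_b);   # both exhausted
        $line_no++;
        chomp $line_a if defined $line_a;
        chomp $line_b if defined $line_b;

        my $same = defined($line_a) && defined($line_b) && ($line_a eq $line_b);
        next if $same;

        $diffs++;
        $on_diff->($line_no, $line_a, $line_b) if $on_diff;
    }

    close $fh_a;
    close $fh_b;
    return $diffs;
}
```

A caller might pass a callback such as `sub { print "line $_[0] differs\n" }`. If instead any line of one file may appear anywhere in the other, sorting both files on disk first and then walking them in step is one option that also avoids holding everything in memory.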


        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.