in reply to Compare CSVs FILES using REGEX or pattern matching

As other monks have already said, you haven't given us enough information. Still, depending on the approximate proportion of lines that change between two runs, you might start by comparing the full lines, and then split and compare the individual columns only for the lines that differ.
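To make the idea concrete, here is a minimal sketch, written in Python only for brevity; the same logic maps directly onto a plain string comparison followed by a split (or a CSV parser) in Perl. It assumes, since the original post does not say, that line N of the old file corresponds to line N of the new file and that no quoted field contains an embedded newline. The names `diff_csv_lines` and `diff_csv_files` are made up for this illustration.

```python
import csv
from itertools import zip_longest


def diff_csv_lines(old_lines, new_lines):
    """Yield (line_number, changed_column_indexes) for each differing line.

    Assumption: lines are paired by position in the two inputs.
    """
    pairs = zip_longest(old_lines, new_lines)
    for lineno, (old, new) in enumerate(pairs, start=1):
        if old == new:
            continue  # cheap whole-line comparison, nothing is split
        # Only the lines that differ pay the cost of being parsed.
        old_cols = next(csv.reader([old])) if old is not None else []
        new_cols = next(csv.reader([new])) if new is not None else []
        changed = [
            i
            for i, (a, b) in enumerate(zip_longest(old_cols, new_cols))
            if a != b
        ]
        yield lineno, changed


def diff_csv_files(old_path, new_path):
    """Compare two files; both are opened in the default read-only mode."""
    with open(old_path, newline="") as old_f, open(new_path, newline="") as new_f:
        return list(diff_csv_lines(old_f, new_f))
```

If only a small fraction of the lines change between runs, nearly all of the work is the whole-line string comparison, which is much cheaper than splitting every line into columns.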

We also don't have enough details about your procedure and your data but, in general, comparing 20,000 lines in less than 2 minutes seems a very realistic aim if the code is reasonably efficient. I quite often compare 30 million lines in 10 to 15 minutes, or even less if the comparison is simple or the lines are relatively short, on a platform which is far from being a racehorse.

Finally, as already pointed out, if you open your files in read mode, there is no danger of altering them. But please show your code so that we can confirm this, as well as my previous (quite general) comments.
