in reply to inefficient code? works on small files ok but not larger ones

This is a good example of what people mean when they say, "That solution doesn't scale well."

The problem is that the algorithm you've devised to partition the data by range runs in something like O(n^3) time (or worse), which means that as your dataset grows to n elements, the work being done grows roughly as n^3 (again, it might be worse in the final analysis). You have a construct of four nested 'for' loops, and then two more nested 'for' loops later on.

As the dataset grows, you go from a relatively small number of iterations to billions of iterations rather quickly, and the program appears to grind to a halt -- in reality it's working really, really hard.
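To put rough numbers on it (a back-of-the-envelope sketch, nothing specific to your data): k nested loops over n items perform on the order of n**k iterations, and that figure explodes quickly:

    use strict;
    use warnings;

    # Iterations performed by 2, 3, and 4 nested loops over n items.
    for my $n ( 10, 100, 1_000, 10_000 ) {
        printf "n=%6d   n**2=%.0e   n**3=%.0e   n**4=%.0e\n",
            $n, $n**2, $n**3, $n**4;
    }

At 10,000 lines, four nested loops means something like 10^16 iterations. No wonder it seems to hang.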

Couldn't you predeclare your ranges, and then, in a single loop, test which range each line of data falls into?
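A minimal sketch of that idea, assuming a numeric value in the first field of each line (the range boundaries and names below are invented placeholders, not your actual data):

    use strict;
    use warnings;

    # Hypothetical ranges; substitute your real partition boundaries.
    my @ranges = (
        { name => 'low',  min => 0,      max => 99    },
        { name => 'mid',  min => 100,    max => 999   },
        { name => 'high', min => 1_000,  max => 9_999 },
    );

    my %bucket;
    while ( my $line = <DATA> ) {
        chomp $line;
        my ($value) = split ' ', $line;   # assume the value is the first field
        for my $r (@ranges) {             # fixed, small number of ranges
            if ( $value >= $r->{min} && $value <= $r->{max} ) {
                push @{ $bucket{ $r->{name} } }, $line;
                last;                     # each line lands in one bucket
            }
        }
    }

    printf "%s: %d lines\n", $_, scalar @{ $bucket{$_} } for sort keys %bucket;

    __DATA__
    42 widget
    512 gadget
    2048 gizmo

That's one pass over the data with a constant number of range tests per line: O(n) instead of O(n^3).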

Your next problem is that you're slurping the entire file. That means you must hold the entire 22MB file in memory, and the array you're generating must fit in memory too. You're probably getting to the point where your operating system cries uncle and starts using hard-disk swap space. With a lot of swapfile churning, you also get dreadfully slow performance. It would work a lot better if you could read your data line by line and process it, one line at a time.
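Here's the bare-bones version of that pattern (the filename and per-line sub are stand-ins for whatever your script actually does):

    use strict;
    use warnings;

    my $file = 'data.txt';              # placeholder filename
    open my $fh, '<', $file or die "Can't open $file: $!";
    while ( my $line = <$fh> ) {        # reads ONE line per iteration
        chomp $line;
        process_line($line);
    }
    close $fh or die "Can't close $file: $!";

    sub process_line {
        my ($line) = @_;
        # ... classify $line into a range, update counters, etc.
    }

Memory use stays flat no matter how large the file grows, because only one line is ever held at a time.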


Dave
