
> To speed up the search, I would suggest choosing a large value of n, say a value slightly less than the amount that causes the "Out of Memory" error.

I think you mean half that size.

Cheers Rolf

Re^4: use regular expressions across multiple lines from a very large input file
by CountZero (Bishop) on Dec 06, 2010 at 18:56 UTC
    Why half the size?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      One will always need to have two joined blocks of size n in memory. Once no more matches start in the front block, one deletes it, treats the rear block as the new front block, and reads a new rear block from disk.

              loaded blocks
      |------[++++++|++++++]------|------|---|  file
         A      B      C      D      E     F
                 <----->  match
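
The two-block scheme above can be sketched roughly as follows. This is an illustration in Python rather than Perl, and not code from this thread: the name `scan_in_blocks` and its parameters are made up for the example. It assumes that no match is longer than n characters, so any match starting in the front block fits inside the joined window. A greedy pattern that could run past the end of the window may still come out shorter than it would on the whole file.

```python
import io
import re


def scan_in_blocks(stream, pattern, n):
    """Yield (absolute_offset, text) for each match, keeping two blocks in memory.

    Assumption: no match is longer than n characters.
    """
    regex = re.compile(pattern)
    front = stream.read(n)   # front block
    base = 0                 # absolute offset of the front block's first character
    resume = 0               # absolute offset just past the last reported match
    while front:
        rear = stream.read(n)            # rear block ('' at end of file)
        window = front + rear            # the two joined blocks
        start = max(resume - base, 0)    # do not rescan text already matched
        for m in regex.finditer(window, start):
            if rear and m.start() >= len(front):
                # Starts in the rear block: leave it for the next pass,
                # when that block has become the front block.
                break
            resume = base + m.end()
            yield base + m.start(), m.group()
        base += len(front)
        front = rear                     # rear block becomes the new front block


# Usage: a two-line pattern that straddles a block boundary.
sample = io.StringIO("xxfoo\nbarxx foo\nbar yy")
hits = list(scan_in_blocks(sample, r"foo\nbar", 8))
# hits == [(2, 'foo\nbar'), (12, 'foo\nbar')]
```

Note that `window = front + rear` builds a new string, so at that moment the front block, the rear block and the joined copy all exist at once.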

      Actually I'm not sure if joining two strings can be done without needing twice as much memory!

      Anyway I don't think going to extremes is a good idea...

      Cheers Rolf

        OK. I may not have expressed myself clearly enough. My idea was to run the program with a huge block size that causes an out-of-memory error, and then reduce the block size until it no longer errors out. That would automatically take care of the joining of strings and its additional memory use.
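
A minimal sketch of that trial-and-error idea, again in Python for illustration only. `find_block_size` and `try_run` are hypothetical names, and halving on each failure is just one possible reduction step. It also assumes the failure shows up as a catchable `MemoryError`. In practice an out-of-memory condition often kills the process outright, so each trial would more likely be a separate run of the program.

```python
def find_block_size(try_run, start_n, floor=1):
    """Return the first block size at which try_run(n) completes.

    try_run is a hypothetical callable that processes the file with block
    size n and raises MemoryError if that size is too large.
    """
    n = start_n
    while n >= floor:
        try:
            try_run(n)
            return n          # this size ran without exhausting memory
        except MemoryError:
            n //= 2           # assumption: halve the size and try again
    raise MemoryError("no workable block size at or above the floor")


# Usage with a stand-in that pretends anything above 1000 is too large.
def fake_run(n):
    if n > 1000:
        raise MemoryError

chosen = find_block_size(fake_run, 8192)
# chosen == 512  (8192, 4096, 2048 and 1024 all fail)
```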

        CountZero