in reply to Searching multiple expressions in multiple Files

Please post some code. It is very hard to tell what your problem might be without seeing how you are trying to solve it.

Opening and closing files is slow compared with working in memory. One mistake people often make is to take one string, read all 4000 files, take the next string, read all 4000 files again, and so on (with 300 strings, that is 1,200,000 file openings!). You can usually avoid this by reading through each file only once and saving the data you need in a hash.
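A minimal sketch of the read-each-file-once idea, written in Python with a dict standing in for the hash. It assumes the search strings are whitespace-separated tokens; if they can occur in the middle of words, a token index like this will not find them. The names `build_index` and `lookup` are hypothetical. Every file is opened exactly once to build the index, and all queries are then answered from memory.

```python
import re
from collections import defaultdict

# Assumption: a "token" is any run of non-whitespace characters.
TOKEN_RE = re.compile(r"\S+")

def build_index(paths):
    """Read each file exactly once and map every token to the set of files containing it."""
    index = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for token in TOKEN_RE.findall(line):
                    index[token].add(path)
    return index

def lookup(index, search_strings):
    """Answer every query from the in-memory index; no further file I/O happens here."""
    return {s: sorted(index.get(s, ())) for s in search_strings}
```

With 4000 files and ~300 strings, this is 4000 file openings instead of 1,200,000. The trade-off is memory: the index holds every distinct token from every file.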

How you go about reading in and saving the data depends a lot on the nature of the strings you are searching for. Are your strings whole words, or sequences that can be found in the middle of words and/or spread across several words?

Best, beth


Re^2: Searching multiple expressions in multiple Files
by VinsWorldcom (Prior) on Mar 31, 2009 at 12:09 UTC

    To add to Beth's idea: you'll need to open the 4000 files at some point anyway, so it may be easiest to read in the 100-300 strings first, cache them in a data structure (they take far less memory than the files would), and then iterate over the 4000 files, once each, looking for all ~300 strings.
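A sketch of this variant in Python, assuming the search strings are stored one per line in a separate file and that each of the 4000 files fits comfortably in memory. `load_patterns` and `scan_files` are hypothetical names. Plain substring tests (`p in text`) are used, so matches in the middle of words are found too.

```python
def load_patterns(path):
    """Read the search strings once (assumed: one per line, blank lines ignored) into a set."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}

def scan_files(paths, patterns):
    """Open each file once and record which of the cached strings it contains."""
    hits = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()  # assumption: the whole file fits in memory
        found = {p for p in patterns if p in text}
        if found:
            hits[path] = found
    return hits
```

Each file is opened once; the per-file cost is ~300 substring checks against text that is already in memory, which is cheap next to reopening the file 300 times.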

Re^2: Searching multiple expressions in multiple Files
by Anonymous Monk on Mar 31, 2009 at 13:13 UTC
    The format of the string is something like "abcdefg01.123". In all the files, this string is preceded by whitespace and followed by a ";".

    So the string is actually a combination of two words.
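Because the strings have a fixed shape, each file can be scanned once with a regular expression that pulls out every candidate token, and each token can then be checked against a set of the ~300 wanted strings. A minimal Python sketch; the pattern is an assumption generalized from the single example "abcdefg01.123" (word characters, a dot, word characters), and `find_tokens` is a hypothetical helper. `(?<!\S)` means "preceded by whitespace or at the start of the line", and `(?=;)` requires a following ";" without consuming it.

```python
import re

# Assumed shape, based on the example "abcdefg01.123": word characters, a dot,
# more word characters, preceded by whitespace (or line start) and followed by ";".
TOKEN_RE = re.compile(r"(?<!\S)(\w+\.\w+)(?=;)")

def find_tokens(lines, wanted):
    """Yield (line_number, token) for each extracted token that is in the set `wanted`."""
    for lineno, line in enumerate(lines, start=1):
        for token in TOKEN_RE.findall(line):
            if token in wanted:  # set membership: one hash lookup per token
                yield lineno, token
```

`lines` can be an open file handle, e.g. `with open(path) as fh: hits = list(find_tokens(fh, wanted))`, so each file is still read only once, and the cost per token does not grow with the number of search strings.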