What kind of memory structure is appropriate depends upon what you are going to do with the data (how you intend to process it). It would help if you could explain a bit about that. The only reason to keep all 100 files in memory at once is if there is some connection between the data in the different files. Otherwise you can just process each file individually, one at a time. Of course, if these files are big, storing them all in memory at the same time is going to take a lot of memory!

When you build the memory structure, some initial processing is usually appropriate (like maybe splitting out just the important data fields, say 1, 3, 8, 10) rather than storing a verbatim copy of each line from the file.
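Something like this rough sketch of that idea - I'm assuming whitespace-delimited lines, that the files live at "data/*.txt", and that fields 1, 3, 8, 10 are the interesting ones; all of those are placeholders for your actual format:

use strict;
use warnings;

my @records;    # one array ref per line, holding only the fields we care about
foreach my $filename (glob "data/*.txt") {    # hypothetical location of the 100 files
    open my $fh, '<', $filename or die "Cannot open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split ' ', $line;             # split on whitespace; adjust for your delimiter
        push @records, [ @fields[0, 2, 7, 9] ];    # keep fields 1, 3, 8, 10 (1-based), not the verbatim line
    }
    close $fh;
}
print scalar(@records), " records loaded\n";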

Small note:

open FILE, $filename || die "Cannot open $filename for reading: $!\n";
# due to precedence rules, the || binds to $filename before open ever sees it,
# so the die can never fire. If you use ||, parens are needed:
open (FILE, '<', $filename) || die "Cannot open $filename for reading: $!\n";
# or use the lower precedence "or":
open FILE, '<', $filename or die "Cannot open $filename for reading: $!\n";
Update: if you are just grepping for certain lines, consider using the command-line grep to get the lines of interest. The file system will do some file caching, so the 2nd and 3rd greps will speed up. What's "best" depends upon how many searches you are going to do. Or, if you are always searching on just one field, a hash data structure keyed on that field may be appropriate.
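A minimal sketch of that hash idea (again, the file glob, delimiter, and key field are placeholders - here I arbitrarily key on field 1):

use strict;
use warnings;

my %by_key;    # field 1 value => array ref of records sharing that key
foreach my $filename (glob "data/*.txt") {
    open my $fh, '<', $filename or die "Cannot open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split ' ', $line;
        push @{ $by_key{$fields[0]} }, \@fields;    # many lines may share one key
    }
    close $fh;
}

# now each search is one hash lookup instead of a re-scan of 100 files
my $hits = $by_key{'some_key'} || [];
print "found ", scalar(@$hits), " matching record(s)\n";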

I do have one application that uses 700-900 flat files as a database. Each file is hundreds to a few thousand lines. On my Win XP system, the first linear search takes 7 seconds (open each file, read each line, etc.). After that, subsequent searches take <1 second due to file system caching. Results get displayed as they arrive in a Tk UI. Average user session is 1-2 hours. Nobody has ever even noticed that the first search is "slow" or that it speeds up - the results spew out faster than a human can process them. I am converting this to an SQLite DB and it will be even faster, but the point is: try the simple stuff first and see how it goes before trying to optimize. In this case, all the files become memory resident without me having to do anything at all, and I have very simple code that doesn't use much memory in my process space. Just a thought.
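For what it's worth, the linear search in that app is conceptually no more than this (a sketch, not the actual code - the directory, pattern, and output are stand-ins):

use strict;
use warnings;

my $pattern = qr/search term goes here/;    # placeholder pattern
foreach my $filename (glob "db/*.txt") {    # the 700-900 flat files
    open my $fh, '<', $filename or die "Cannot open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        print $line if $line =~ $pattern;   # results spew out as they are found
    }
    close $fh;
}
# the first pass is disk-bound; after that the OS file cache makes repeats fast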

