I'm not sure I fully understand your goal, but I don't think you want to load either the path_n or the path_p file into a hash.

If I read your post correctly, your script will have three inputs:

path_n, which contains many path strings, plus data about each path (total size about 2GB), with one "path+data" record per line
path_p, which is like path_n in size and structure, but contains some path strings not in common with path_n
dif_file, which contains a list of the path strings that are unique in the two path files

You didn't say exactly, but I assume that your goal is to find the records in each path file that match one of the unique path strings, and to do something with the full content of those records -- maybe just output them.

If I got that right, then the method you want is something like this:

read dif_file into a hash, using each path name as a hash key; you don't need to worry about what the hash value is -- you could do $diff_hash{$path} = undef;
open path_n, read it one line at a time, assign the path string to a variable, and see if %diff_hash contains an element with that path as the hash key; if so, print out the full record, otherwise go on to the next record
open path_p, and do the same thing you did for path_n
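
The first step above could be sketched roughly as follows. This is only a sketch under assumptions: the sub name load_diff_hash is made up for illustration, and it assumes dif_file holds one path string per line, with no whitespace inside a path (the same assumption the \S+ match in the loop makes).

```perl
use strict;
use warnings;

# Hypothetical helper: read the list of unique path strings into a hash.
# Assumes one path per line and no whitespace inside a path string.
sub load_diff_hash {
    my ($dif_file) = @_;
    my %diff_hash;

    open my $dif_fh, '<', $dif_file or die "$dif_file: $!";
    while ( my $line = <$dif_fh> ) {
        # take the first run of non-whitespace as the path; skip blank lines
        my ($path) = $line =~ /^(\S+)/ or next;

        # only the key matters, so the value can stay undef
        $diff_hash{$path} = undef;
    }
    close $dif_fh;

    return %diff_hash;
}
```

You would call it once, before reading either path file, e.g. my %diff_hash = load_diff_hash('dif_file'); -- only the (much smaller) list of unique paths is held in memory, while path_n and path_p are streamed one line at a time.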

So the loop over the path_* files might look like this:

for my $file ( qw/path_n path_p/ ) {
   # three-argument open with a lexical filehandle
   open my $path_fh, '<', $file or die "$file: $!";
   while (<$path_fh>)
   {
      # use a regex match with parens to capture path string in $1
      # and test to see if the path string was in dif_file:

      if ( /^(\S+)/ and exists( $diff_hash{$1} )) {
          print;
          # or whatever else you need to do with this record
      }
   }
   close $path_fh;
}

In reply to Re: huge file->hash by graff
in thread huge file->hash by ISAI student
