In reply to "load a file in memory and extract parts"

In general, when you want to use keys from one file to look up values in another, you load the keys from the first file into a hash, then loop through the second file, checking whether each line's key exists in the hash and doing something with the line if it does. Unless there's a reason to do otherwise, it's usually best to load the smaller file (in this case your 5K one) into the hash and loop through the larger one, so the hash stays small. In pseudo-code:

    open 5K file
    foreach line
        get key from line and put it in hash as key=1
    close 5K file

    open 100M file
    foreach line
        get key from line
        if key is in hash from other file
            do stuff with the line
    close 100M file
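
And here is a bare-bones Perl rendering of that pseudo-code. The file names and the assumption that the key is the first tab-separated field on each line are placeholders for illustration; adapt the split to your real data:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Load the keys from the small file into a hash.
    my %keys;
    open my $small, '<', 'small.txt' or die "small.txt: $!";   # hypothetical name
    while (my $line = <$small>) {
        chomp $line;
        my ($key) = split /\t/, $line;   # assumes key is the first tab-separated field
        next unless defined $key;        # skip blank lines
        $keys{$key} = 1;
    }
    close $small;

    # Stream the large file and act on each line whose key we loaded.
    open my $big, '<', 'big.txt' or die "big.txt: $!";         # hypothetical name
    while (my $line = <$big>) {
        my ($key) = split /\t/, $line;
        next unless defined $key;
        if (exists $keys{$key}) {
            print $line;                 # "do stuff with the line": here, just print it
        }
    }
    close $big;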

Once you have some code which attempts to do that, show it to us along with a few lines of sample input and output data, and we can guide you further if you need it.

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^2: load a file in memory and extract parts
by afoken (Chancellor) on May 06, 2015 at 06:10 UTC

    Tux (the module's maintainer) seems to be offline, so I'll link to Text::CSV_XS myself:

    • Text::CSV_XS takes care of reading and writing CSV files. Unlike most "five lines of perl" attempts, it handles most, if not all, nasty edge cases. (A short usage sketch follows this list.)
    • DBD::CSV sits on top of Text::CSV_XS and allows SQL access to CSV files. It may be slower than the SQLite approach proposed by sundialsvc4, but it avoids converting the CSV to SQLite first. (Also sketched below.)
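
    Here is a minimal Text::CSV_XS sketch of the key-lookup task from the parent post. The file names and the assumption that the key is the first CSV column are placeholders, not something from the OP:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

        # Load the key column (assumed to be column 0) of the small file.
        my %keys;
        open my $small, '<', 'small.csv' or die "small.csv: $!";
        while (my $row = $csv->getline($small)) {
            $keys{ $row->[0] } = 1;
        }
        close $small;

        # Stream the large file, re-emitting rows whose key was loaded above.
        open my $big, '<', 'big.csv' or die "big.csv: $!";
        while (my $row = $csv->getline($big)) {
            if (exists $keys{ $row->[0] }) {
                $csv->print(*STDOUT, $row);   # write the matching row back out as CSV
                print "\n";
            }
        }
        close $big;

    And a DBD::CSV sketch of the same kind of lookup through SQL. The table name big (i.e. the file ./big.csv) and the column name id are hypothetical; the column names must match the header line of your file:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Each *.csv file in f_dir becomes a table named after the file;
        # column names are taken from the file's first line.
        my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
            f_dir      => '.',
            f_ext      => '.csv/r',
            RaiseError => 1,
        });

        # 'big' and 'id' are placeholders for your real table and column.
        my $sth = $dbh->prepare('SELECT * FROM big WHERE id = ?');
        $sth->execute('some_key');            # 'some_key' is a placeholder value
        while (my $row = $sth->fetchrow_arrayref) {
            print "@$row\n";
        }
        $dbh->disconnect;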

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)