in reply to write to Disk instead of RAM without using modules

You're looking for records common to a group of files.

Suppose you have one file containing the following words (one word per line): "two, four, six, seven", a second file containing "one, three, four, seven, eight", and a third file containing "two, four, seven, nine".

Read the first file and store each word in a hash with a value of 1. You get a hash looking like this:

( two => 1, four => 1, six => 1, seven => 1);
Read the second file line by line and, for each line, check whether the line is in the hash. If it isn't there, just discard the line: it wasn't in the first file, so it cannot be in all files. If it is in the hash, just increment the counter for it. You end up with something like this:
( two => 1, four => 2, six => 1, seven => 2);
Repeat the same process with the third file, and you get:
( two => 1, four => 3, six => 1, seven => 3);
Notice that your hash is not growing, even though you've read three files. Only one file has ever been loaded into memory. Now that you know the records common to all files are those whose value is 3, you can just print them or do whatever you want with them.
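The counting approach above can be sketched end to end in Perl. This is only an illustration: it recreates the three example files from this post so it can run on its own, and it assumes each word appears at most once per file (otherwise a duplicate within one file would inflate its counter).

```perl
use strict;
use warnings;

# Recreate the three example files from the post (one word per line).
my %contents = (
    'file1.txt' => [qw(two four six seven)],
    'file2.txt' => [qw(one three four seven eight)],
    'file3.txt' => [qw(two four seven nine)],
);
for my $name (keys %contents) {
    open my $out, '>', $name or die "cannot write $name: $!";
    print {$out} "$_\n" for @{ $contents{$name} };
    close $out;
}

my @files = ('file1.txt', 'file2.txt', 'file3.txt');

# Load the first file: every word starts with a count of 1.
my %seen;
open my $fh, '<', $files[0] or die "cannot open $files[0]: $!";
while (<$fh>) {
    chomp;
    $seen{$_} = 1;
}
close $fh;

# For the remaining files, bump the counter only for words already seen
# (assumes each word appears at most once per file).
for my $file (@files[1 .. $#files]) {
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (<$fh>) {
        chomp;
        $seen{$_}++ if exists $seen{$_};
    }
    close $fh;
}

# Words common to all files are those counted once per file.
print "$_\n" for sort grep { $seen{$_} == @files } keys %seen;
```

With the example data, only "four" and "seven" reach a count of 3, so only they are printed.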

Replies are listed 'Best First'.
Re^2: write to Disk instead of RAM without using modules
by Laurent_R (Canon) on Oct 25, 2016 at 16:25 UTC
    This is quick Perlish pseudo-code for the process just explained in my post just above:
    my %seen;
    open my $FH, "<", "file1.txt" or die "cannot open ...";
    while (<$FH>) {
        chomp;
        $seen{$_} = 1;    # hash element, so braces, not parens
    }
    close $FH;
    for my $other_file (qw/ file2.txt file3.txt file4.txt /) {    # ... and so on
        open my $FH, "<", $other_file or die "cannot open ...";
        while (<$FH>) {
            chomp;
            $seen{$_}++ if exists $seen{$_};
        }
        close $FH;
    }
    Only the first file is ever loaded into memory. When you read the other files, you just update counters. Even if you have hundreds of files, it will work provided the first file can be loaded into the hash.

    Update: moved the second chomp line to the right place (within the while loop, not just before it).

      Your suggestion will discard those records which are file-specific, i.e. not present in the first file but possibly present in others. I want those records also to be printed with their values. For example:
      file1:
      aaa 1
      abc 1
      acb 2

      file2:
      aaa 2
      abb 1
      acb 1

      file3:
      acb 2
      aaa 3
      abc 1

      output:
      aaa 1 2 3
      abc 1 0 1
      acb 2 1 2
      abb 0 1 0
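One way to produce that layout is to record, for every key, its value in each file, then print 0 where a key is missing. This sketch hard-codes the example files above so it can run on its own; note that, unlike the memory-saving scheme discussed in this thread, it keeps every key in one hash, which is only fine for inputs that fit in memory.

```perl
use strict;
use warnings;

# Recreate the example files from the post ("key value" per line).
my %contents = (
    'file1.txt' => ["aaa 1", "abc 1", "acb 2"],
    'file2.txt' => ["aaa 2", "abb 1", "acb 1"],
    'file3.txt' => ["acb 2", "aaa 3", "abc 1"],
);
for my $name (keys %contents) {
    open my $out, '>', $name or die "cannot write $name: $!";
    print {$out} "$_\n" for @{ $contents{$name} };
    close $out;
}

my @files = ('file1.txt', 'file2.txt', 'file3.txt');

# $value{$key}[$i] holds the value of $key in file $i (undef if absent).
my %value;
for my $i (0 .. $#files) {
    open my $fh, '<', $files[$i] or die "cannot open $files[$i]: $!";
    while (<$fh>) {
        my ($key, $val) = split;
        $value{$key}[$i] = $val;
    }
    close $fh;
}

# Print every key with one value per file, 0 where it is missing.
for my $key (sort keys %value) {
    print join(' ', $key, map { $_ // 0 } @{ $value{$key} }[0 .. $#files]), "\n";
}
```

Keys come out sorted here (aaa, abb, abc, acb) rather than in the post's order, but each row carries the same per-file values.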
        That's exactly why I asked several times for a detailed explanation of what you need. From your last message describing your requirement, it sounded as if you were just looking for records common to all files in your collection. Now your need is different.

        The method I suggested is still possible, but with a slight modification. When you compare all your files with the first one, write to disk the records that were not found in the hash. You'll end up with versions of all the other files with the records from the first file filtered out. At this point, the original %seen hash is no longer needed. You can now compare the filtered file2 (presumably significantly smaller than the original) with file3, file4, etc. (also filtered and smaller), and so on. You end up with a situation where your input files get smaller and smaller and, at any given point in the process, you only have one file in memory.

        You write stuff to disk, but the amount of data you need to handle is shrinking at each step in the process.
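One filtering pass of that scheme might be sketched as follows. The sub name and the ".filtered" naming convention are hypothetical, not from the post: the first file is loaded into a hash, and each remaining file is streamed through it, with the records not found in the hash written to a new, smaller file for the next pass.

```perl
use strict;
use warnings;

# One filtering pass: load the reference file into a hash, then write the
# records of every other file that are NOT in that hash to a ".filtered"
# file (hypothetical naming). Returns the list of filtered file names.
sub filter_pass {
    my ($first, @others) = @_;

    my %seen;
    open my $fh, '<', $first or die "cannot open $first: $!";
    while (<$fh>) {
        chomp;
        $seen{$_}++;
    }
    close $fh;

    my @filtered;
    for my $file (@others) {
        open my $in,  '<', $file            or die "cannot open $file: $!";
        open my $out, '>', "$file.filtered" or die "cannot write $file.filtered: $!";
        while (<$in>) {
            chomp;
            # Keep only records absent from the reference file.
            print {$out} "$_\n" unless exists $seen{$_};
        }
        close $in;
        close $out;
        push @filtered, "$file.filtered";
    }
    return @filtered;
}
```

Repeating the pass with the first filtered file as the new reference shrinks the data at every step, so only one file's worth of records is ever in memory at a time.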

        I am very sorry, but I am still not getting your point.
        I want to compare all files with each other, not just the first file with all the others. I hope I am clear. Maybe the records are present in the second and third files but not in the first; I need those as well.
        I tried your solution but I am getting errors with the script and cannot solve the problem. Please help me with a short example. I have to get through this issue for further processing. Sorry for the inconvenience caused by the unclear details earlier.