in reply to Re: write to Disk instead of RAM without using modules
in thread write to Disk instead of RAM without using modules

This is quick Perl-ish pseudocode for the process I explained in my post just above:
    my %seen;

    # Load every record of the first file into the hash.
    open my $FH, "<", "file1.txt" or die "cannot open ...";
    while (<$FH>) {
        chomp;
        $seen{$_} = 1;
    }
    close $FH;

    # Stream the other files, only bumping counters for known records.
    for my $other_file (qw/ file2.txt file3.txt file4.txt .../) {
        open my $FH, "<", $other_file or die "cannot open ...";
        while (<$FH>) {
            chomp;
            if (exists $seen{$_}) {
                $seen{$_}++;
            }
        }
        close $FH;
    }
Only the first file is ever loaded into memory. When you read the other files, you just update counters. Even if you have hundreds of files, it will work provided the first file can be loaded into the hash.
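
To read the result, a record is common to all files when its counter has advanced once per file. Here is a minimal sketch of that check, with no modules. The name common_records is hypothetical, records are assumed to be whole lines, and the counter is only advanced once per file so that a record repeated inside one file is not counted twice (a case the pseudocode above does not guard against).

```perl
use strict;
use warnings;

# Hypothetical helper: return the records of the first file that also
# occur in every other file. Only the first file is held in memory.
sub common_records {
    my ($first, @others) = @_;

    my %seen;    # record => number of files it has been found in so far
    open my $fh, '<', $first or die "cannot open $first: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $fh;

    my $files_read = 1;
    for my $other (@others) {
        $files_read++;
        open my $in, '<', $other or die "cannot open $other: $!";
        while (my $line = <$in>) {
            chomp $line;
            # Advance the counter at most once per file, and only for
            # records that were found in every file read so far.
            $seen{$line} = $files_read
                if exists $seen{$line} && $seen{$line} == $files_read - 1;
        }
        close $in;
    }

    my @common = grep { $seen{$_} == $files_read } keys %seen;
    @common = sort @common;
    return @common;
}
```

Calling common_records("file1.txt", "file2.txt", "file3.txt") would return the sorted list of lines present in all three files.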

Update: moved the second chomp line to the right place (within the while loop, not just before it).

Re^3: write to Disk instead of RAM without using modules
by Anonymous Monk on Oct 26, 2016 at 05:06 UTC
    Your suggestion will discard records that are file-specific, i.e. not present in the first file but possibly present in others. I want those records printed with their values as well. For example:
    file1:
    aaa 1
    abc 1
    acb 2

    file2:
    aaa 2
    abb 1
    acb 1

    file3:
    acb 2
    aaa 3
    abc 1

    output:
    aaa 1 2 3
    abc 1 0 1
    acb 2 1 2
    abb 0 1 0
      That's exactly why I asked several times for a detailed explanation of what you need. From your last message describing your requirement, you were just looking for records common to all files in your collection. Now your need is different.

      The method I suggested is still possible, but with a slight modification. When you compare all your files with the first one, write to disk the records that were not found in the hash. You'll end up with versions of all the other files with records from the first file filtered out. At this point, the original %seen hash is no longer needed. You can now compare the filtered file2 (presumably significantly smaller than the original one) with file3, file4, etc. (also filtered and smaller), and so on. You end up with a situation where your input files get smaller and smaller and, at any given point in the process, you only have one file in memory.

      You write stuff to disk, but the amount of data you need to handle is shrinking at each step in the process.
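
A minimal sketch of the filtering passes described above, using no modules. Several things here are assumptions, not something stated in the thread: each line is taken to be a "key value" pair as in the example files, keys are assumed not to repeat within a file, the name merge_counts is hypothetical, and the leftover records are spilled to temporary files named after the input with a ".passN" suffix in the same directory.

```perl
use strict;
use warnings;

# Hypothetical helper: merge "key value" files into one table while
# holding only one (already filtered) file's keys in memory at a time.
sub merge_counts {
    my ($out_name, @files) = @_;
    my @current = @files;    # what to read for each file in this pass
    open my $out, '>', $out_name or die "cannot open $out_name: $!";

    for my $i (0 .. $#files) {
        # Load what is left of file $i: key => one value slot per file.
        my %row;
        open my $in, '<', $current[$i] or die "cannot open $current[$i]: $!";
        while (my $line = <$in>) {
            my ($key, $value) = split ' ', $line;
            next unless defined $value;    # skip blank or malformed lines
            $row{$key} ||= [ (0) x @files ];
            $row{$key}[$i] = $value;
        }
        close $in;
        unlink $current[$i] if $current[$i] ne $files[$i];  # temp file done

        # Stream the later files: record matches, spill the rest to disk.
        for my $j ($i + 1 .. $#files) {
            my $rest = "$files[$j].pass$i";    # assumed temp-file naming
            open my $src, '<', $current[$j] or die "cannot open $current[$j]: $!";
            open my $dst, '>', $rest or die "cannot open $rest: $!";
            while (my $line = <$src>) {
                my ($key, $value) = split ' ', $line;
                next unless defined $value;
                if (exists $row{$key}) {
                    $row{$key}[$j] = $value;
                }
                else {
                    print {$dst} "$key $value\n";
                }
            }
            close $src;
            close $dst or die "cannot close $rest: $!";
            unlink $current[$j] if $current[$j] ne $files[$j];
            $current[$j] = $rest;    # next pass reads the smaller file
        }

        # Everything known about these keys is complete: print and forget.
        print {$out} join(' ', $_, @{ $row{$_} }), "\n" for sort keys %row;
    }
    close $out or die "cannot close $out_name: $!";
}
```

With the three example files from the question, merge_counts("output.txt", "file1.txt", "file2.txt", "file3.txt") writes the rows for aaa, abc and acb in the first pass and the row for abb in the second. The trade-off is extra disk I/O: each later file is rewritten once per pass, but the rewritten copies shrink as keys are consumed.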

      I am very sorry, but I am still not getting your point.
      I want to compare all files with each other, not just the first file with all the others. I hope that is clear. Some records may be present in the second and third files but not in the first. I need those too.
      I tried your solution but I am getting errors with the script, and I cannot solve the problem. Please help me with a short example. I have to get past this issue before I can do any further processing. Sorry for the inconvenience caused by the unclear details earlier.

        Please help us help you better by showing us the code you have and the exact error messages you get. This enables us to much better pinpoint where you are having problems.

        Please note that this is not a code writing service. While we will try to help you understand what happens and try to teach you approaches to a solution, we expect you to program the solution yourself.

        Asking for "short examples" makes me feel like you expect us to do your paid work for free. This is not how this site works.

        Hi Anonymous,

        We can't help you with errors if you don't tell us what they are. I recommend you register an account and take some time to read "How do I post a question effectively?". The advice therein will help you post better questions, which will let us help you much better and faster.

        Regards,
        -- Hauke D

        Please show the script that you tried and the exact error message you're getting. I am afraid we can't help you further if you don't help us help you.