write to Disk instead of RAM without using modules

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: write to Disk instead of RAM without using modules by Corion (Patriarch) on Oct 21, 2016 at 11:07 UTC
Yes, see tie and especially Tie::File and DB_File. If you don't want to use a module, just take the code that these modules contain and use that code.
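As a minimal sketch of the tie approach, assuming the core Tie::File module is acceptable: the array below is backed by a file on disk, so lines are fetched and written on demand instead of being held in RAM all at once. The scratch file and its contents are made up for illustration.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical scratch file; in practice this would be your own data file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "alpha\nbeta\ngamma\n";
close $fh or die "Cannot close $path: $!";

# Each element of @lines maps to one line of the file on disk.
tie my @lines, 'Tie::File', $path
    or die "Cannot tie $path: $!";

my $count  = scalar @lines;   # number of lines in the file
my $second = $lines[1];       # fetched from the file on demand
$lines[1]  = uc $second;      # written back to the file
push @lines, 'delta';         # appends a new line to the file

untie @lines;

# Re-read the file the ordinary way to confirm the changes are on disk.
open my $in, '<', $path or die "Cannot open $path: $!";
chomp(my @on_disk = <$in>);
close $in;
print "$_\n" for @on_disk;
```

Every access goes through the file, so this trades speed for memory; whether that trade is acceptable depends on how often each line is touched.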
Re: write to Disk instead of RAM without using modules by marto (Cardinal) on Oct 21, 2016 at 11:12 UTC
It's probably worth reading "Yes, even you can use CPAN".
Re: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 21, 2016 at 21:34 UTC
Do you really need to load all these files into memory at the same time? Maybe you can load some of them first and the others later. You don't give enough information for us to figure that out, but think carefully about it: any solution that writes to disk is likely to be much slower. Otherwise, Data::Dumper, a standard core module, can stringify a data structure for storage in a file, and you can get the data structure back with a string `eval`. I can't say whether this will be fast enough, but at least this module is already there; you don't need to install it.
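The Data::Dumper round trip mentioned above could look roughly like this. It is only a sketch: the `%counts` structure and the temporary file are invented for the example, and string `eval` should only ever be applied to files you wrote yourself, because it executes whatever code the file contains.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Made-up data structure standing in for whatever is held in memory.
my %counts = (aaa => [1, 2, 3], abc => [1, 0, 1]);

# Serialize: Dumper emits Perl source of the form "$VAR1 = {...};".
$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} Dumper(\%counts);
close $out or die "Cannot close $path: $!";

# The in-memory copy can now be released.
undef %counts;

# Deserialize: slurp the file and string-eval it.
open my $in, '<', $path or die "Cannot open $path: $!";
my $source = do { local $/; <$in> };
close $in;

my $restored = do {
    no strict 'vars';          # the dump assigns to the package variable $VAR1
    eval $source;
};
die "Cannot rebuild data from $path: $@" unless defined $restored;

print "aaa: @{ $restored->{aaa} }\n";
```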
Re^2: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 22, 2016 at 07:24 UTC
I need to compare all the files simultaneously, so how can I load some files and then others? I don't understand.
Re^3: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 22, 2016 at 09:11 UTC
I can't answer your question because you don't give enough details. But I do a lot of file comparisons at $work, most of the time with very large files. Various strategies make it possible to avoid loading all of them into memory, but a lot depends on the details. For example, are you looking for what we call "orphans", i.e. records that are in file 1 and not in file 2, or the other way around? Or are you rather looking for differences between records that have the same identifying key? Or both? Are you looking for common records, or for differences? The answers may lead to entirely different strategies. Sometimes you can load just one file into memory and then scan the other files one by one, each of them line by line, without ever loading those other files entirely into memory. Then, as a second step, you compare the generated files containing the differences between each other file and file 1, which may (or may not) be much smaller than the original files, depending on the shape of your data. Another approach (especially if the files are truly huge) is to sort the files by the comparison key before the comparison and then read all of your files line by line in parallel. Sorting the files first has a cost, but it is often worth it, because the multi-file comparison is then much faster. And, depending on where your files come from, some of them may already be sorted. Each case is different, so there is no general strategy blindly applicable to your specific problem, and this is why I can't suggest a solution without knowing in detail what you're really comparing and what kind of differences (or common records) you're looking for.
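To illustrate the sorted, read-in-parallel strategy, here is a minimal sketch that finds the records common to several files while holding only one line per file in memory. It assumes every file is already sorted in plain string order (Perl's default `sort`, or `LC_ALL=C sort` on the command line) and that a record appears at most once per file. The sample data and the `next_record` helper are made up for the example, and in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# Made-up sample data: three "files", each sorted in plain string order.
my @contents = (
    "four\nseven\nsix\ntwo\n",
    "eight\nfour\none\nseven\nthree\n",
    "four\nnine\nseven\ntwo\n",
);

# In-memory filehandles; with real files, open each path with '<' instead.
my @handles;
for my $text (@contents) {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    push @handles, $fh;
}

# Hypothetical helper: next record from a handle, or undef at end of file.
sub next_record {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

my @current = map { next_record($_) } @handles;
my @common;

# Stop as soon as any file is exhausted: nothing more can be common to all.
while (!grep { !defined $_ } @current) {
    # The largest current key is the only candidate that can still match everywhere.
    my ($max) = sort { $b cmp $a } @current;

    if (!grep { $_ ne $max } @current) {
        push @common, $max;    # present in every file
        @current = map { next_record($_) } @handles;
    }
    else {
        # Advance only the files that are behind the candidate.
        for my $i (0 .. $#handles) {
            $current[$i] = next_record($handles[$i]) if $current[$i] lt $max;
        }
    }
}

print "$_\n" for @common;
```

Memory use stays constant however large the files are; the cost has moved into the sort step, which external tools can perform on disk.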
Re^4: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 24, 2016 at 07:17 UTC
Re^5: write to Disk instead of RAM without using modules by hippo (Archbishop) on Oct 24, 2016 at 08:32 UTC
Re^5: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 24, 2016 at 17:10 UTC
Re^5: write to Disk instead of RAM without using modules by BrowserUk (Patriarch) on Oct 26, 2016 at 13:08 UTC
Re^3: write to Disk instead of RAM without using modules by afoken (Chancellor) on Oct 22, 2016 at 09:00 UTC
Show your existing code. Also, consider tools like diff, diff3, TortoiseMerge. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 25, 2016 at 16:09 UTC
You're looking for records common to a group of files. Suppose you have one file containing the following words (one word per line): "two, four, six, seven", a second file containing "one, three, four, seven, eight", and a third file containing "two, four, seven, nine". Read the first file and store the words in a hash with a value of 1. You get a hash looking like this: `( two => 1, four => 1, six => 1, seven => 1);` Read the second file line by line and, for each line, check whether the line is in the hash. If it isn't there, just discard the line: it wasn't in the first file, so it cannot be in all files. If it is in the hash, increment its counter. You end up with something like this: `( two => 1, four => 2, six => 1, seven => 2);` Repeat the same process with the third file, and you get: `( two => 2, four => 3, six => 1, seven => 3);` Notice that your hash is not growing, even though you've read three files. Only one file has ever been loaded into memory. Now you know that the records common to all files are those whose value is 3; you can just print them or do whatever you want with them.
Re^2: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 25, 2016 at 16:25 UTC
This is quick perlish pseudo code for the process explained in my post just above: `my %seen; open my $FH, "<", "file1.txt" or die "cannot open file1.txt: $!"; while (<$FH>) { chomp; $seen{$_} = 1; } close $FH; for my $other_file (qw/ file2.txt file3.txt file4.txt .../) { open my $FH, "<", $other_file or die "cannot open $other_file: $!"; while (<$FH>) { chomp; if (exists $seen{$_}) { $seen{$_}++; } } close $FH; }` Only the first file is ever loaded into memory. When you read the other files, you just update counters. Even if you have hundreds of files, it will work provided the first file can be loaded into the hash. Update: moved the second `chomp` line to the right place (within the `while` loop, not just before).
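For a version that runs as-is, the following sketch applies the same counting process to the three word lists from the post above and then prints the records found in every file. In-memory filehandles stand in for real files, and the sketch assumes each word appears at most once per file (a duplicate would inflate its counter).

```perl
use strict;
use warnings;

# Word lists from the example above, one word per line.
my @files = (
    "two\nfour\nsix\nseven\n",
    "one\nthree\nfour\nseven\neight\n",
    "two\nfour\nseven\nnine\n",
);

my %seen;

# Load only the first file into the hash.
open my $first, '<', \$files[0] or die "Cannot open first file: $!";
while (my $line = <$first>) {
    chomp $line;
    $seen{$line} = 1;
}
close $first;

# Stream every other file, bumping the counter of records already known.
for my $i (1 .. $#files) {
    open my $fh, '<', \$files[$i] or die "Cannot open file $i: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line}++ if exists $seen{$line};
    }
    close $fh;
}

# A record is common to all files when its counter equals the file count.
my @common = grep { $seen{$_} == @files } keys %seen;
@common = sort @common;
print "$_\n" for @common;
```

The hash never holds more keys than the first file contains, so putting the smallest file first keeps memory use lowest.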
Re^3: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 26, 2016 at 05:06 UTC
Your suggestion will discard records that are file-specific, i.e. not present in the first file but possibly present in others. I want those records printed too, with their values. For example: file1: `aaa 1`, `abc 1`, `acb 2`; file2: `aaa 2`, `abb 1`, `acb 1`; file3: `acb 2`, `aaa 3`, `abc 1`; output: `aaa 1 2 3`, `abc 1 0 1`, `acb 2 1 2`, `abb 0 1 0`.
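One way to produce that kind of table, offered as a sketch and not as a reply from the thread: keep one row per key, with a slot per file that defaults to 0. This assumes each line is a key followed by a value separated by whitespace, and it holds every distinct key in memory, so it only helps when the set of keys (not the files themselves) fits in RAM. Rows are printed in sorted key order, which differs from the order in the example above.

```perl
use strict;
use warnings;

# Sample data from the example above; in-memory filehandles replace real files.
my @files = (
    "aaa 1\nabc 1\nacb 2\n",
    "aaa 2\nabb 1\nacb 1\n",
    "acb 2\naaa 3\nabc 1\n",
);

my %table;   # key => [ value in file 1, value in file 2, ... ]

for my $i (0 .. $#files) {
    open my $fh, '<', \$files[$i] or die "Cannot open file $i: $!";
    while (my $line = <$fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;           # skip blank or malformed lines
        $table{$key} //= [ (0) x @files ];    # 0 for every file by default
        $table{$key}[$i] = $value;
    }
    close $fh;
}

my @rows = map { join ' ', $_, @{ $table{$_} } } sort keys %table;
print "$_\n" for @rows;
```

If even the keys do not fit in memory, sorting the files and merging them line by line avoids the hash entirely.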
Re^4: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 26, 2016 at 06:20 UTC
Re^4: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 26, 2016 at 09:15 UTC
Re^4: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 26, 2016 at 05:41 UTC
Re^4: write to Disk instead of RAM without using modules by Anonymous Monk on Oct 26, 2016 at 11:53 UTC
Re^5: write to Disk instead of RAM without using modules by Corion (Patriarch) on Oct 26, 2016 at 11:58 UTC
Re^5: write to Disk instead of RAM without using modules by haukex (Archbishop) on Oct 26, 2016 at 12:01 UTC
Re^5: write to Disk instead of RAM without using modules by Laurent_R (Canon) on Oct 26, 2016 at 22:31 UTC