in reply to write to Disk instead of RAM without using modules

Do you really need to load all these files into memory at the same time? Maybe you can load only some of them, and then the others. You don't give us enough information to figure that out, but think carefully about it: any solution writing to disk is likely to be much slower.

Otherwise, Data::Dumper, a standard core module, can stringify a data structure for storage in a file, and you can get the data structure back with a string eval. I can't say whether this will be fast enough, but at least it is a module that is already there; you don't need to install anything.
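Roughly, the round trip could look like this (the data structure, the counts.dump file name and the $data variable name are made up for the example):

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data structure standing in for whatever you build from your files
my %counts = (
    seq_AAAC => { 'R1.txt' => 3, 'R2.txt' => 1 },
    seq_TTTG => { 'R1.txt' => 2 },
);

# Stringify the structure and write it to disk
{
    local $Data::Dumper::Purity = 1;    # keep the output eval-able even with cross-references
    open my $out, '>', 'counts.dump' or die "Cannot write counts.dump: $!";
    print {$out} Data::Dumper->Dump( [ \%counts ], ['data'] );
    close $out;
}

# Later (possibly in a different run), read the file and rebuild the structure
my $data;
{
    open my $in, '<', 'counts.dump' or die "Cannot read counts.dump: $!";
    local $/;                           # slurp the whole file
    my $code = <$in>;
    close $in;
    eval $code;                         # string eval re-creates the structure in $data
    die "Could not rebuild the data: $@" if $@;
}
print "R1.txt count for seq_AAAC: $data->{seq_AAAC}{'R1.txt'}\n";    # prints 3

Whether this is workable for you depends on how often you need to dump and reload; the stringify/eval cycle itself is not free.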

Re^2: write to Disk instead of RAM without using modules
by Anonymous Monk on Oct 22, 2016 at 07:24 UTC
    I need to compare all the files simultaneously, so how can I load some files and then the others? I don't understand.
      I can't answer your question because you don't give enough details. But I do a lot of file comparisons at $work, most of the time with very large files, and various strategies make it possible to avoid loading all of them into memory. A lot depends on the details, though. For example, are you looking for what we call "orphans", i.e. records that are in file 1 and not in file 2, or the other way around? Or are you rather looking for differences between records that share the same identifying key? Or both? Are you looking for common records, or for differences? The answers to these questions may lead to entirely different strategies.

      Sometimes you can load just one file into memory and then scan the other files one by one, each of them line by line, without ever loading any of them entirely into memory. As a second step, you can compare the generated files containing the differences between each of the other files and file 1; depending on the shape of your data, these may (or may not) be much smaller than the original files.
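      For instance, a bare-bones sketch of that first step might look like this (the file names come from @ARGV, and I am assuming, purely for the example, that the comparison key is the second line of each blank-line-separated record):

      #!/usr/bin/env perl
      use strict;
      use warnings;

      # First file: load only its comparison keys into a hash
      my ( $first, @others ) = @ARGV;
      my %in_first;
      {
          open my $fh, '<', $first or die "Cannot open $first: $!";
          local $/ = "";                      # paragraph mode: one record per read
          while ( my $rec = <$fh> ) {
              my @lines = split /\n/, $rec;
              next unless defined $lines[1];
              $in_first{ $lines[1] } = 1;     # remember the key line only
          }
      }

      # Other files: stream them record by record, never loading them whole
      for my $file (@others) {
          open my $fh, '<', $file or die "Cannot open $file: $!";
          local $/ = "";
          while ( my $rec = <$fh> ) {
              my @lines = split /\n/, $rec;
              my $key   = $lines[1];
              next unless defined $key;
              if ( exists $in_first{$key} ) {
                  print "common with $first: $key (seen in $file)\n";
              }
              else {
                  print "orphan: $key (in $file but not in $first)\n";
              }
          }
          close $fh;
      }

      Only the keys of the first file live in memory; every other file is read one record at a time.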

      Another approach (especially if the files are truly huge) is to sort the files on the comparison key before the comparison, and then to read all of your files line by line in parallel. There is a penalty for sorting the files first, but it is often worth the cost, because the multi-file comparison is then much faster. And, depending on where your files are coming from, some of them may already be sorted.
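      Roughly, once the files are sorted on the comparison key, the parallel read could look like this (simplified to two files with one tab-separated key per line; both simplifications are mine):

      #!/usr/bin/env perl
      use strict;
      use warnings;

      die "Usage: $0 sorted_file_A sorted_file_B\n" unless @ARGV == 2;
      my ( $file_a, $file_b ) = @ARGV;
      open my $fha, '<', $file_a or die "Cannot open $file_a: $!";
      open my $fhb, '<', $file_b or die "Cannot open $file_b: $!";

      # Read one line and return its comparison key (first tab-separated field)
      sub read_key {
          my ($fh) = @_;
          my $line = <$fh>;
          return unless defined $line;
          chomp $line;
          return ( split /\t/, $line )[0];
      }

      my $key_a = read_key($fha);
      my $key_b = read_key($fhb);
      while ( defined $key_a and defined $key_b ) {
          if ( $key_a lt $key_b ) {
              print "only in $file_a: $key_a\n";
              $key_a = read_key($fha);
          }
          elsif ( $key_a gt $key_b ) {
              print "only in $file_b: $key_b\n";
              $key_b = read_key($fhb);
          }
          else {
              print "common: $key_a\n";
              $key_a = read_key($fha);
              $key_b = read_key($fhb);
          }
      }
      # Anything left over in either file has no counterpart in the other
      while ( defined $key_a ) { print "only in $file_a: $key_a\n"; $key_a = read_key($fha); }
      while ( defined $key_b ) { print "only in $file_b: $key_b\n"; $key_b = read_key($fhb); }

      At any moment only one line of each file is held in memory, which is why this scales to files of essentially any size once the sorting is done.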

      Each case is different, so there is no general strategy that can be applied blindly to your specific problem, and this is why I can't suggest a solution without knowing in detail what you're really comparing and what kind of differences (or common records) you're looking for.

        I have multiple FASTQ files in the following format. I want to print the total count if the second line, i.e. the sequence, matches in all files.
        R1.txt:
        @NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
        AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
        +
        AAAAAEEEEAEEEEEEEEEE/AEEEEEEEEEEEE    1:R1.txt

        R2.txt:
        @NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
        AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
        +
        AAAAAEEEEAEEEEEEEEEE    1:R2.txt

        The output I want is:
        @NS500278:42:HC7M3AFXX:3:21604:26458:18476 2:N:0:AGTGGTCA
        AAAAAAAAACAGATATTTGCACTAGGCATTATAAATAACATCAATTAAGTAAAAAAATTA
        +
        AAAAAEEEEAEEEEEEEEEE/AEEEEEEEEEEEE    1:R1.txt    1:R2.txt    count:2
        My code is:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        no warnings qw( numeric );

        my %seen;
        $/ = "";
        while (<>) {
            chomp;
            my ( $key, $value ) = split( '\t', $_ );
            my @lines = split /\n/, $key;
            my $key1  = $lines[1];
            $seen{$key1} //= [$key];
            push @{ $seen{$key1} }, $value;
        }

        foreach my $key1 ( sort keys %seen ) {
            my $tot        = 0;
            my $file_count = @ARGV;
            for my $val ( @{ $seen{$key1} } ) {
                $tot += ( split /:/, $val )[0];
            }
            if ( @{ $seen{$key1} } >= $file_count ) {
                print join( "\t", @{ $seen{$key1} } );
                print "\tcount:" . $tot . "\n\n";
            }
        }
        This works well with a few files, but when I compare more files it hangs. I think it is because of a memory issue. I want to modify this script, without using any modules, so that it can work with any number of files.

      Show your existing code.

      Also, consider tools like diff, diff3, TortoiseMerge.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)