Re: Query large tab delimited file by a list

Re^2: Query large tab delimited file by a list by Elninh05 (Novice) on Jul 03, 2016 at 16:06 UTC
Hi Marshall, thanks for the tip. I'm new to Perl and to the community, but I hope to keep improving my Perl skills.
Re^3: Query large tab delimited file by a list by Marshall (Canon) on Jul 03, 2016 at 16:39 UTC
I see that file 1 is humongous (6 GB). How big is file 2? I guess output 1 is "extract records from file 1 that match an id in file 2"? I am unclear about the algorithm for output 2. Have you tried any code yet? If so, post it along with your thoughts on algorithms.

Update: The size of file 2 matters because it decides whether file 2 can be kept in memory. If it can, output 1 is relatively easy. If not, some pre-sorting or a database approach would be necessary.
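A minimal sketch of the pre-sorting approach, for the case where the id list does not fit in memory. It assumes several things the thread does not state. Both files must already be sorted by id in plain byte order, for example with GNU `sort` run under `LC_ALL=C`, which compares bytes the way Perl's `cmp` does when `use locale` is not in effect. The id is taken to be the first tab-separated column of file 1, and file 2 is assumed to hold one id per line. The sub names `next_id` and `merge_filter` are hypothetical. Only one line of each file is held in memory at a time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the next non-empty id from a one-id-per-line handle, or undef at EOF.
sub next_id {
    my ($fh) = @_;
    while ( defined( my $id = <$fh> ) ) {
        chomp $id;
        return $id if $id ne '';
    }
    return;
}

# Merge-join two inputs that are BOTH sorted by id in byte order.
# ASSUMPTION: the id is the first tab-separated column of each record.
sub merge_filter {
    my ( $rec_fh, $id_fh, $out_fh ) = @_;
    my $kept = 0;
    my $id   = next_id($id_fh);
    my $rec  = <$rec_fh>;
    while ( defined $id && defined $rec ) {
        my ($rec_id) = split /\t/, $rec, 2;
        chomp $rec_id;    # in case the record has only one column
        my $c = $rec_id cmp $id;
        if ( $c < 0 ) {       # record id is smaller: no match, read next record
            $rec = <$rec_fh>;
        }
        elsif ( $c > 0 ) {    # wanted id is smaller: advance the id list
            $id = next_id($id_fh);
        }
        else {                # match: keep the record, keep $id for duplicates
            print {$out_fh} $rec;
            $kept++;
            $rec = <$rec_fh>;
        }
    }
    return $kept;
}

# Usage (hypothetical script name): perl merge.pl file1.sorted file2.sorted > output1.txt
if ( !caller() && @ARGV == 2 ) {
    my ( $file1, $file2 ) = @ARGV;
    open( my $REC_FH, '<', $file1 ) or die "Couldn't open file \"$file1\": $!\n";
    open( my $ID_FH,  '<', $file2 ) or die "Couldn't open file \"$file2\": $!\n";
    my $n = merge_filter( $REC_FH, $ID_FH, \*STDOUT );
    print STDERR "kept $n records\n";
}
```

Because the current id is kept after a match, several records in file 1 that share an id are all written. Sorting a 6 GB file costs time and temporary disk space, so when the ids do fit in memory a hash lookup is probably the simpler choice.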
Re^4: Query large tab delimited file by a list by Elninh05 (Novice) on Jul 03, 2016 at 17:27 UTC
The first file is really big. The second file is about 200 MB with a single column (id). Yes, the script should extract records from file 1 that match an id in file 2. The second output should be some kind of statistics, but that is not important yet. I have about 32 GB of RAM, and I would like to avoid databases because I have no experience with them. I'm new to Perl; I can read the file formats into Perl, but I have no idea yet how to extract the ids. I would be glad if you could help.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Variables
my $file1 = '/home//Desktop/file1.txt';
my $file2 = '/home/Desktop/file2.txt';

# Filehandles: one per file (declaring "my $FH" twice would mask the first handle)
open( my $FH1, '<', $file1 ) or die "Couldn't open file \"$file1\": $!\n";
open( my $FH2, '<', $file2 ) or die "Couldn't open file \"$file2\": $!\n";

# Program for reading dbsnp
# split() has to work on each line read from the handle, not on the file name
while ( my $line = <$FH1> ) {
    chomp $line;
    my @file1_rows = split( "\t", $line );    # fields of the current row
    ...
}
```
Re^5: Query large tab delimited file by a list by Marshall (Canon) on Jul 03, 2016 at 17:55 UTC
Re^4: Query large tab delimited file by a list by Elninh05 (Novice) on Jul 03, 2016 at 17:28 UTC
The first file is really big. The second file is about 200 MB with a single column (id). Yes, the script should extract records from file 1 that match an id in file 2. The second output should be some kind of statistics, but that is not important yet. I have about 32 GB of RAM, and I would like to avoid databases because I have no experience with them. I'm new to Perl; I can read the file formats into Perl, but I have no idea yet how to extract the ids. I would be glad if you could help. Maybe using hash keys is the right approach?

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Variables
my $file1 = '/home/Desktop/file1.txt';
my $file2 = '/home/Desktop/file2.txt';

# Filehandles: one per file (declaring "my $FH" twice would mask the first handle)
open( my $FH1, '<', $file1 ) or die "Couldn't open file \"$file1\": $!\n";
open( my $FH2, '<', $file2 ) or die "Couldn't open file \"$file2\": $!\n";

# Program for comparison
# split() has to work on each line read from the handle, not on the file name
while ( my $line = <$FH1> ) {
    chomp $line;
    my @file1_rows = split( "\t", $line );    # fields of the current row
    ...
}
```
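The hash-keys idea can be sketched as follows. It is a minimal sketch, not code from the thread, and it rests on a few assumptions. File 2 is taken to hold one id per line, and the id is assumed to sit in column `$col` (zero-based, `0` in the usage below) of the tab-separated file 1. The sub names `load_ids` and `filter_records` are hypothetical. The id list goes into a hash used as a set, and the 6 GB file is streamed line by line, so it is never held in memory as a whole. Perl hashes carry per-key overhead, so the hash will likely need several times the 200 MB on disk. That should still fit in 32 GB of RAM.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read one id per line from $fh into a hash used as a set.
sub load_ids {
    my ($fh) = @_;
    my %ids;
    while ( my $id = <$fh> ) {
        chomp $id;
        next if $id eq '';
        $ids{$id} = 1;    # the value is irrelevant; only key existence matters
    }
    return \%ids;
}

# Stream records from $in_fh and print those whose id column is in %$ids.
# $col is the zero-based index of the id column (ASSUMED; adjust to the real layout).
sub filter_records {
    my ( $in_fh, $out_fh, $ids, $col ) = @_;
    my $kept = 0;
    while ( my $line = <$in_fh> ) {
        chomp $line;
        my @fields = split /\t/, $line, -1;
        next unless defined $fields[$col];
        if ( exists $ids->{ $fields[$col] } ) {
            print {$out_fh} $line, "\n";
            $kept++;
        }
    }
    return $kept;
}

# Usage (hypothetical script name): perl filter.pl file1.txt file2.txt > output1.txt
if ( !caller() && @ARGV == 2 ) {
    my ( $file1, $file2 ) = @ARGV;
    open( my $ID_FH, '<', $file2 ) or die "Couldn't open file \"$file2\": $!\n";
    my $ids = load_ids($ID_FH);
    close $ID_FH;
    open( my $REC_FH, '<', $file1 ) or die "Couldn't open file \"$file1\": $!\n";
    my $n = filter_records( $REC_FH, \*STDOUT, $ids, 0 );
    close $REC_FH;
    print STDERR "kept $n records\n";
}
```

The lookup uses `exists`, so each record costs one hash probe no matter how many ids were loaded.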