Generally, nested loops are a code smell. Nesting loops four deep goes beyond stinking to somewhere around putrid! For modest-sized files (say, up to a few hundred megabytes for the smaller of the two; the size of the second file doesn't matter because it is only read one line at a time), the preferred solution is to read the smaller file into a hash and use that as a lookup table. Each line of the second file then costs a single hash lookup instead of a scan over the whole first file. Consider:

    use strict;
    use warnings;

    my $inData1 = <<DATA1;
    G_00160 F_02571
    G_00161 F_01082
    G_00162 F_00034
    G_00163 F_00035
    G_00164 F_00036
    DATA1

    my $inData2 = <<DATA2;
    F_00013 G_06670
    F_00034 G_00162
    F_00035 G_00163
    F_00036 G_00164
    F_00038 G_00165
    DATA2

    # Load the smaller file into a hash: ur => ci
    open my $ur_ci, "<", \$inData1 or die "Can't open inData1: $!";
    my %urCi = map {chomp; split} <$ur_ci>;
    close $ur_ci;

    # Stream the second file and keep each (ci, ur) pair that also
    # appears in the first file
    my %matches;
    open my $ci_ur, "<", \$inData2 or die "Can't open inData2: $!";
    while (<$ci_ur>) {
        chomp;
        my ($ci, $ur) = split;
        $matches{$ci} = $ur if exists $urCi{$ur} && $urCi{$ur} eq $ci;
    }
    close $ci_ur;

    print "$_ => $matches{$_}\n" for sort keys %matches;

Prints:

    F_00034 => G_00162
    F_00035 => G_00163
    F_00036 => G_00164

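The code above reads both inputs from in-memory strings to keep the example self-contained. A minimal sketch of the same hash lookup working on real files might look like the following. The subroutine name `find_matches` and the two-argument command line are assumptions for illustration, not part of the original. Only the first (smaller) file is held in memory; the second is streamed line by line.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from the smaller file, then
# stream the second file so only the first one has to fit in memory.
sub find_matches {
    my ($small_file, $big_file) = @_;

    # Smaller file: each line is "ur ci"
    open my $small_fh, '<', $small_file
        or die "Can't open '$small_file': $!\n";
    my %urCi;
    while (my $line = <$small_fh>) {
        my ($ur, $ci) = split ' ', $line;
        next unless defined $ci;    # skip blank or malformed lines
        $urCi{$ur} = $ci;
    }
    close $small_fh;

    # Second file: each line is "ci ur"; read one line at a time
    open my $big_fh, '<', $big_file
        or die "Can't open '$big_file': $!\n";
    my %matches;
    while (my $line = <$big_fh>) {
        my ($ci, $ur) = split ' ', $line;
        next unless defined $ur;
        $matches{$ci} = $ur
            if exists $urCi{$ur} && $urCi{$ur} eq $ci;
    }
    close $big_fh;

    return \%matches;
}

# Hypothetical usage: perl match.pl ur_ci.txt ci_ur.txt
if (@ARGV == 2) {
    my $matches = find_matches(@ARGV);
    print "$_ => $matches->{$_}\n" for sort keys %$matches;
}
```

`split ' '` splits on runs of whitespace and ignores leading whitespace; the trailing newline is dropped along with the empty trailing field, so no separate `chomp` is needed.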
If both input files are too large to fit comfortably in memory (as a rough rule of thumb, the smaller one is more than about a quarter of your memory size), you should seriously consider using a database instead. For a one-off task SQLite may be a good choice: through DBI and DBD::SQLite it needs no separate database server, just a single file. Consider:

    use strict;
    use warnings;
    use DBI;

    my $inData1 = <<DATA1;
    G_00160 F_02571
    G_00161 F_01082
    G_00162 F_00034
    G_00163 F_00035
    G_00164 F_00036
    DATA1

    my $inData2 = <<DATA2;
    F_00013 G_06670
    F_00034 G_00162
    F_00035 G_00163
    F_00036 G_00164
    F_00038 G_00165
    DATA2

    # Start from a fresh database file on each run
    unlink 'db.SQLite';
    my $dbh = DBI->connect ("dbi:SQLite:dbname=db.SQLite", "", "", {RaiseError => 1});

    $dbh->do ('CREATE TABLE urci (ci TEXT, ur TEXT)');
    $dbh->do ('CREATE TABLE ciur (ur TEXT, ci TEXT)');

    # First file: each line is "ur ci"
    my $sth = $dbh->prepare ('INSERT INTO urci (ur, ci) VALUES (?, ?)');
    open my $ur_ci, "<", \$inData1 or die "Can't open inData1: $!";
    $sth->execute (do {chomp; split}) while <$ur_ci>;
    close $ur_ci;

    # Second file: each line is "ci ur"
    $sth = $dbh->prepare ('INSERT INTO ciur (ci, ur) VALUES (?, ?)');
    open my $ci_ur, "<", \$inData2 or die "Can't open inData2: $!";
    $sth->execute (do {chomp; split}) while <$ci_ur>;
    close $ci_ur;

    # Keep only the pairs present in both tables
    $sth = $dbh->prepare (
        'SELECT ciur.ci AS ci, ciur.ur AS ur FROM ciur
         INNER JOIN urci ON ciur.ci = urci.ci AND ciur.ur = urci.ur
         ORDER BY ciur.ci'
    );
    $sth->execute ();
    print "$_->{ci} => $_->{ur}\n" while $_ = $sth->fetchrow_hashref ();

Prints:

    F_00034 => G_00162
    F_00035 => G_00163
    F_00036 => G_00164

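When the files really are large, two details tend to matter for the SQLite version: load each file inside a single transaction (with AutoCommit on, every INSERT is committed separately, which is slow), and give the join an index to work with. The sketch below assumes real input files rather than in-memory strings; the subroutine names `load_file` and `find_matches_sqlite`, the index name and the three-argument command line are hypothetical, and it requires DBI and DBD::SQLite to be installed.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: load a whitespace-separated two-column file into
# $table. $cols gives the column order used in the file, e.g. ['ur', 'ci'].
# Table and column names are interpolated, so pass only trusted constants.
sub load_file {
    my ($dbh, $table, $cols, $file) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!\n";
    my $sth = $dbh->prepare(
        "INSERT INTO $table ($cols->[0], $cols->[1]) VALUES (?, ?)");
    # One transaction for the whole file instead of one per INSERT
    $dbh->begin_work;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        next unless @fields == 2;    # skip blank or malformed lines
        $sth->execute(@fields);
    }
    $dbh->commit;                    # AutoCommit is switched back on here
    close $fh;
}

# Hypothetical helper: returns [ [ci, ur], ... ] for pairs found in both files
sub find_matches_sqlite {
    my ($db_file, $ur_ci_file, $ci_ur_file) = @_;
    unlink $db_file;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE urci (ci TEXT, ur TEXT)');
    $dbh->do('CREATE TABLE ciur (ci TEXT, ur TEXT)');
    load_file($dbh, 'urci', ['ur', 'ci'], $ur_ci_file);
    load_file($dbh, 'ciur', ['ci', 'ur'], $ci_ur_file);
    # Build the index after loading rather than maintaining it per insert;
    # it gives the query planner an index on both join columns to use.
    $dbh->do('CREATE INDEX urci_ci_ur ON urci (ci, ur)');
    my $rows = $dbh->selectall_arrayref(
        'SELECT ciur.ci, ciur.ur FROM ciur
         INNER JOIN urci ON ciur.ci = urci.ci AND ciur.ur = urci.ur
         ORDER BY ciur.ci');
    $dbh->disconnect;
    return $rows;
}

# Hypothetical usage: perl match_db.pl db.SQLite ur_ci.txt ci_ur.txt
if (@ARGV == 3) {
    print "$_->[0] => $_->[1]\n" for @{ find_matches_sqlite(@ARGV) };
}
```

Deferring the index until after both loads is a common bulk-loading choice: building it once over the finished table is usually cheaper than updating it on every insert.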
In reply to Re: Hash Comparisions by GrandFather, in thread Hash Comparisions by perl_n00b