Generally, nested loops are a code smell. Nesting loops four deep goes beyond stinking to somewhere around putrid! For files of modest size (say, up to a few hundred megabytes for the smaller of the two; the size of the second file doesn't matter), reading the smaller file into a hash and using that as a lookup is the preferred solution. Consider:

use strict;
use warnings;

my $inData1 = <<DATA1;
G_00160 F_02571
G_00161 F_01082
G_00162 F_00034
G_00163 F_00035
G_00164 F_00036
DATA1

my $inData2 = <<DATA2;
F_00013 G_06670
F_00034 G_00162
F_00035 G_00163
F_00036 G_00164
F_00038 G_00165
DATA2

# Load the first (smaller) data set into a lookup hash: ur => ci
open my $ur_ci, "<", \$inData1;
my %urCi = map {chomp; split} <$ur_ci>;
close $ur_ci;

# Stream the second data set and keep only the reciprocal pairs
my %matches;
open my $ci_ur, "<", \$inData2;
while (<$ci_ur>) {
    chomp;
    my ($ci, $ur) = split;
    $matches{$ci} = $ur if exists $urCi{$ur} && $urCi{$ur} eq $ci;
}
close $ci_ur;

print "$_ => $matches{$_}\n" for sort keys %matches;

Prints:

F_00034 => G_00162
F_00035 => G_00163
F_00036 => G_00164
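For real files, the same hash lookup can be wrapped in a sub. What follows is only a minimal sketch: the name reciprocal_matches and its two-argument interface are made up for illustration, and it assumes the smaller file holds "ur ci" pairs and the larger one "ci ur" pairs, one pair per line, as in the sample data above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): returns a hash of
# ci => ur for every pair that appears, reversed, in both files.
# $small_file holds "ur ci" per line; $large_file holds "ci ur" per line.
sub reciprocal_matches {
    my ($small_file, $large_file) = @_;

    # Load the smaller file into a lookup hash: ur => ci.
    my %urCi;
    open my $small, '<', $small_file or die "Can't open $small_file: $!";
    while (my $line = <$small>) {
        my ($ur, $ci) = split ' ', $line;
        next unless defined $ci;    # skip blank or short lines
        $urCi{$ur} = $ci;
    }
    close $small;

    # Stream the larger file one line at a time, so memory use depends
    # on the size of the lookup hash rather than on this file.
    my %matches;
    open my $large, '<', $large_file or die "Can't open $large_file: $!";
    while (my $line = <$large>) {
        my ($ci, $ur) = split ' ', $line;
        next unless defined $ur;    # skip blank or short lines
        $matches{$ci} = $ur
            if exists $urCi{$ur} && $urCi{$ur} eq $ci;
    }
    close $large;

    return %matches;
}
```

Called as my %matches = reciprocal_matches($small_path, $large_path); it does one pass over each file instead of nested loops. Because Perl's open accepts a reference to a scalar in place of a file name, the sub can also be fed the in-memory strings used above, for example reciprocal_matches(\$inData1, \$inData2).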

If both of your input files are rather larger than would easily fit in memory (say, the smaller one is more than about a quarter of your memory size), then you should really consider using a database. If this is a one-off task, SQLite may be a good choice. Consider:

use strict;
use warnings;
use DBI;

my $inData1 = <<DATA1;
G_00160 F_02571
G_00161 F_01082
G_00162 F_00034
G_00163 F_00035
G_00164 F_00036
DATA1

my $inData2 = <<DATA2;
F_00013 G_06670
F_00034 G_00162
F_00035 G_00163
F_00036 G_00164
F_00038 G_00165
DATA2

# Start from a fresh database file each run
unlink 'db.SQLite';
my $dbh = DBI->connect ("dbi:SQLite:dbname=db.SQLite", "", "", {RaiseError => 1});
$dbh->do ('CREATE TABLE urci (ci TEXT, ur TEXT)');
$dbh->do ('CREATE TABLE ciur (ur TEXT, ci TEXT)');

# First data set: each line is "ur ci"
my $sth = $dbh->prepare ('INSERT INTO urci (ur, ci) VALUES (?, ?)');
open my $ur_ci, "<", \$inData1;
$sth->execute (do {chomp; split}) while <$ur_ci>;
close $ur_ci;

# Second data set: each line is "ci ur"
$sth = $dbh->prepare ('INSERT INTO ciur (ci, ur) VALUES (?, ?)');
open my $ci_ur, "<", \$inData2;
$sth->execute (do {chomp; split}) while <$ci_ur>;
close $ci_ur;

# Rows that appear in both tables are the reciprocal matches
$sth = $dbh->prepare (
    'SELECT ciur.ci, ciur.ur FROM ciur
     INNER JOIN urci ON ciur.ci = urci.ci AND ciur.ur = urci.ur
     ORDER BY ciur.ci'
);
$sth->execute ();
print "$_->{ci} => $_->{ur}\n" while $_ = $sth->fetchrow_hashref ();

Prints:

F_00034 => G_00162
F_00035 => G_00163
F_00036 => G_00164

True laziness is hard work

In reply to Re: Hash Comparisions by GrandFather
in thread Hash Comparisions by perl_n00b
