Greetings all.

So, I'm relatively new to Perl, and I've managed to screw up a task that should be quite trivial. I have two files: one of ~16k records and another of ~240k records. I want to extract a string from each line of the first file (~16k records), then check each record of the second file (~240k records) for a matching string. My attempt is below. It seems like it should work, but a couple of notes:

1. It doesn't work. My logic seems correct to me, but no match is found when I loop the array of IDs extracted from the 16k-record file against every line of the 240k-record file. There should be ~16k matches. If I hard-code one of the values from the array, a match is indeed found.

2. Even if it were working, it would no doubt be horribly inefficient. I suspect I should use hashes somehow, but I don't understand those structures well enough to apply them here.

So if anyone can shed some light on where I have dropped the ball, I would really appreciate it. Tips on making this script more efficient with hashes, and any style critique, are welcome too. Thanks!

#!/usr/bin/perl

use strict;
use warnings;

open (PIDS, $ARGV[0]);
open (FIDS, $ARGV[1]);
open (OUTPUT, ">$ARGV[2]");

my @pids = <PIDS>;
my @fids = <FIDS>;
my @pidCans;
my @fidCans;
my $result;
my $pidCan;
my $fid;
my $pid;

foreach $pid (@pids) {
    $result = '';
    $pid =~ /\|(.*)$/;
    $result = $1;
    push (@pidCans, $result);
}

foreach $pidCan (@pidCans) {
    chomp $pidCan;

    foreach $fid (@fids) {
        chomp $fid;
        print "Comparing $fid to $pidCan" . "\n";
        if ($fid =~ /$pidCan/) {
            print "FOUND A MATCH.\n";
            print OUTPUT $fid . "\n";
        }
    }
}

close PIDS;
close FIDS;
close OUTPUT;

In reply to My code sucks, please help me understand why. by mirage4d
