in reply to Reg: Performance

Nowadays 5 million entries is nothing!

Just load DUMP_B into a hash and then go over DUMP_A using the hash for fast lookup.

The unique number can have more than one account number

To cope with that you will have to use a hash of arrays:

# untested!
my %b;
open(B, "<" . DUMP_B) || die("Could not open file \n");
while (<B>) {
    chomp;
    my ($id, $ac) = split /\|/, $_;
    push @{$b{$id}}, $ac;
}
open(A, "<" . DUMP_A) || die("Could not open file \n");
while (<A>) {
    chomp;
    my $ac = $b{$_} || [];
    print "$_: ", join(', ', @$ac), "\n";
}

Re^2: Reg: Performance
by sivaraman (Initiate) on Oct 29, 2010 at 04:31 UTC

    Salva, this works fine, but it prints both the matching and the non-matching entries to the console. Please help: I want to print only the matches to the file.

    Thank you.
Re^2: Reg: Performance
by sivaraman (Initiate) on Oct 28, 2010 at 10:12 UTC

    Thank you Salva. The problem is that it takes more time to write the output to the file. Please tell me how, in your scenario, to write only the matched data to the file instead of to the console. Thank you.

      From the shell:
      perl script.pl > file.txt
      Or in code:
      open OUT, '>', 'file.txt'; ... print OUT ...
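The in-code variant above can be sketched a little more fully. This is only an illustration: the helper name write_matches and the sample pairs are hypothetical, and it uses a lexical filehandle and an error check where the one-liner above uses a bareword handle.

```perl
use strict;
use warnings;

# Hypothetical helper: write one "id|account" line per matched pair.
# $matches is an array ref of [id, account] pairs.
sub write_matches {
    my ($path, $matches) = @_;
    open(my $out, '>', $path) or die "Could not write to $path: $!\n";
    print {$out} "$_->[0]|$_->[1]\n" for @$matches;
    close($out) or die "Could not close $path: $!\n";
    return scalar @$matches;    # number of lines written
}
```

Calling write_matches('file.txt', [ [ 1001, 'AC1' ], [ 1001, 'AC2' ] ]) would write two lines to file.txt. Only matched pairs are passed in, so nothing else reaches the file.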
Re^2: Reg: Performance
by sivaraman (Initiate) on Oct 29, 2010 at 06:34 UTC

    Thank you Salva. Now I have code like this:

    my %b;
    while (<DUMP_B>) {
        chomp;
        my ($id, $ac) = split /\|/, $_;
        push @{$b{$id}}, $ac;
    }
    my $readID;
    while (defined($readID = <DUMP_A>)) {
        #chomp;
        $readID =~ s!\s+!!g;
        my $ac = $b{$readID} || [];
        #print "$readID: ", join(', ', @$ac), "\n";
        my $arrSize = scalar @$ac;
        if ($arrSize > 0) {
            for (my $i = 0; $i < $arrSize; $i++) {
                print OUTPUT "$readID|$ac->[$i] \n";
            }
        }
    }
    This gives the expected output. Could you please explain 'my $ac = $b{$readID} || [];'? Thank you once again, and thanks to all the Monks.

      $b{$readID} || []
      When there are no entries for $readID in DUMP_B, $b{$readID} is undef. Appending || [] handles this special case by replacing undef with a reference to an empty array, so that we can safely dereference it later as @$ac.
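A minimal sketch of that idiom in isolation, assuming made-up sample data and a hypothetical helper named accounts_for (neither is part of the code in this thread):

```perl
use strict;
use warnings;

my %b = ( 1001 => [ 'AC1', 'AC2' ] );    # made-up sample data

# Hypothetical helper: always returns an array ref, even for unknown ids.
sub accounts_for {
    my ($lookup, $id) = @_;
    return $lookup->{$id} || [];    # undef is false, so [] is used instead
}

for my $id (1001, 9999) {
    my $ac = accounts_for(\%b, $id);
    # @$ac is always safe to dereference here; it is simply empty for 9999.
    print "$id: ", scalar(@$ac), " account(s)\n";
}
```

Since an array reference is always a true value, the || only falls through to [] when the hash has nothing stored for that id.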

      A simpler version of the code follows:

      my %b;
      open(B, "<" . DUMP_B) || die("Could not open file \n");
      while (<B>) {
          chomp;
          my ($id, $ac) = split /\|/, $_;
          push @{$b{$id}}, $ac;
      }
      open( OUTPUT, '>', 'INACTIVE_LIST' ) || die "Could not write to file\n";
      open(A, "<" . DUMP_A) || die("Could not open file \n");
      while (<A>) {
          chomp;
          s/\s+//g;
          if ($b{$_}) {
              for my $ac (@{$b{$_}}) {
                  print OUTPUT "$_|$ac\n"
              }
          }
      }
      BTW, when programming in Perl it is very unusual to use C-like (or Java-like) for loops. Instead, for (@array) { ... } and for (0..$top) { ... } are used.
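As a rough illustration of those two loop forms, using made-up sample data (the names @ac, @lines and @numbered are only for this sketch):

```perl
use strict;
use warnings;

my @ac = ('AC1', 'AC2', 'AC3');    # made-up sample data

# Iterate over the elements directly when the index is not needed.
my @lines;
for my $account (@ac) {
    push @lines, "1001|$account";
}

# Iterate over a range of indices when the position matters.
my @numbered;
for my $i (0 .. $#ac) {
    push @numbered, "${i}:$ac[$i]";
}

print "$_\n" for @lines, @numbered;
```

Both forms avoid the manual counter and the bounds test of a C-style loop, which removes a common source of off-by-one and misspelled-variable mistakes.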

        Thank you Salva for your help and also for this clarification.

        - Thank you, all Monks, for spending your valuable time on this.