First of all, I think ww is exactly right in this post with the suggestion to use a database. With proper indexing, matching rows between tables will run very quickly indeed.

I get the idea that the approach is to try to move through all 3 data files, keeping careful track of your progress so that you don't lose your place. You might find it useful to write a subroutine that handles the searching through each file, keeping a pointer to the last place something was found.

The following is crude and incomplete, but I hope it gives some idea of what I'm suggesting. The hash %output is keyed by input file name; each value is a reference to an array holding the lines that need to be written out to the corresponding output file. The hash %filepointers keeps track of the last spot in each input file where an account was found.

use strict;
use warnings;

# Assumes control.csv and the data files are sorted by ascending numeric account.
open my $controlFileHandle, '<', 'control.csv' or die "Cannot open control.csv: $!";
my ($file1, $file2, $file3) = qw/file1.csv file2.csv file3.csv/;
my (%output, %filepointers);

while (my $accountline = <$controlFileHandle>) {
    my $account = (split /,/, $accountline, 2)[0];
    lookForAccountInFile($account, $file1);
    lookForAccountInFile($account, $file2);
    lookForAccountInFile($account, $file3);
}
close $controlFileHandle;

# Write the collected lines for each input file to new_<input file name>.
foreach my $infile (keys %output) {
    open my $ofh, '>', "new_$infile" or die "Cannot open new_$infile: $!";
    foreach my $line (@{ $output{$infile} }) {
        print $ofh $line;
    }
    close $ofh;
}

sub lookForAccountInFile {
    my ($account, $file) = @_;
    open my $ifh, '<', $file or die "Cannot open $file: $!";
    # Resume where the previous search in this file stopped.
    seek $ifh, $filepointers{$file}, 0 if defined $filepointers{$file};
    my $found = 0;
    while (my $line = <$ifh>) {
        last if $line eq "\n";              # stop at a blank line
        my $la = (split /,/, $line, 2)[0];
        last if $la > $account;             # gone past the account; pointer stays put
        if ($la == $account) {
            push @{ $output{$file} }, $line;
            $found = 1;
        }
        $filepointers{$file} = tell($ifh);  # position just after the line we read
        last if $found;
    }
    close $ifh;
}

Updated to compile and run. I will show my test files below.

To answer specifically how this approach helps with optimizing: 1) using a subroutine means you need only fine-tune the code once, and then gain the benefit as many times as you use it; 2) it considerably reduces the number of arrays and other variables.

I did not use the CSV module, since it appeared you were only using the very first column in each record. If you need to explore the records in more depth, you will absolutely want to incorporate that module. It handles all sorts of special cases that would trip up this approach.

One of your requirements that I did not implement was the account limit per file. But if I did everything, you'd have no opportunity to learn! ;)
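As a rough illustration of the kind of special case meant here, consider a quoted field that contains a comma. This is only a sketch: the sample record is made up, and it uses the core module Text::ParseWords rather than a CSV module, purely so that it runs without installing anything. A real CSV module covers many more cases than this does.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record whose first field is quoted and contains a comma.
my $line = qq{"1001,A",file,1,record\n};

# Naive approach, as in the script above: everything before the first comma.
my $naive = (split /,/, $line, 2)[0];

# Quote-aware split: parse_line(delimiter, keep_quotes, text).
chomp(my $copy = $line);
my $aware = (parse_line(',', 0, $copy))[0];

print "naive: $naive\n";    # the field is cut off at the embedded comma
print "aware: $aware\n";    # the whole quoted field, with the quotes removed
```

If the first column is always a bare account number, the simple split is fine, which is why I left it that way.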

Data:

control.csv:
1,control,record,1
2,control,khrecord,2
5,control,recordi,5
7,control,record,7

file1.csv:
1,file,1,record,1
2,file,1,record,2
3,file,1,record,3

file2.csv:
2,record,1,file,2
4,record,2,file,2
5,record,3,file,2
6,record,4,file,2
7,record,5,file,2

file3.csv:
4,file,3,record,4
5,file,3,record,5


In reply to Re: Optimization of script by GotToBTru
in thread Optimization of script by JulioRD
