Provided your datafile1 isn't too large, you might be able to trade memory for speed and build your hash deeper. That should reduce the number of keys at each level and speed up processing by reducing the number of iterations at the inner levels.
The structure the code below builds looks like this.
    $VAR1 = {
      '200534011' => { '200577234' => { 'CDC Inc.' => { '2' => 'some text description' } } },
      '200110014' => { '200110325' => { 'CDC Inc.' => { '1' => 'some text description' } } },
      '199989987' => { '199999991' => { 'CDC Inc.' => { '4' => 'some text description' } } },
      '200323021' => { '200331234' => { 'ABC Corp.' => { '3' => 'some text description' } } },
      '200212344' => { '200232399' => { 'CDC Inc.' => { '3' => 'some text description' } } },
      '200210014' => { '200210105' => {
          'XYZ Ltd.'  => { '1' => 'some text description' },
          'ABC Corp.' => { '1' => 'some text description' }
      } },
      '200211011' => { '200212053' => {
          'XYZ Ltd.'  => { '2' => 'some text description' },
          'ABC Corp.' => { '2' => 'some text description' }
      } }
    };
You could probably simplify %corps somewhat at the inner levels, depending on what information you need available once you have matched a record.
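As one possible simplification, assuming all you need after a match is a printable line, the two innermost levels could be collapsed into an array of preformatted strings. This is only a sketch: the sample records are copied from the dump above, and `matches_for` is a hypothetical helper name, not something from the original code.

```perl
use strict;
use warnings;

# Simplified structure (an assumption, not the layout used in the post):
#   $corps{$start}{$end} = [ "name prefix -text", ... ]
my %corps;

# Sample records (name, prefix, start, end, text) taken from the dump above.
my @records = (
    [ 'CDC Inc.',  1, 200110014, 200110325, 'some text description' ],
    [ 'XYZ Ltd.',  2, 200211011, 200212053, 'some text description' ],
    [ 'ABC Corp.', 2, 200211011, 200212053, 'some text description' ],
);

for my $rec (@records) {
    my ($name, $prefix, $start, $end, $text) = @$rec;
    push @{ $corps{$start}{$end} }, "$name $prefix -$text";
}

# Hypothetical helper: return every stored line whose range contains $id.
sub matches_for {
    my ($id) = @_;
    my @found;
    for my $start ( grep { $_ <= $id } keys %corps ) {
        for my $end ( grep { $_ >= $id } keys %{ $corps{$start} } ) {
            push @found, @{ $corps{$start}{$end} };
        }
    }
    return @found;
}

print "$_\n" for matches_for(200211015);
```

This drops two loop levels from the lookup, at the cost of no longer being able to pick out the name or prefix individually once a record has matched.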
As pointed out above, you don't say what you want in your output file, so I've just dumped everything that matches each input record.
The output
    C:\test>226719

    200110100,some text here,etc matches with:    CDC Inc. 1 -some text description
    200918943,some text here,etc matches with:
    200211015,some text here,etc matches with:    XYZ Ltd. 2 -some text description    ABC Corp. 2 -some text description
    199212395,some text here,etc matches with:
    200110100,some text here,etc matches with:    CDC Inc. 1 -some text description
    200210100,some text here,etc matches with:    XYZ Ltd. 1 -some text description    ABC Corp. 1 -some text description
    C:\test>
The code
    use strict;
    use Inline::Files;
    use Data::Dumper;

    my %corps;

    #! Each block in DATAFILE1: a name line, one line that is discarded,
    #! then records up to the next blank line.
    while (<DATAFILE1>) {
        chomp;
        my $name = $_;
        <DATAFILE1>;    #! Discard the line following the name
        while (<DATAFILE1>) {
            chomp;
            last if /^\s*$/;
            my ($prefix, $start, $end, $text) = /^(\d+)\s+(\d+)\s+(\d+)\s+(.*$)/;
            $corps{$start}{$end}{$name}{$prefix} = $text;
        }
    }

    #!print Dumper \%corps;

    <DATAFILE2>;    #! Skip header
    while (<DATAFILE2>) {
        chomp;
        print "\n$_ matches with:";
        my ($id, @rest) = /(\d+),(.*$)/;
        #! Keep only ranges with start <= id <= end
        for my $start ( grep{ $_ <= $id } keys %corps ) {
            for my $end ( grep{ $_ >= $id } keys %{$corps{$start}} ) {
                for my $name (keys %{$corps{$start}{$end}} ) {
                    for my $prefix (keys %{$corps{$start}{$end}{$name}} ) {
                        print "\t$name $prefix -", $corps{$start}{$end}{$name}{$prefix};
                    }
                }
            }
        }
    }
    __END__
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
In reply to Re: Code efficiency / algorithm
by BrowserUk
in thread Code efficiency / algorithm
by dave8775