Provided your datafile1 isn't too large, you might be able to trade memory for speed by building your hash deeper. That should reduce the number of keys at each level and speed up processing by cutting the number of iterations at the inner levels.

The structure the code below builds looks like this.

    $VAR1 = {
        '200534011' => {
            '200577234' => {
                'CDC Inc.' => { '2' => 'some text description' }
            }
        },
        '200110014' => {
            '200110325' => {
                'CDC Inc.' => { '1' => 'some text description' }
            }
        },
        '199989987' => {
            '199999991' => {
                'CDC Inc.' => { '4' => 'some text description' }
            }
        },
        '200323021' => {
            '200331234' => {
                'ABC Corp.' => { '3' => 'some text description' }
            }
        },
        '200212344' => {
            '200232399' => {
                'CDC Inc.' => { '3' => 'some text description' }
            }
        },
        '200210014' => {
            '200210105' => {
                'XYZ Ltd.' => { '1' => 'some text description' },
                'ABC Corp.' => { '1' => 'some text description' }
            }
        },
        '200211011' => {
            '200212053' => {
                'XYZ Ltd.' => { '2' => 'some text description' },
                'ABC Corp.' => { '2' => 'some text description' }
            }
        }
    };

You could probably simplify %corps somewhat at the inner levels, depending on what information you need available once you have matched a record.
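As one possible illustration of that simplification (a sketch only, not part of the code I ran): if all you need after a match is the name, prefix and text, the two innermost hash levels could collapse into a list of records stored under each start/end pair. The names add_range and matches_for are made up for this example, and the sample rows are copied from the dump above.

```perl
use strict;
use warnings;

# Hypothetical simplified layout: $corps{$start}{$end} holds a list of
# [ name, prefix, text ] records instead of two further hash levels.
my %corps;

# Store one record under its start/end range (helper name is an assumption).
sub add_range {
    my ($start, $end, $name, $prefix, $text) = @_;
    push @{ $corps{$start}{$end} }, [ $name, $prefix, $text ];
}

# Return every record whose start..end range contains $id.
sub matches_for {
    my ($id) = @_;
    my @found;
    for my $start ( grep { $_ <= $id } keys %corps ) {
        for my $end ( grep { $_ >= $id } keys %{ $corps{$start} } ) {
            push @found, @{ $corps{$start}{$end} };
        }
    }
    return @found;
}

# Sample rows taken from the dump above.
add_range( 200110014, 200110325, 'CDC Inc.',  1, 'some text description' );
add_range( 200210014, 200210105, 'XYZ Ltd.',  1, 'some text description' );
add_range( 200210014, 200210105, 'ABC Corp.', 1, 'some text description' );

for my $rec ( matches_for(200210100) ) {
    my ($name, $prefix, $text) = @$rec;
    print "$name $prefix -$text\n";
}
```

The trade-off is that you can no longer look up a record directly by name or prefix; whether that matters depends on what you do with a match.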

As pointed out above, you don't say what you want in your output file, so I've just dumped everything that matches each input record.

The output

    C:\test>226719
    200110100,some text here,etc matches with: CDC Inc. 1 -some text description
    200918943,some text here,etc matches with:
    200211015,some text here,etc matches with: XYZ Ltd. 2 -some text description ABC Corp. 2 -some text description
    199212395,some text here,etc matches with:
    200110100,some text here,etc matches with: CDC Inc. 1 -some text description
    200210100,some text here,etc matches with: XYZ Ltd. 1 -some text description ABC Corp. 1 -some text description
    C:\test>

The code

    use strict;
    use Inline::Files;
    use Data::Dumper;

    my %corps;

    # Build the lookup: start => end => name => prefix => text
    while (<DATAFILE1>) {
        chomp;
        my $name = $_;
        <DATAFILE1>;    # skip the line following the name
        while (<DATAFILE1>) {
            chomp;
            last if /^\s*$/;
            my ($prefix, $start, $end, $text) = /^(\d+)\s+(\d+)\s+(\d+)\s+(.*$)/;
            $corps{$start}{$end}{$name}{$prefix} = $text;
        }
    }

    #!print Dumper \%corps;

    <DATAFILE2>; #! Skip header
    while (<DATAFILE2>) {
        chomp;
        print "\n$_ matches with:";
        my ($id, @rest) = /(\d+),(.*$)/;
        # Only descend into ranges where start <= id <= end
        for my $start ( grep{ $_ <= $id } keys %corps ) {
            for my $end ( grep{ $_ >= $id } keys %{$corps{$start}} ) {
                for my $name (keys %{$corps{$start}{$end}} ) {
                    for my $prefix (keys %{$corps{$start}{$end}{$name}} ) {
                        print "\t$name $prefix -", $corps{$start}{$end}{$name}{$prefix};
                    }
                }
            }
        }
    }
    __END__

Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.


In reply to Re: Code efficiency / algorithm by BrowserUk
in thread Code efficiency / algorithm by dave8775
