Sorry for my error (24 bits not bytes). Here is some additional information. The files I'm working with aren't that large (about 2MB). Down the road they will be up to 500MB. I know I could put the data into a table and just use a SQL query (with appropriate indexes). I'd like to explore other techniques which don't use a database. Basically, I'm looking for a programmatic alternative to:
open INPUT, "$ENV{HOME}/flat_file" or die $!;
my %ports;
while (my $line = <INPUT>) {
    chomp $line;
    my ($code, $city) = split /\|/, $line;
    $ports{$code} = $city if not $ports{$code};
}
for my $key (keys %ports) {
    if ($ARGV[1] =~ /$key/) {
        print $ports{$key}, "\n";
    }
}
close INPUT;
Or the same using store/retrieve to avoid the repeated parsing:
use Storable qw(retrieve);
my $ports = retrieve("$ENV{HOME}/flat_file.dat");
Last night I wrote a script which creates an index of the first three bytes, then saves it using Storable. The following program uses the index to print out the byte offset of ONLY AN EXACT MATCH:
#!/usr/bin/perl
use strict;
use Getopt::Std;
my %parms;
getopts ("c:p:", \%parms);
die "Please supply a port code or city" if not $parms{p} and not $parms{c};
use Storable qw(retrieve);
if ($parms{p}) {
    $parms{p} = uc $parms{p};
    my $ports_by_code = retrieve("$ENV{HOME}/flat_file_by_port");
    print $ports_by_code->{$parms{p}}, "\n";
}
if ($parms{c}) {
    $parms{c} = uc $parms{c};
    my $ports_by_city = retrieve("$ENV{HOME}/flat_file_by_city");
    print $ports_by_city->{$parms{c}}, "\n";
}
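For context, the indexing script itself is not shown here. A minimal sketch of how such an index could be built follows, assuming the flat file holds one pipe-separated "code|city" record per line and that the index maps the uppercased key to the byte offset of its line. The sub name build_offset_index and the command-line usage are hypothetical; only the _by_port and _by_city file names come from the lookup script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store);

# Build { KEY => byte offset of the line } for one field of a "code|city" file.
# $field is 0 for the port code and 1 for the city (assumed layout).
sub build_offset_index {
    my ($path, $field) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my %offset_for;
    my $pos = tell $in;                  # offset of the line about to be read
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\|/, $line;
        my $key = uc($fields[$field] // '');
        # Keep only the first occurrence of each key.
        $offset_for{$key} = $pos
            if length $key and not exists $offset_for{$key};
        $pos = tell $in;                 # start of the next line
    }
    close $in;
    return \%offset_for;
}

# Hypothetical usage: perl build_index.pl ~/flat_file
if (my $flat = shift @ARGV) {
    store(build_offset_index($flat, 0), "${flat}_by_port");
    store(build_offset_index($flat, 1), "${flat}_by_city");
}
```

With an offset in hand, the lookup side can seek to it and read one line instead of holding every record in memory.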
I'd like to be able to handle more conditions. Say the user enters only the letter M. I would like to print out the first 25 matches that start with M. Or if they enter MV, the first 25 matches that start with MV. The max input length is 3 characters. My assumption (probably erroneous) is that I need three byte-offset indexes to handle my requirements. Is there another programmatic approach, besides DBI, or iterating over the entire list of keys and doing a contains/starts-with search? I'm just looking for ideas. Any input is appreciated.
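One idea that seems to fit, sketched under assumptions: keep the keys in a single sorted array and binary-search for the first key at or after the prefix. Keys sharing a prefix are contiguous in sorted order, so the same array would serve 1-, 2- and 3-character input without three separate indexes. The sub name prefix_matches and the sample codes are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return up to $limit keys from the sorted array @$sorted that start with $prefix.
sub prefix_matches {
    my ($sorted, $prefix, $limit) = @_;
    $limit //= 25;
    $prefix = uc $prefix;

    # Binary search for the first key that sorts at or after the prefix.
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] lt $prefix) { $lo = $mid + 1 }
        else                            { $hi = $mid }
    }

    # Matching keys are contiguous from $lo; stop at the first non-match.
    my @found;
    while ($lo < @$sorted and @found < $limit) {
        last if index($sorted->[$lo], $prefix) != 0;
        push @found, $sorted->[$lo++];
    }
    return @found;
}

# Made-up sample codes. Real keys could come from the Storable hash,
# e.g. my @codes = sort keys %$ports_by_code;
my @codes = sort qw(MIA MVD MVY NYC MEM LAX);
print "$_\n" for prefix_matches(\@codes, 'MV');
```

The sorted array could itself be saved with Storable, so each run costs one retrieve plus a handful of string comparisons rather than a scan over every key.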

In reply to Re^4: index first 24 bytes of every line in file by mhearse
in thread index first 24 bits of every line in file by mhearse
