in reply to 4-way interchange mapping

You sound like you want to store a great many relationships in a large data structure. At that size, such a setup doesn't scale well: queries get slower and slower because you have to scan more and more groups of data to find the piece you are looking for.

This sort of storage is the forte of a database. I advocate MySQL: it is fast, well documented, and will save you a lot of time compared with a flat file. This isn't a weakness of Perl; it's a strength of indexed, easily searchable databases.
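To make the scanning-versus-indexing point concrete in plain Perl (this illustrates the principle, not MySQL itself), here is a minimal sketch. The record layout, the second record's numbers, and the helper names find_by_scan and find_by_index are all made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data: each record maps catalog name => part number.
# The second record's numbers are invented for the example.
my @records = (
    { lester => '12345', WAI => '2-2342-01DU', pic => '340-87', manu => '72 4223' },
    { lester => '12346', WAI => '2-2342-02DU', pic => '340-88', manu => '72 4224' },
);

# Unindexed lookup: walks every record, so the cost grows with the data set.
sub find_by_scan {
    my ($catalog, $number) = @_;
    for my $rec (@records) {
        return $rec if defined $rec->{$catalog} && $rec->{$catalog} eq $number;
    }
    return;
}

# Indexed lookup: build one hash per catalog once; after that each lookup
# is a single hash access no matter how many records there are.
my %index;
for my $rec (@records) {
    $index{$_}{ $rec->{$_} } = $rec for keys %$rec;
}

sub find_by_index {
    my ($catalog, $number) = @_;
    return $index{$catalog}{$number};
}

print find_by_scan('pic', '340-88')->{lester}, "\n";
print find_by_index('pic', '340-88')->{lester}, "\n";
```

Both calls find the same record; the difference is only in how much work each lookup does as the data grows.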

Good luck with your project.

    --jb

Re: Re: 4-way interchange mapping
by dsheroh (Monsignor) on May 09, 2002 at 22:57 UTC
    No, nothing many-to-many, it's all 1-to-1(-to-1-to-1). A given alternator may be part 12345 in the lester catalog, part 2-2342-01DU in the WAI catalog, part 340-87 in the pic catalog, and part 72 4223 to the manufacturer, but that's just 4 names for the same thing.

    I did consider a database, but this is a one-shot 'turn a lot of text files into one big text file' job, not an application which will need to maintain any of the data in the future. I'm just overengineering it so that, in 6 months when we have to submit interchange data to a different catalog vendor, there will be a decent chance of being able to just reuse it instead of writing new code each time. (Well, OK, and to make it a little more interesting, too.)

      Okay, here's what you could do. Create four hashes, one per catalog, each mapping that catalog's part number to another hash of "catalog name" => "part number" pairs. This will make lookups fast and keep things organized. I've done it with a naming scheme plus an eval in the following code:
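      use strict;
      use warnings;
      # Each of these becomes a hash ref keyed by that catalog's part number.
      my ($WAI_cat, $lester_cat, $pic_cat, $manu_cat);
      # One anonymous hash holding all four names for the same part.
      my $part_entry = { "lester" => "12345", "WAI" => "2-2342-01DU",
                         "pic" => "340-87", "manu" => "72 4223" };
      foreach (keys %$part_entry) {
          # Builds e.g. "$$lester_cat{'12345'} = $part_entry" and runs it,
          # filing the shared entry under this catalog's part number.
          my $str = "\$\$" . $_ . "_cat{'$$part_entry{$_}'} = \$part_entry";
          #print "$str\n";
          eval($str);
          die $@ if $@;    # otherwise a mistyped catalog name fails silently
      }
      print $$WAI_cat{"2-2342-01DU"}{lester} . "\n";
      print $$pic_cat{"340-87"}{manu} . "\n";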
      You can see that the hashes are named <catalog>_cat, which saves us from assigning each one by hand (the eval statement takes care of that). Building this lets you look a part up by any catalog and then reference through the shared hash to the other three part numbers.
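A variation on the code above, sketched under the assumption that you're happy keeping four real hashes but would rather not build and compile Perl code at run time: a small table of references maps each catalog name to its hash, and a helper returns the other numbers for a part. The names %cat_for and other_numbers are made up for illustration:

```perl
use strict;
use warnings;

# Four real hashes, one per catalog.
my (%lester_cat, %WAI_cat, %pic_cat, %manu_cat);

# A table of references replaces the string eval: the catalog name
# selects the hash directly.
my %cat_for = (
    lester => \%lester_cat,
    WAI    => \%WAI_cat,
    pic    => \%pic_cat,
    manu   => \%manu_cat,
);

my $part_entry = { lester => '12345', WAI => '2-2342-01DU',
                   pic    => '340-87', manu => '72 4223' };

# File the same anonymous hash under its number in every catalog.
$cat_for{$_}{ $part_entry->{$_} } = $part_entry for keys %$part_entry;

# Hypothetical helper: look a part up by any catalog and return
# (catalog => number) pairs for the other catalogs.
sub other_numbers {
    my ($catalog, $number) = @_;
    my $rec = $cat_for{$catalog}{$number} or return;
    return map { $_ => $rec->{$_} } grep { $_ ne $catalog } keys %$rec;
}

my %others = other_numbers('WAI', '2-2342-01DU');
print "$_: $others{$_}\n" for sort keys %others;
```

Because every catalog stores a reference to the same anonymous hash, a lookup through any one of them reaches all four numbers.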

      If you don't want to handle four separate hashes, you could manage the same thing with a single hash whose four named keys each point to a catalog hash. Note that this solution probably isn't very memory-efficient, but I don't know how many items you will need to hold. Will this need to go to disk, or just run once? You could use Storable or Data::Dumper to freeze the whole structure to disk if you ever needed to use it again.
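The single-hash version, plus a Storable freeze, might look roughly like this. It's a minimal sketch assuming the same four catalog names; the temporary file from File::Temp just stands in for wherever you'd actually keep the data:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# One hash whose four named keys each lead to a catalog hash.
my %catalog = map { $_ => {} } qw(lester WAI pic manu);

my $part_entry = { lester => '12345', WAI => '2-2342-01DU',
                   pic    => '340-87', manu => '72 4223' };
$catalog{$_}{ $part_entry->{$_} } = $part_entry for keys %$part_entry;

# Freeze the whole structure to disk and read it back.
my (undef, $file) = tempfile(UNLINK => 1);
store(\%catalog, $file) or die "store failed: $!";
my $copy = retrieve($file);

print $copy->{pic}{'340-87'}{WAI}, "\n";
```

Storable writes each referenced hash once, so in the retrieved copy the four catalogs still share a single record per part rather than holding four independent copies.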

          --jb
        So, if I'm reading this correctly, you've created a hash for each catalog and an anonymous hash for the part, then inserted a reference to the anon hash into each catalog under the corresponding part number, yes? Slick... I like it and it seems like it should handle the piecemeal nature of the source data nicely as well. Thanks!
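The shared reference is what makes the piecemeal input work: once a record is filed under any one number, a later file can find it through that number and extend it. A minimal sketch, assuming a single %catalog hash of per-catalog hashes and an input order (lester/WAI first, then WAI/pic) chosen only for illustration:

```perl
use strict;
use warnings;

my %catalog = map { $_ => {} } qw(lester WAI pic manu);

# First source file supplies only a lester/WAI pair.
my $rec = { lester => '12345', WAI => '2-2342-01DU' };
$catalog{lester}{12345}      = $rec;
$catalog{WAI}{'2-2342-01DU'} = $rec;

# A later file pairs the WAI number with a pic number: find the existing
# record through WAI, add the pic number, and file it under pic as well.
my $found = $catalog{WAI}{'2-2342-01DU'};
$found->{pic} = '340-87';
$catalog{pic}{'340-87'} = $found;

# Every catalog holds a reference to the same anonymous hash, so the new
# number is visible through the lester entry too.
print $catalog{lester}{12345}{pic}, "\n";
```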

        As far as memory and disk go, it just needs to suck in source files of the form $lester\t$wai, correlate the numbers in all the files, and spew $lester\t$wai\t$pic\t$manu to stdout. From there, I'll capture it to a text file, send it to someone else, and forget about it until they either correct an input file or it comes time to update another catalog. I anticipate that the resulting data set will run around 12,000 records, so memory's not a concern.
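Putting the pieces together for that job, here is a minimal sketch. It assumes each source file is tab-separated with two columns from two known catalogs, and the catalog names lester/WAI/pic/manu. The helper add_pair and the second part's numbers are made up for illustration, and the input is read from in-memory strings so the block runs on its own. Because pairs arrive piecemeal, two partial records can turn out to be the same part (here the lester/WAI and pic/manu halves only meet through the WAI/pic file), so add_pair merges them:

```perl
use strict;
use warnings;

my @columns = qw(lester WAI pic manu);
my %catalog = map { $_ => {} } @columns;

# Hypothetical helper: record that two numbers name the same part.
sub add_pair {
    my ($cat_a, $num_a, $cat_b, $num_b) = @_;
    my $rec   = $catalog{$cat_a}{$num_a} || $catalog{$cat_b}{$num_b} || {};
    my $other = $catalog{$cat_b}{$num_b};

    # Two partial records turn out to be one part: fold $other into $rec.
    # Conflicting numbers are not detected; $rec's values win.
    %$rec = (%$other, %$rec) if $other && $other != $rec;

    $rec->{$cat_a} = $num_a;
    $rec->{$cat_b} = $num_b;

    # (Re)file the record under every number it now holds.
    $catalog{$_}{ $rec->{$_} } = $rec for keys %$rec;
}

# Made-up sample input: each source names the catalogs in its two columns.
# In real use, open a file name instead of the in-memory string.
my @sources = (
    [ 'lester', 'WAI',  "12345\t2-2342-01DU\n12346\t2-2342-02DU\n" ],
    [ 'pic',    'manu', "340-87\t72 4223\n340-88\t72 4224\n" ],
    [ 'WAI',    'pic',  "2-2342-01DU\t340-87\n2-2342-02DU\t340-88\n" ],
);

for my $src (@sources) {
    my ($cat_a, $cat_b, $text) = @$src;
    open my $fh, '<', \$text or die "cannot open source: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($num_a, $num_b) = split /\t/, $line;
        add_pair($cat_a, $num_a, $cat_b, $num_b);
    }
    close $fh;
}

# Each record is filed up to four times; keep one copy of each.
my %seen;
my @records = grep { !$seen{$_}++ } map { values %$_ } values %catalog;

# One tab-separated line per part, blank where a catalog has no number.
my @lines = map {
    my $rec = $_;
    join "\t", map { defined $rec->{$_} ? $rec->{$_} : '' } @columns;
} @records;
@lines = sort @lines;
print "$_\n" for @lines;
```

Sorting the output lines is optional; it just makes the result stable from run to run, since hash order in Perl is not.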