dsheroh has asked for the wisdom of the Perl Monks concerning the following question:

I've been asked to build a master part number interchange table and given a collection of files which each contain a list of part number equivalencies for two of four formats. I need to read all of these and build an output file which combines the information in all of the input files.

The actual files present all contain interchanges from the same catalog (lester) to another number, so the obvious solution for this case is to just build three hashes, one for each of the non-lester formats, and use the lester number as the key and the non-lester number as the associated value. I, however, feel compelled to over-engineer the project and not depend on the lester number's presence.
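
A minimal sketch of the "obvious" lester-keyed approach described above. The sample pairs and the names %pairs_for, %by_lester and @rows are made up for illustration, and the input parsing is left out; only the hash layout and the combining step are shown.

```perl
use strict;
use warnings;

# Hypothetical sample data: each pair is (lester number, other catalog's number).
my %pairs_for = (
    WAI  => [ [ '12345', '2-2342-01DU' ] ],
    pic  => [ [ '12345', '340-87' ] ],
    manu => [ [ '12345', '72 4223' ] ],
);
my @others = qw(WAI pic manu);

# One hash per non-lester catalog, each keyed on the lester number:
# $by_lester{catalog}{lester number} = that catalog's number
my %by_lester;
for my $catalog (@others) {
    for my $pair ( @{ $pairs_for{$catalog} } ) {
        my ( $lester, $other ) = @$pair;
        $by_lester{$catalog}{$lester} = $other;
    }
}

# Collect every lester number seen in any of the three hashes.
my %seen;
for my $catalog (@others) {
    $seen{$_} = 1 for keys %{ $by_lester{$catalog} };
}

# One combined row per lester number; a missing entry becomes an empty field.
my @rows;
for my $lester ( sort keys %seen ) {
    my @fields = ($lester);
    push @fields, $by_lester{$_}{$lester} // '' for @others;
    push @rows, join "\t", @fields;
}
print "$_\n" for @rows;
```

This only works while every input file carries a lester number, which is exactly the dependency the rest of the question tries to avoid.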

I'm currently thinking that the most sensible way to handle this would be to simply use an array of strings with 4 delimited fields in each string, one field for each catalog's number. They can then be separated easily enough using split, existing entries can be located on any catalog's number by grepping for a simple regex, a sort routine with the ability to sort them on any particular catalog's numbers could be written fairly easily, etc.
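
Roughly, that design might look like the sketch below. The field order, the "|" delimiter, the sample records and the sub names find_by and sort_by are all assumptions for illustration. It also compares a single split-out field instead of grepping the whole string with a regex, so that a number from one catalog cannot match inside another catalog's column.

```perl
use strict;
use warnings;

# Field order is an assumption for illustration.
my @catalogs = qw(lester WAI pic manu);
my %column;
@column{@catalogs} = ( 0 .. $#catalogs );

# "|" is assumed never to occur in a part number; an empty field means "unknown".
my @records = (
    '12345|2-2342-01DU|340-87|72 4223',
    '10002|1-1000-01DU||',
);

# Find records by one catalog's number: split each string and compare that field only.
# The -1 limit stops split from dropping trailing empty fields.
sub find_by {
    my ( $catalog, $number ) = @_;
    my $i = $column{$catalog};
    return grep { ( split /\|/, $_, -1 )[$i] eq $number } @records;
}

# Sort records on one catalog's column (plain string comparison).
sub sort_by {
    my ($catalog) = @_;
    my $i = $column{$catalog};
    return sort { ( split /\|/, $a, -1 )[$i] cmp ( split /\|/, $b, -1 )[$i] } @records;
}

print "$_\n" for find_by( pic => '340-87' );
print "$_\n" for sort_by('lester');
```

Every lookup rescans and re-splits the whole array, which is fine for a one-shot job but is the main cost of this layout.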

The only thing is, it feels like a pretty clumsy design. Anyone have suggestions of a better/cleaner way to do it?

Re: 4-way interchange mapping
by JayBonci (Curate) on May 09, 2002 at 21:24 UTC
    You sound like you want to store many many relationships in a large data structure. For anything of this size, that sort of setup doesn't scale well, and queries will get slower and slower as you have to scan across more and more groups of data to find the piece you are looking for.

    This sort of storage is the forte of a database. I advocate MySQL, as it is fast, well documented, and will save you a lot of time versus a flat file. This isn't a weakness of Perl; it's a strength of indexed, easily searchable databases.

    Good luck with your project.

        --jb
      No, nothing many-to-many, it's all 1-to-1(-to-1-to-1). A given alternator may be part 12345 in the lester catalog, part 2-2342-01DU in the WAI catalog, part 340-87 in the pic catalog, and part 72 4223 to the manufacturer, but that's just 4 names for the same thing.

      I did consider a database, but this is a one-shot 'turn a lot of text files into one big text file' job, not an application which will need to maintain any of the data in the future. I'm just overengineering it so that, in 6 months when we have to submit interchange data to a different catalog vendor, there will be a decent chance of being able to just reuse it instead of writing new code each time. (Well, OK, and to make it a little more interesting, too.)

        Okay, what you could do is something like this... Create four hashes, one per catalog, each mapping that catalog's part number to a shared hash of "name" => "part_number" pairs. This will make lookups fast and keep things organized. I've done this with a naming scheme / eval strategy in the following code:
        use strict;

        my $WAI_cat;
        my $lester_cat;
        my $pic_cat;
        my $manu_cat;

        my $part_entry = {
            "lester" => "12345",
            "WAI"    => "2-2342-01DU",
            "pic"    => "340-87",
            "manu"   => "72 4223"
        };

        foreach (keys %$part_entry) {
            my $str = "\$\$".$_."_cat{'$$part_entry{$_}'} = \$part_entry";
            #print $str.="\n";
            eval($str);
        }

        print $$WAI_cat{"2-2342-01DU"}{lester}."\n";
        print $$pic_cat{"340-87"}{manu}."\n";
        You can see that the hashes are named "<catalog name>_cat", which saves us from assigning each one by hand (the eval statement takes care of that cleanly). Building this allows you to look parts up by any catalog's number, and then you can reference through the hash to the other three part numbers.

        If you don't want to handle four actual hashes, you could probably manage something with a single hash whose four named keys each point to one catalogue's hash. Note that this solution probably isn't optimal on memory, but I don't know how many items you will need to hold. Will this need to go to disk, or just run once? You could use Storable or Data::Dumper to freeze the whole structure to disk if you ever needed to use it again.
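
        A rough sketch of that single-hash variant, without the string eval. The names %index and add_pair are hypothetical, and conflicting numbers are simply resolved first-come; a real run would probably want to report them instead.

```perl
use strict;
use warnings;

# $index{catalog}{part number} = shared record (hash ref) for that part.
my %index;

# Record that $num_a in $cat_a and $num_b in $cat_b name the same part.
# Sub and variable names are illustrative, not from the code above.
sub add_pair {
    my ( $cat_a, $num_a, $cat_b, $num_b ) = @_;
    my $rec_a = $index{$cat_a}{$num_a};
    my $rec_b = $index{$cat_b}{$num_b};
    my $rec   = $rec_a || $rec_b || {};

    # If both numbers were already known under different records, fold B into A.
    if ( $rec_a && $rec_b && $rec_a != $rec_b ) {
        $rec->{$_} //= $rec_b->{$_} for keys %$rec_b;
    }

    # Keep any number already on file (first-come); conflicts are not detected here.
    $rec->{$cat_a} //= $num_a;
    $rec->{$cat_b} //= $num_b;

    # Point every catalog's index at the merged record.
    $index{$_}{ $rec->{$_} } = $rec for keys %$rec;
    return $rec;
}

add_pair( lester => '12345',       WAI  => '2-2342-01DU' );
add_pair( pic    => '340-87',      manu => '72 4223' );      # no lester number here
add_pair( WAI    => '2-2342-01DU', pic  => '340-87' );       # links the two records

print $index{manu}{'72 4223'}{lester}, "\n";    # prints 12345
```

        Because every index entry points at the same hash ref, a number learned from one file becomes visible through all four catalogs at once, and no file has to contain a lester number.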

            --jb
Re: 4-way interchange mapping
by graff (Chancellor) on May 09, 2002 at 22:02 UTC
    JayBonci is right -- something like MySQL is the best approach. But you say you were asked to "build an output file which combines all the information in all of the input files", so in that context, I think your idea would be hard to improve on, so long as you're careful to use an appropriate delimiter between the fields on each line (i.e. something that doesn't show up as data within any of the fields), and have a sensible way to deal with gaps -- a blank field in a given column for a given row.
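
    As a sketch of those two points, assuming a tab as the delimiter and a fixed column order (both are assumptions, as are the sub names format_row and parse_row). The manufacturer number quoted earlier in the thread, "72 4223", contains a space, so a plain space would not be safe.

```perl
use strict;
use warnings;

my @catalogs  = qw(lester WAI pic manu);    # column order is an assumption
my $delimiter = "\t";                        # assumed absent from all part numbers

# Turn one record (hash ref of catalog => number) into an output line.
sub format_row {
    my ($rec) = @_;
    my @fields = map { $rec->{$_} // '' } @catalogs;    # gap => empty field
    for my $field (@fields) {
        die "delimiter found in part number '$field'\n"
            if index( $field, $delimiter ) >= 0;
    }
    return join $delimiter, @fields;
}

# Read a line back; the -1 limit keeps trailing empty fields.
sub parse_row {
    my ($line) = @_;
    my %rec;
    @rec{@catalogs} = split /\Q$delimiter\E/, $line, -1;
    return \%rec;
}

print format_row( { lester => '12345', manu => '72 4223' } ), "\n";
```

    Checking for the delimiter at write time means a bad choice fails loudly instead of silently shifting columns.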

    Also, if the patterns in different columns tend to be "confusable" (e.g. a search pattern for one type of field happens to match values in another field), you'll need an enhanced "grep" process that can limit the search to a particular field.
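
    One way such a field-limited grep might look, assuming tab-delimited lines in a fixed column order. The sub name grep_field and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

my @catalogs = qw(lester WAI pic manu);    # column order is an assumption
my %column;
@column{@catalogs} = ( 0 .. $#catalogs );

# Return the lines whose $catalog field matches $pattern (a compiled regex),
# ignoring matches in any other field. Assumes tab-delimited lines.
sub grep_field {
    my ( $catalog, $pattern, @lines ) = @_;
    my $i = $column{$catalog};
    die "unknown catalog '$catalog'\n" unless defined $i;
    return grep {
        my @fields = split /\t/, $_, -1;
        defined $fields[$i] && $fields[$i] =~ $pattern;
    } @lines;
}

my @lines = (
    "12345\t2-2342-01DU\t340-87\t72 4223",
    "340\t1-1000-01DU\t999-11\t",
);

# A plain grep for /340/ would hit both lines; this only looks at the lester column.
my @hits = grep_field( 'lester', qr/^340/, @lines );
print "$_\n" for @hits;
```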