in reply to Question about properly laying out a database

I'd suggest that you have functions to perform each search, returning an array of matches. Then, just do a quick join of all your keys that match the search. The real question is "How do you make optimal search functions?"

To get the fastest, most foolproof solution, you should normalize the database. Rather than having vehicle entries like this:
my %cars = ( 1001 => { make => "Chevrolet", model => "Camaro", year => 1972, price => 6000, }, ...
You should have this:
my %cars = ( 1002 => { make_id => 10, model_id => 27, year => 2001, price => 24000, }, ); my %makes = { 1 = "Chevrolet", 10 = "Acura", }; my %models = { 12 = "Camaro", 27 = "RSX", };
Then, your "make" and "model" search functions are doing fast, numerical comparisons, and not string matches. And you're guaranteed to have consistent spelling of Chevrolet! You never have to match /chev(y|rolet)/i!

You'll probably want other fields, like "description", "series", "miles", etc.

You'll also want things like "show_price" and "show_miles". If that 1972 Camaro has 217,000 miles, no one is gonna look at it at $6,000, even if it's puff! So, just don't show the price -- and get the customer in there to see how well it's been restored.

Other thoughts: Might Storable be a better way to go than DB_File, monks?

There's no way for you to go with MySQL, huh? :)

I've got just a *little* experience with exactly this stuff <dripping sarcasm> - 1 2. /msg me if you need more help.

Replies are listed 'Best First'.
Re: Re: Question about properly laying out a database
by perrin (Chancellor) on Dec 12, 2001 at 04:08 UTC
    No, I don't think Storable would be a better option for this. With Storable, you would have to load the entire database into memory just to search it. DB_File creates fast, indexed access to records without actually loading them into RAM.

    However, your suggestion of getting an array of matches for each criterion and then finding the overlapping set is a good one. You can make one DB_File database for each criterion (make.db, model.db, etc.) and then the content for each record could be a list of car IDs (like unique object IDs) that you use to look up the car data in a separate content database (also a DB_File, with data serialized using Storable in each record).

    However, it would be much easier to just use MySQL.

      But if he's going to search all the records to match on some set of criteria, won't he have to load the whole thing in memory anyway?

      In this application, it's not very often that you call up one record by its key id. Searches are more common. So, you'll have to evaluate just how much data each record will hold, how much RAM will be used up, and how quickly you can load all that data into memory.
        But if he's going to search all the records to match on some set of criteria, won't he have to load the whole thing in memory anyway?

        No. All he has to do to find the cars with make = 'volvo' (or some normalized key like 'make_7') is say my $cars = $make_db{'volvo'} and have that return a ref to an array of car IDs (serialized by Storable). Then he does the same for each of the other criteria, and finds the overlapping set.

        You do access records by ID, because you make multiple indexes (dbm files) which are each using a different criterion of the search as a key. It's kind of a roll-your-own MySQL.

Re(2): Question about properly laying out a database
by dmmiller2k (Chaplain) on Dec 13, 2001 at 01:13 UTC

    Given joelba's excellent suggestion, it may also help to create some indexes to speed up subsequent searches. Recall, in "Re: Question about properly laying out a database," above:

    my %cars = ( 1002 => { make_id => 10, model_id => 27, year => 2001, price => 24000, }, ); my %makes = { 1 = "Chevrolet", 10 = "Acura", }; my %models = { 12 = "Camaro", 27 = "RSX", };

    For example, indexing Makes might look like:

    my %makes_idx = ( 1 => [1001, ... ], # cars that are Chevys 10 => [1002, ...], # cars that are Acuras ... );

    Or, indexing Years might look like:

    my %years_idx = ( 2001 => [ 1001, 1002, ... ], # cars that are 2001 models ... );

    You get the idea.

    Then searching by any single attribute that is indexed becomes nearly instantaneous. Doing a "JOIN" on two (or more) indexed attributes looks something like this:

    # For example, finding all 2001 or newer Chevys my %temp_hash = reverse %makes; # gives the id for each make my $make_id = $temp_hash{'Chevrolet'); # get the id for Chevy my %make_set = map { $_=>1 } @{$makes_idx{$make_id}}; # set of cars t +hat are Chevys # search above for 2001 or newer models my @temp_years = grep { $_ >= 2001 } keys %years_idx; my @result_ids = grep { exists $make_set{$_} } @years_idx{@temp_years} +;
    Update: Changed search example to look for a range rather than a specific year.

    dmm

    You can give a man a fish and feed him for a day ...
    Or, you can
    teach him to fish and feed him for a lifetime
      Hmm.. My gut reaction at first was "Well, it's a relatively small amount of data. And, you'll wind up recalculating the indexes for each search anyway. Great idea, but is it worth the effort here?"

      Then, I got a message from Stamp_Guy saying that he cannot use DB_File because complex data structures aren't supported. (C'mon.. how complex is a HoH?) So, now I am thoroughly convinced that Storable is a Good Idea, as long as the number of records doesn't get out of hand. And if you use Storable, the indexes can be stored objects too -- so the overhead of generating the indexes is gone from the search program. Sweet!

      So, here's the quick and dirty:
      %cars - Storable HoH as illustrated above, with any additional vehicle info added too

      %makes - Storable hash of make_id => make. This should probably be a complete list of makes, which you can find on cars.com.

      %models - Storable HoH of
      model_id => { model => 'modelname', make_id => make_id (key from %makes) }
      . You can build this make_id hash as your inventory grows. Be sure to leave off the model details/series. Ex. model => 330 should be in this hash to cover all BMW 330's. You don't necessarily need to have one entry for 330xi, 330ci, etc. Lexus is another pain in the rear when it comes to this model naming scheme.

      • Then, your indexes: %years_idx, %make_idx, %model_idx. Obviously you will need to recreate your indexes each time you make a change to the dealer's inventory. A minor race condition exists here if someone searches while you are updating the inventory, but I'll leave that for you to solve. :)