in reply to Re^2: Populate and sort AoA
in thread Populate and sort AoA

There is indeed a reason I go to all that trouble.

The serial numbers are recieved in random order, up to 20K of them. I want the 'operater' to know what were the last 20 serial numbers received (last-in at the top of the list) and what's still missing from the 'group, A-D'. There a section of the script that reads from a database to get serial-number groups that may have been read much earlier (no longer on the last 20 list).

I could simply get the display data directly from the database, but I concluded that selecting and sorting from 20K records would be time consuming so I decided to maintain a 'last-20' array (or hash) instead.

Tanktalus, I tried out your hash idea and got it working. I'm not sure which is better. Capturing the serial numbers is easier, but the display gets pretty messy.

I guess it's a toss up :)

Replies are listed 'Best First'.
Re^4: Populate and sort AoA
by Tanktalus (Canon) on Nov 15, 2005 at 04:43 UTC

    Ok, first I tried to look at the problem from a number of angles. I started by suggesting that you could use a tree (e.g., Tree::Binary) to store things - but the problem is that you want to look up data in multiple ways: by serial number, and by last-in. Look-up by serial number is a hash lookup, while look-up by last-in is an ordered list (ordered on the time of the insert). So then I was about to suggest storing your data simultaneously in multiple containers (since it's all references anyway, it's not going to take up that much space). But then, my fingers and mind got all twisted up trying to describe it.

    That's a good indication that either it's a bad idea, or I don't understand the idea well enough to teach someone. I've been introduced to the idea before, but I've never actually used multiple containers for a single set of data before. So my experience just isn't great enough to really describe it. Then the epiphany.

    You could simply get the display data directly from the database. Set indexes on the serial numbers and the last-insert-time fields. If you get multiple inserts per second, you may need a sub-second time field (I'm not sure if that's standard in RDBMSes or not - I pretty much only use DB2). If you still get multiple inserts simultaneously, it's not really that bad of a thing anyway. You can then query directly for any serial number you want (the database will use a b-tree for index lookup - for 20,000 records, that will mean approximately 14.3 lookups on average), and also query for the 20 highest insert times. Let the database worry about it. It'll use the best algorithms that it can - which you can help by setting up proper indexes.

    Basically, you already have a powerful tool for this, and you're already eating most of the overhead of doing this with that tool. Take advantage of that tool for the rest of it :-)

      Thanks for your thoughts and the long write up.
      As noted in my earlier post, I had considered getting the display data directly from the database table. For your reference, the timestamp will be ok in seconds

      sudo-code #ie: #Read and save serial number to database. #Table: | prefix | number | stamp | (index the stamp) #Now build for the display... "SELECT DISTINCT number FROM table ORDER BY stamp LIMIT 20" etc; ## Now get prefixes associated with those numbers. ## Remember the other serial numbers (prefixes) may be received at any + time in the past. for (each number) { "SELECT prefix FROM table WHERE number=$_" fetch() 1 to 4 elements per number { ## build, in a string, array, or hash for display "number XX XX XX XX" } }

      And that looks like a nightmare. :)

      update
      Tanktalus, don't spend any more time on this. There's more to the script than I've outlined, like some of this runs in a forked child process and data is LWP'd to another script, etc.

      Your earlier solutions will serve my needs and I sincerely thank you.