in reply to Re: More than one way to skin an architecture
in thread More than one way to skin an architecture

Thanks for the sentiment. The project has started to take on a life of its own. My friends and co-workers are the encouragement here.

You make a very good point about performance being something to optimize after you have at least a prototype - I guess that's the nature of iterative development after all - but I was trying to think my way through this one first. (Not my usual technique, I admit). Since I'm not too skilled yet at hashes and managing them, I'm a tad leery of what the query/select statements will look like and whatnot. I know how to bind columns to a CSV table and query from that; I'm not sure what the equivalent construct would be for a hash. Any thoughts there?
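
For reference, this is roughly the shape of what I mean by binding columns to a CSV table - a sketch only, with a made-up file (positions.csv) and made-up column names:

    use strict;
    use warnings;
    use DBI;

    # DBD::CSV: positions.csv in the f_dir directory shows up as table "positions"
    my $dbh = DBI->connect("dbi:CSV:f_dir=.", undef, undef, { RaiseError => 1 });

    my $sth = $dbh->prepare("SELECT boat, lat, lon FROM positions WHERE boat = ?");
    $sth->execute("Wanderer");
    $sth->bind_columns(\my ($boat, $lat, $lon));
    while ($sth->fetch) {
        print "$boat: $lat, $lon\n";
    }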

I like the Tie idea. The module seems reasonable enough for a novice like me.

As for optimization after the prototyping, well, there are the Monks, aren't there? ;)

Re^3: More than one way to skin an architecture
by plobsing (Friar) on Mar 18, 2008 at 01:47 UTC
    Do you understand how to use a hash in Perl? They aren't exactly like databases (although the underlying idea is essentially the same). In any case there are no query/select statements in the SQL sense.

    I'm not a database guy, so I don't know the voodoo to set up your data model optimally for a database, but when I look at your problem, I see a hash mapping boat identifiers (names probably) to arrays of coordinate pairs in chronological order (HoAoA).
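
    Something like this, to sketch it out (boat names and coordinates are invented):

        my %tracks = (
            Wanderer => [ [ 49.28, -123.12 ], [ 49.30, -123.10 ] ],
            Bluenose => [ [ 44.65,  -63.57 ] ],
        );

        # no SELECT: "querying" is just indexing into the hash
        for my $point ( @{ $tracks{Wanderer} } ) {
            my ($lat, $lon) = @$point;
            print "Wanderer: $lat, $lon\n";
        }

        # appending a new fix is a push onto that boat's array
        push @{ $tracks{Wanderer} }, [ 49.31, -123.09 ];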

    But then I think that that kind of structure maps perfectly to netcdf, on which I cut my teeth. So I'm probably horribly biased on this.

    As for optimization, super search, profiling, and maybe low-level formats (for numeric data) are your friends.
      Do I understand how to use a hash? Probably not. I was thinking this could be as simple as key/value pairs, with the value being an array. Maybe not the best approach. (I may look into using the value as a reference to an array, if I can't store the array in the value directly. The array would be a .csv file's data. This might be memory-intensive, maybe not.)
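
      Something like this is what I had in mind - assuming a made-up positions.csv with one boat,lat,lon record per line:

          use strict;
          use warnings;

          my %tracks;   # boat name => reference to an array of [lat, lon] pairs
          open my $fh, '<', 'positions.csv' or die "positions.csv: $!";
          while (my $line = <$fh>) {
              chomp $line;
              my ($boat, $lat, $lon) = split /,/, $line;
              push @{ $tracks{$boat} }, [ $lat, $lon ];
          }
          close $fh;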

      I think the simplest thing would be to continue with the DBD::CSV approach overall - the data is completely legible and editable, and maybe most important, something I can wrap my head around. The advantage here, as you pointed out, is that the data is fairly linear and not really related across tables; a flat file will do (at least for now).

      I'll shift the question then: given multiple .csv files, how can I read them into one db? Do I need to play tricks with the database handle? If I can master that trick, I can use simple date naming to identify the .csv files, and any historical query we run can load the .csvs that match the date range. (That's what I thought I could do with the hash). Deletion of historical data on a date basis would be really simple too.

      I looked at NetCDF, but I think getting that to run on Windows is going to be a bit of a PITA.

        Well, according to TFM, you hand the handle a directory at init. It will map tables to files in that dir of the form ${table_name}.csv. So if you already have the CSVs, simply drop 'em in an empty directory and get crackin'.
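
        To sketch what that looks like (the directory and the pos_20080317 table name are made up; a file pos_20080317.csv in that directory becomes the table pos_20080317):

            use DBI;

            my $dbh = DBI->connect("dbi:CSV:f_dir=/path/to/csv_dir",
                                   undef, undef, { RaiseError => 1 });

            # each file in f_dir is its own table; to cover a date range you'd
            # query each date-named table in turn
            my $rows = $dbh->selectall_arrayref(
                "SELECT lat, lon FROM pos_20080317 WHERE boat = ?",
                undef, "Wanderer");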

        However, I wouldn't keep my tables by date, since following one boat would then mean touching many tables, and hence many file operations, on every access (if you want to know where all the boats are at once, on the other hand, this would be a good choice).

        Instead, I'd keep them by boat. That means one table/file per boat. It makes more sense to me. The compromise is that you lose append/delete performance for a gain in access performance (which do you do more of?).

        That said, I wouldn't handle the append/delete with DBI. It's a CSV file. Append the new records to the end of the files (very cheap) and drop old entries from the top when the time comes.
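
        Keeping with the one-file-per-boat idea, the append is plain file I/O - a sketch with a made-up file name and record layout (timestamp,lat,lon):

            open my $fh, '>>', 'Wanderer.csv' or die "Wanderer.csv: $!";
            print {$fh} "2008-03-17T12:00:00,49.31,-123.09\n";
            close $fh;

            # dropping old entries from the top means rewriting the file minus
            # the stale rows, which you only need to do once in a while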

        I guess I have fundamental issues using DBI for this. You have a dataset that is only really useful in an ordered way, that comes in an ordered sequence (easy pickings), and you want to store it using a protocol that makes no guarantees about order? Something doesn't seem right.

        And yeah, netcdf only works for you if it's already working for you. But you can get some pretty neat data using it.