in reply to More than one way to skin an architecture

First, let me say that that idea is supremely cool.

Questions about performance without code to benchmark are (almost if not entirely) meaningless.

My advice would be to implement your program in the easiest way possible while maintaining the ability to plug in alternatives later without much difficulty. For example, if you're thinking of storing a hash to disk using DBI, why not use a Tie module to do it automagically for you? It may be suboptimal (maybe not), but it's really easy.
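For instance, here's a rough sketch with Tie::DBI - the table name, key column, and connection details are placeholders, and the CLOBBER level is from memory, so check the module's docs before trusting it:

    use strict;
    use warnings;
    use DBI;
    use Tie::DBI;

    # Assumes a 'boats' table with 'name', 'lat', 'lon' columns already exists.
    my $dbh = DBI->connect('dbi:SQLite:dbname=tracks.db', '', '',
                           { RaiseError => 1 });

    tie my %boat, 'Tie::DBI', {
        db      => $dbh,       # an existing DBI handle
        table   => 'boats',
        key     => 'name',     # the key column
        CLOBBER => 2,          # allow updates and inserts
    };

    # Hash writes become rows behind the scenes:
    $boat{'Windsong'} = { lat => 42.36, lon => -71.05 };

    untie %boat;
    $dbh->disconnect;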

Once you've got it working right, then worry about getting it working efficiently.

Re^2: More than one way to skin an architecture
by mcoblentz (Scribe) on Mar 18, 2008 at 00:02 UTC
    Thanks for the sentiment. The project has started to take on a life of its own. My friends and co-workers are the encouragement here.

    You make a very good point about performance being something to optimize after you have at least a prototype - I guess that's the nature of iterative development, after all - but I was trying to think my way through this one first. (Not my usual technique, I admit.) Since I'm not too skilled yet with hashes and managing them, I'm a tad leery of what the query/select statements will look like and whatnot. I know how to bind columns to a CSV table and query from that; I'm not sure what the equivalent construct would be for a hash. Any thoughts there?

    I like the Tie idea. The module seems reasonable enough for a novice like me.

    As for optimization after the prototyping, well, there are the Monks, aren't there? ;)

      Do you understand how to use a hash in Perl? They aren't exactly like databases (although the underlying idea is essentially the same). In any case, there are no query/select statements in the SQL sense.
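      To make that concrete, here's a tiny sketch (boat names and ports are invented) - the hash equivalent of a SELECT is just a key lookup:

          my %home_port = (
              'Windsong'  => 'Boston',
              'Sea Witch' => 'Plymouth',
          );

          # Roughly "SELECT port FROM boats WHERE name = 'Windsong'":
          my $port = $home_port{'Windsong'};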

      I'm not a database guy, so I don't know the voodoo to set up your data model optimally for a database, but when I look at your problem, I see a hash mapping boat identifiers (names probably) to arrays of coordinate pairs in chronological order (HoAoA).
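      Something like this, say (names and fixes are invented):

          my %track = (
              'Windsong' => [
                  [ 42.36, -71.05 ],   # earliest [ lat, lon ] fix
                  [ 42.37, -71.03 ],   # next fix
              ],
          );

          # Append the newest position for a boat:
          my ($new_lat, $new_lon) = ( 42.39, -71.00 );
          push @{ $track{'Windsong'} }, [ $new_lat, $new_lon ];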

      But then, I think that kind of structure maps perfectly to NetCDF, which is what I cut my teeth on. So I'm probably horribly biased on this.

      As for optimization: Super Search, profiling, and maybe low-level formats (for numeric data) are your friends.
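      For the profiling part, the stock Devel::DProf is enough to find the hot spots (yourscript.pl is a placeholder, of course):

          perl -d:DProf yourscript.pl   # writes tmon.out as it runs
          dprofpp                       # summarizes the most expensive subs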
        Do I understand how to use a hash? Probably not. I was thinking this could be as simple as key/value pairs, with the value being an array. Maybe not the best approach. (I may look into using the value as a reference to an array, if I can't store the array in the value directly. The array would be a .csv file's data. This might be memory intensive, maybe not.)

        I think the simplest approach would be to continue with the DBD::CSV approach overall - the data is completely legible and editable, and, maybe most importantly, something I can wrap my head around. The advantage here, as you pointed out, is that the data is fairly linear and not really related across tables; a flat file will do (at least for now).

        I'll shift the question, then: given multiple .csv files, how can I read them into one DB? Do I need to play tricks with the database handle? Because if I can master that trick, I can use simple date naming to identify each .csv file, and any historical query we run can load the .csv files that match the date range. (That's what I thought I could do with the hash.) Deletion of historical data on a date basis is really simple, too.
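        Something like this is what I'm picturing, if DBD::CSV will let me point one handle at a directory of files - the file names here are invented, and the csv_tables mapping is my guess from the docs:

            use strict;
            use warnings;
            use DBI;

            # One handle over a whole directory; each .csv file is a table.
            my $dbh = DBI->connect('dbi:CSV:f_dir=./tracks', undef, undef,
                                   { RaiseError => 1 });

            my @dates = qw( 20080316 20080317 );   # the date range to load
            my @rows;
            for my $date (@dates) {
                # Map each date-stamped file to a table name:
                $dbh->{csv_tables}{"track_$date"} = { file => "$date.csv" };
                push @rows, @{ $dbh->selectall_arrayref("SELECT * FROM track_$date") };
            }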

        I looked at NetCDF, but I think getting it to run on Windows is going to be a bit of a PITA.