mcoblentz has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

As part of my personal journey to enlightenment, I have started a personal project that just seems to keep on growing. So, knowing that in good software "Architecture is not an Afterthought", I think I need to take this opportunity to solicit some advice.

The project is to create a near-real time picture of the events on the earth and to display that as the screen background on my laptop (Windows XP SP2). Currently this stage of the project is trying to plot the known locations of cruise ships and other maritime vessels on the seas (and possibly their track - previous locations over x period of time).

We may assume that the current day's data is available as a .csv file. Recently other monks have helped me work through the whole page-scraping, table-extracting thing. I save that data as a .csv file for subsequent querying via DB-style select statements in the same script, so I already have two I/Os to marshal/unmarshal the data. (Not very efficient, I think.) I want to persist the day's data for subsequent recall (for example, if asked for a ship's track) in another session at a later date, so I use a .csv file.

My question is: would it be more performant over time to merge the day's data into a large hash (keyed on ship's name, for example)? Can I save a hash back to the file system and recall it later? (I might not run this script every day.) Or would it be better to just save the daily .csv files and (re)assemble the hash in memory? A day's data is about 10KB, so this isn't going to get very large, even after a month -- which is about all I want to save. It doesn't seem like I could trim a hash of old data by day when the key is the ship name, though.

It also seems to me that treating this like a database (of .csv data) would make it an easy thing to query (just build the select statement and go). DBD::CSV would work well here. I'm not quite sure how to fetch multiple .csv files and append them together to get one large db, however. Can I use DBI on a hash like that, or should I use something different? I know there are many ways to do this and the (holy) documentation tells me so, but which one would work well? Since I'm not a programmer by trade or training, my approaches to this project tend to be somewhat of a "hack". I'm learning Perl as I go. As a hobby, at least my wife knows that I'm not just surfing for "inappropriate content".
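For what it's worth, appending several daily .csv files into one in-memory table can be done with core Perl alone. A minimal sketch -- the `load_csv_files` name and the naive split-on-comma parsing are illustrative assumptions; quoted fields would need Text::CSV from CPAN:

```perl
use strict;
use warnings;

# Gather rows from several daily CSV files into one in-memory table.
# Assumes rows like: SHIP1,20080318-0730,10.33,-05.45
# NOTE: split /,/ is a naive parser and breaks on quoted fields;
# Text::CSV from CPAN is safer for real-world data.
sub load_csv_files {
    my @files = @_;
    my @rows;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            push @rows, [ split /,/, $line ];
        }
        close $fh;
    }
    return @rows;
}

# e.g. my @table = load_csv_files(glob 'ships-*.csv');
```

Each daily file then just becomes more rows in the same table, ready for querying or for building the hash.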

Thanks, Matt

Note: This "earth" drawing program takes input in the form of: a simple marker file: <lat>, <long>, <ship_name>. I can also create an "arc file" for the ship's historical track (<lat1>, <long1>, <lat2>, <long2>). I already plot earthquakes (NRT from the USGS); volcanoes; satellites; clouds and storms; and even some airplanes in flight. So to construct these input files, I just query and print by row. Voila!
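A sketch of how those marker and arc files might be generated from a hash of tracks -- the function names and the data layout (ship name mapped to a list of [lat, long] pairs) are illustrative assumptions, not the actual program:

```perl
use strict;
use warnings;

# Emit one marker line per ship (<lat>, <long>, <ship_name>) from its
# most recent fix, and arc lines (<lat1>, <long1>, <lat2>, <long2>)
# joining consecutive track points.
sub marker_lines {
    my %track = @_;
    my @out;
    for my $ship (sort keys %track) {
        my $last = $track{$ship}[-1];          # most recent position
        push @out, sprintf "%s, %s, %s", @$last, $ship;
    }
    return @out;
}

sub arc_lines {
    my %track = @_;
    my @out;
    for my $ship (sort keys %track) {
        my @pts = @{ $track{$ship} };
        for my $i (1 .. $#pts) {               # join successive fixes
            push @out, sprintf "%s, %s, %s, %s", @{ $pts[$i-1] }, @{ $pts[$i] };
        }
    }
    return @out;
}
```

Printing the results, one line per row, gives the two input files the drawing program expects.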

  • Comment on More than one way to skin an architecture

Replies are listed 'Best First'.
Re: More than one way to skin an architecture
by plobsing (Friar) on Mar 17, 2008 at 22:08 UTC
    First, let me say that that idea is supremely cool.

    Questions about performance without code to benchmark are (almost if not entirely) meaningless.

    My advice would be to implement your program in the easiest way possible while maintaining the ability to plug in alternatives later without much difficulty. For example, if you're thinking of storing a hash to disk using DBI, why not use a Tie module to do it automagically for you? It may be suboptimal (maybe not), but it's really easy.

    Once you've got it working right, then worry about getting it working efficiently.
      Thanks for the sentiment. The project has started to take on a life of its own. My friends and co-workers are the encouragement here.

      You make a very good point about performance being something to optimize after you have at least a prototype - I guess that's the nature of iterative development after all - but I was trying to think my way through this one first. (Not my usual technique, I admit.) Since I'm not too skilled yet at hashes and managing them, I'm a tad leery of what the query/select statements will look like and whatnot. I know to bind columns to a csv table and query from that; I'm not sure what the equivalent construct would be for a hash. Any thoughts there?

      I like the Tie idea. The module seems reasonable enough for a novice like me.

      As for optimization after the prototyping, well, there are the Monks, aren't there? ;)

        Do you understand how to use a hash in Perl? They aren't exactly like databases (although the underlying idea is essentially the same). In any case there are no query/select statements in the SQL sense.

        I'm not a database guy, so I don't know the voodoo to set up your data model optimally for a database, but when I look at your problem, I see a hash mapping boat identifiers (names probably) to arrays of coordinate pairs in chronological order (HoAoA).
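That HoAoA might be sketched like this (ship names and coordinates invented for illustration):

```perl
use strict;
use warnings;

# Hash mapping boat names to arrays of [lat, long] pairs in
# chronological order -- a hash of arrays of arrays (HoAoA).
my %track = (
    'QE2'    => [ [ 10.33, -5.45 ], [ 10.90, -5.10 ] ],
    'Oriana' => [ [ 43.00, 16.20 ] ],
);

# append today's fix for a ship (a new ship's array autovivifies)
push @{ $track{'QE2'} }, [ 11.45, -4.80 ];

# the latest position is simply the last element
my ($lat, $long) = @{ $track{'QE2'}[-1] };
```

Appending a fix and reading back the latest position are both one-liners, which is most of what the plotting step needs.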

        But then I think that that kind of structure maps perfectly to netcdf, on which I cut my teeth. So I'm probably horribly biased on this.

        As for optimization, super search, profiling, and maybe low-level formats (for numeric data) are your friends.
Re: More than one way to skin an architecture
by BrowserUk (Patriarch) on Mar 18, 2008 at 04:47 UTC

    For keeping arbitrary datastructures on disk, DBM::Deep takes some beating. If you know how to use Perl's hashes and arrays, then use the tie interface. Add a single line to the top of your program, and whatever you do to the hash ends up on disk, ready to be retrieved next time.

    No need to re-parse your data every time. No obscure interfaces to learn. No need to force fit your data onto the relational model. And it's surprisingly fast. One line, do whatever Perl lets you do with your data, and it just works.
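A minimal sketch of that one-line setup, assuming DBM::Deep is installed from CPAN; the file name is arbitrary:

```perl
use strict;
use warnings;
use DBM::Deep;    # CPAN module

# the single line: tie the hash to a file on disk
tie my %SHIPS, 'DBM::Deep', 'ships.deep';

# from here on, use it exactly like an ordinary hash of arrayrefs;
# everything stored in %SHIPS survives across runs
$SHIPS{'QE2'} = [] unless $SHIPS{'QE2'};
push @{ $SHIPS{'QE2'} },
    { datetime => '20080318-0730', latitude => '10.33', longitude => '-05.45' };
```

The next run ties the same file and finds the data already there -- no explicit save or load step.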


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: More than one way to skin an architecture
by okram (Monk) on Mar 18, 2008 at 07:34 UTC
    I assume that you have the following:
    • Current day's data as CSV
    • Historical data as CSV or other input
    In order to display the data, I assume that you're going to do something like this, Foreach Ship:
    • Display current day's position as RED dot or similar
    • Foreach historical data, display that position as BLUE dot or similar -- if you have it
    IMHO the most simple thing to do would be:
    • Prepare a HASH from the historical data -- if you have it -- like this:
      • Keys are ships names
      • Values are an ARRAY REF of the historical position, each unit like:
        • {datetime => '...', position => '...'}
    Once you've plotted today's data, all you need to do is go through all of today's data (foreach ship name) and:
    • Push today's data to the historical hash of arrayrefs -- if you have no historical data, simply create the hash the first time
    • save for later reference
    Then, whenever you start the program up, you'll only have to:
    • Load historical data
    • see above for displaying
    • see above for merging
    • see above for saving
    Now, for the hash part. The hash I have in mind would be like:
    # today's data -- your starting point
    my %SHIPS = (
        SHIP1 => [
            { datetime => '20080318-0730', latitude => '10.33', longitude => '-05.45' },
        ],
        # other ships
    );
    You can then save that via Data::Dumper:
    use Data::Dumper;
    print Data::Dumper->Dump( [ \%SHIPS ], [ '*SHIPS' ] );   # save to a file, don't print
    This specific invocation will give you exactly what you have in the hash, dumped under the name %SHIPS.
    The cool thing is that at the subsequent start of the program all you have to do is to "do" the file, and Perl will get the contents of the %SHIPS hash you had at the previous run:
    our %SHIPS;          # the dumped file assigns the package variable
    do 'dumped_file';
    As above, all you'd then have to do is display the data, merge it with the historical, and save again.

    The "merge" (assuming you are reading through a CSV on <>) is something like:
    while (<$CSV>) {
        chomp;
        # assume csv is SHIP1,DATETIME,LATITUDE,LONGITUDE -- modify accordingly
        my ($shipname, $datetime, $latitude, $longitude) = split /,/, $_;
        # this is what will get pushed on the arrayref
        my %today_details = (
            datetime  => $datetime,
            latitude  => $latitude,
            longitude => $longitude,
        );
        my @arr = @{ $SHIPS{$shipname} || [] };   # get current contents
        push @arr, \%today_details;               # push today's data
        $SHIPS{$shipname} = \@arr;                # put it back
    }
    # that's you done with the merge
    Really, even if it does get VERY long... it's not like your PC can't handle a 10MB file after some time..
    Or maybe you'd want to implement a "dump older than one month details".. you'll simply go through the hash, then through the arrayref, and discard all the entries which are older than a specified time.. Exercise left to the reader ;)
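That trimming exercise might look something like the following sketch; it assumes the datetime strings are in a format like 20080318-0730, which sorts chronologically under plain string comparison:

```perl
use strict;
use warnings;

# Drop every fix older than a cutoff. YYYYMMDD-HHMM strings compare
# correctly with 'ge', so no date arithmetic is needed.
sub trim_history {
    my ($ships, $cutoff) = @_;    # hashref of arrayrefs, e.g. '20080218-0000'
    for my $name (keys %$ships) {
        @{ $ships->{$name} } =
            grep { $_->{datetime} ge $cutoff } @{ $ships->{$name} };
        # a ship with no recent fixes disappears entirely
        delete $ships->{$name} unless @{ $ships->{$name} };
    }
}
```

Because each ship is trimmed independently, different ships can keep different amounts of history.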
    Hope this has helped :)
      I think that this is the general direction I will try first. One of the things that occurred to me during this conversation is that the most likely historical query would be to pick a ship and plot its track over time. I don't think I would ever query by a geographical location (lat/long box or circle search). That doesn't make any sense to me (at least right now). If I have to do it later, I could always just dump the file into a DB and query it that way.

      I like the idea of being able to trim the historical data by ship/date, because I don't have to trim all ships to the same period.

Re: More than one way to skin an architecture
by roboticus (Chancellor) on Mar 17, 2008 at 22:57 UTC
    mcoblentz:

    That sounds like a really cool project.

    Just one thought: If you're wanting to have persistence, and you don't mind using DBI, why not just use a database (even a small one like SQLite)?
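A minimal sketch of that route, assuming DBI and DBD::SQLite are installed from CPAN; one flat table is enough here, with no keys to design (table and column names invented for illustration):

```perl
use strict;
use warnings;
use DBI;

# SQLite keeps the whole database in a single file -- nothing to install
# or administer beyond the CPAN modules.
my $dbh = DBI->connect('dbi:SQLite:dbname=ships.sqlite', '', '',
                       { RaiseError => 1 });
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS sightings (
        ship TEXT, datetime TEXT, lat REAL, long REAL
    )
});

my $ins = $dbh->prepare('INSERT INTO sightings VALUES (?,?,?,?)');
$ins->execute('QE2', '20080318-0730', 10.33, -5.45);

# a ship's track, oldest fix first
my $track = $dbh->selectall_arrayref(
    'SELECT lat, long FROM sightings WHERE ship = ? ORDER BY datetime',
    undef, 'QE2');

$dbh->disconnect;
```

Dropping old data is then one DELETE with a datetime condition, and the same file ports straight to a web host.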

    ...roboticus

      Hi roboticus,

      Fair question. My initial, instinctive response was: 1) I don't have anything installed on the laptop and didn't really want to learn another program (I'm okay at SQL but not great by any means), and setting up primary and foreign keys, etc., really is out of my league at the moment; 2) I was thinking that I would move all of this to my website eventually and run it there, so text processing and Perl would be better -- but hey, I have MySQL on the site already, so...

      ... it's not a bad idea and I really should think about it. Queries would be just as simple; porting to a web host would be fairly easy; and dropping old data wouldn't be that hard.

      Food for thought.

Re: More than one way to skin an architecture
by jethro (Monsignor) on Mar 18, 2008 at 04:27 UTC
    If you are looking for simple hack-friendly solutions, you can serialize (i.e. store) arbitrary data structures to disk and read them again the next time with something like YAML::LibYAML.
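A minimal sketch of that round trip, assuming the YAML-LibYAML distribution (which provides YAML::XS) is installed from CPAN:

```perl
use strict;
use warnings;
use YAML::XS qw(DumpFile LoadFile);   # from the YAML-LibYAML distribution

# serialize the whole hash of arrays to a human-readable file...
my %SHIPS = (
    QE2 => [ { datetime => '20080318-0730', lat => 10.33, long => -5.45 } ],
);
DumpFile('ships.yml', \%SHIPS);

# ...and slurp it back next run
my $restored = LoadFile('ships.yml');
```

The file on disk is plain indented text, so it can be inspected or hand-edited between runs.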

    As long as your program needs to access the whole database anytime it wants to draw the earth you might as well slurp the entire hash in one go from file to memory (or use one file per day if you want a nice way to segment the data or invalidate or overwrite old data) instead of using a database.

    The best thing: the data on disk is human-readable, so there is always the possibility of using emacs, grep, and less to edit, view, or search your data.

    Performance should be a bit slower than Tie, but you don't have to think about how to store more complicated data structures like hashes of arrays. YAML does that for you.

    Also this solution is a bit low on scalability, but as long as your data doesn't go into tens of megabytes you are on the safe side. And you can always upgrade to a database if you need to scale up or need more 'random access' to the data.

Re: More than one way to skin an architecture
by chrism01 (Friar) on Mar 18, 2008 at 06:36 UTC
    Given that you are talking about multiple data types eg boats, "earthquakes (NRT from the USGS); volcanoes; satellites; clouds and storms; and even some airplanes", and that you say you've got MySQL already, I'd go with that.
    It'll payoff and it's not difficult to do a simple database.
    Don't use foreign keys for now, just use some indexes.

    Cheers
    Chris

Re: More than one way to skin an architecture
by sundialsvc4 (Abbot) on Mar 18, 2008 at 23:08 UTC

    One thing to keep in mind is that “memory” is, at least potentially, “a disk file.”

    The actual degree to which it is so depends entirely upon the amount of RAM and the other workload that the computer is doing, but as a matter of general principles if you're dealing with a large amount of information you need to be mindful of just how large it is.

    “Large in-memory hashes” can be problematic because of the nature of the hashing-algorithm. It can cause widely-scattered memory references and this can make your “working set” large, in other words, excessive paging. On the other hand, if you know that the target machine has gobs of chips and not much else to do, it might be a non-issue. (“Just throw silicon at it.”   “If you've got it, flaunt it.”)

    As previously mentioned here, Perl offers the tie mechanism which actually allows you to specify how a “hash” is actually stored:   for instance, you can tie it to a Berkeley-DB file. So the syntax is that of a hash, but the implementation is disk file-access. The key issue here is to be aware of what your chosen implementation is going to do, and how it's going to behave on your hardware.

      Swap files are not persistent, so they don't really help the OP in persisting his data: "A day's data is about 10KB so this isn't going to get very large, even after a month; which is about all I want to save."


        Yeah. I generally agree with that statement. I also like Jethro's comment about maintaining the data in a human readable form (YAML comment above). I'm now thinking that persistence in a human readable form is the real need.

        The value of all this, at least to me, is that I get to sound out the questions against a body of people who can give me some answers and act as a sounding board. My wife just can't answer these questions.

        For that, I thank you all.

      Frankly, that question is the very reason I asked this question. Even though the dataset might not get very large, and therefore I could probably afford a "sloppy" approach to this problem, it seems prudent to really think this one through. Who knows? I might decide to keep a year's worth and then where would I be?

      I'm running this on my "work" laptop. It's got all the usual Office apps running - Outlook, Word, PPT, etc. - plus Firefox (which seems to be a memory hog, if you ask me). This planet program is grabbing the USGS data every 20 minutes, airplane data every 5 (if I turn that on), and updating the day/night terminator line every 5. Since this script is going to run once a day (well, ships actually update about 4x/day, but I don't know if I care that much), I don't want to drag this thing down just because I was sloppy about hunting down cruise ships.

      The resources for this problem are finite; I don't have all the silicon I would like (I wish I did!); I work in an enterprise software company and bad architecture just offends me.

      I didn't know that large hashes can cause a memory problem. It would be great to hear more about that kind of thing. And your comment,

      "The key issue here is to be aware of what your chosen implementation is going to do, and how it's going to behave on your hardware"
      is spot-on; except that I don't know what a given implementation might do - I'm new to Perl and therefore the question.
        You could always use a DBM file via the standard AnyDBM_File module. DBM files behave just like a hash, are dead easy to use, and are fast: simple key->value databases stored on disk. I used one with 250,000 price references once and it was dead fast; the in-memory hash version was very slow to load. Updating with new CSVs would be easy too.
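A minimal sketch of that approach using only core modules. AnyDBM_File picks whichever DBM library is available; note that DBM values must be plain strings, so a nested structure would need to be flattened (here, one joined string per ship):

```perl
use strict;
use warnings;
use Fcntl;         # for the O_RDWR / O_CREAT flags
use AnyDBM_File;   # uses whichever DBM implementation is installed

# ties a hash to an on-disk key/value database; the 'positions'
# file name is arbitrary
tie my %position, 'AnyDBM_File', 'positions', O_RDWR | O_CREAT, 0640
    or die "Can't tie DBM file: $!";

# values are plain strings, so pack the fields into one string...
$position{'QE2'} = '20080318-0730,10.33,-05.45';

# ...and split them back apart when reading
my ($datetime, $lat, $long) = split /,/, $position{'QE2'};

untie %position;
```

The data survives between runs in the tied file, so updating from a new day's CSV is just a loop of assignments.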
Re: More than one way to skin an architecture
by Anonymous Monk on Mar 20, 2008 at 21:57 UTC
    This seems to me to be more a question of program architecture than of implementation. Being about two years of full-time Perl programming in, I try to think about my data structures first, before I design the code.

    Given that you are taking input from planes, ships, earthquakes, weather, etc., the DATA MODEL that will be the core of your design -- and your persistent data store -- should be a single table of all of the common factors (lat, long, time) plus some "type" that uniquely specifies the special rendering of that item. KISS!

    Two things come up here: this is a single table, probably 2-D, and you can implement the VIEW as a set of queries on this table (row by row?), with specific routines to render the different types. I am struggling to think of a better conceptual way to do this than:

        SELECT name, type
        FROM global_data
        WHERE lat = 45%          -- i.e. iterate row by row
          AND long = 15%
          AND time < 28 days
        ORDER BY time DESC

    into two arrays:

        my @event_name = ();
        my @event_type = ();

    ... err, and just display item '0' from the array (you can only render the top / most recent). Icons, colours, etc. can be chosen to reflect your desired view. Then you can build filters that get / scrape data to load the table, and that display data according to its type.

    So, even if you did not have any SQL experience, I think that this would be the first port of call to help Structure your Query. And then I would bear in mind that this needs (i) one well-designed table, and (ii) no transactions, nothing performance limiting, no indexes. And, of course, there are great DBI implementations and MySQL for PCs, Linux, etc.