Confused about complex data structures.

r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

(Ovid) Re: Confused about complex data structures. by Ovid (Cardinal) on Jan 18, 2001 at 05:14 UTC
What you are doing is sound. To test this, I created two files called `cp_123456` and `cp_123457`. I then modified your snippet to use my files, and I used Data::Dumper to view the resulting data structure:

```perl
use strict;
use warnings;
use Data::Dumper;

my %city_data = ();
foreach my $filename ( 'cp_123456', 'cp_123457' ) {
    # "cp_123456" -> city "cp", id "123456"
    my ( $city, $id ) = split /_/, $filename;
    open my $fh, '<', $filename
        or die "Can't open $filename for reading: $!";
    chomp( my @tmp = <$fh> );
    close $fh;
    # Autovivification creates the inner hash and array as needed.
    push @{ $city_data{$city}{$id} }, @tmp;
}
print Dumper( \%city_data );
```

The output was the following:

```
$VAR1 = {
          'cp' => {
                    '123456' => [
                                  'this is some data',
                                  'this is a test'
                                ],
                    '123457' => [
                                  'this is the second file',
                                  'this is another line',
                                  'this is the third line'
                                ]
                  }
        };
```

Whenever I want to create a complex data structure, I have to decide what's the best way to get at the data. One important issue is to avoid iterating, if possible: looking up `$city_data{$city}{$id}` by key goes straight to the record, whereas scanning a list for a match takes longer the more records you have. As these structures get larger, iteration can kill your performance.

After I have the basic idea of the structure laid out, I use Data::Dumper to output a subset of the structure so I can see that the results are what I expect. Using the debugger and entering the command `x \%city_data` has essentially the same effect.

However, I find that complex data structures in Perl are, for me, similar to regexes in that at times I tend to use them inappropriately. Often, as the amount or complexity of the data increases, a complex data structure can become unmanageable. Using a database to handle that data can save you some serious headaches if you're concerned about scalability.

Another question to ask here is: what do you do if you wind up with two files with the same name? If someone accidentally copies a file to another folder that you also happen to read from, do you end up with duplicate data? You may wish to test for this. Also, is there any data validation? I realize that you just posted a snippet, but don't forget to test for this.
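Ovid's question about duplicate file names can be checked for while the files are loaded. The following is a minimal sketch, not Ovid's code: it assumes the data files may live in more than one directory, treats a repeated base name (such as `cp_123456`) as a duplicate, and, as one possible policy, warns and skips the second copy. The two temporary directories exist only to make the example runnable.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Demo setup (hypothetical): the same file name exists in two folders.
my @dirs = ( tempdir( CLEANUP => 1 ), tempdir( CLEANUP => 1 ) );
for my $dir (@dirs) {
    my $path = File::Spec->catfile( $dir, 'cp_123456' );
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} "this is some data\n";
    close $out;
}

my %city_data;
my %seen;    # base name => full path it was first loaded from
for my $path ( map { File::Spec->catfile( $_, 'cp_123456' ) } @dirs ) {
    my $name = basename($path);
    if ( exists $seen{$name} ) {
        # Assumed policy: keep the first copy, report and skip the rest.
        warn "Skipping $path: $name was already loaded from $seen{$name}\n";
        next;
    }
    $seen{$name} = $path;

    my ( $city, $id ) = split /_/, $name;
    open my $in, '<', $path or die "Can't open $path for reading: $!";
    chomp( my @lines = <$in> );
    close $in;
    push @{ $city_data{$city}{$id} }, @lines;
}
```

Without the `%seen` check, the second copy's lines would simply be pushed onto the same array, silently doubling the record.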
Plus, if there is any chance that another process will be accessing the files while you are reading them, consider using flock.

Cheers,
Ovid
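Ovid's flock suggestion, sketched under assumptions: the reader takes a shared lock (`LOCK_SH`, exported by the core Fcntl module) before reading, so it waits while a cooperating writer holds an exclusive lock. flock is advisory, so this only protects you if the other process also locks the file. The temporary file is a hypothetical stand-in for one of the `cp_*` data files.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical demo file standing in for one of the cp_* data files.
my ( $tmp_fh, $filename ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "this is some data\nthis is a test\n";
close $tmp_fh;

open my $fh, '<', $filename or die "Can't open $filename for reading: $!";

# Shared lock: other readers may hold LOCK_SH at the same time, but a
# writer requesting LOCK_EX has to wait until this lock is released.
flock( $fh, LOCK_SH ) or die "Can't lock $filename: $!";
chomp( my @lines = <$fh> );
close $fh;    # closing the handle releases the lock

print "Read ", scalar @lines, " lines\n";
```

The writing side would do the same with `LOCK_EX` on a handle opened for writing.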
Personal preference by flocto (Pilgrim) on Jan 18, 2001 at 06:58 UTC
I fully agree with what you wrote above. I just prefer to keep my whole complex data structure behind a reference. So instead of using a named hash like `%data`, I prefer using `$data = {};`. Maybe it's because I dislike using stuff like `${ $data{$foo} }`; `$data->{$foo}` just looks better, and it keeps me aware that I am working with a reference. But that's pretty much personal preference.
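A short sketch of the two styles flocto compares (the `cp`/`123456` keys are just sample data): both build the same nested structure, and only the access syntax differs. One practical bonus of the reference style is that the structure can be handed to a subroutine as-is; `count_ids` is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# Style 1: a named hash; top-level elements are reached with $data{...}.
my %data;
push @{ $data{cp}{123456} }, 'this is some data';

# Style 2 (flocto's preference): the whole structure lives behind a
# reference, and every access goes through the arrow operator.
my $data = {};
push @{ $data->{cp}{123456} }, 'this is some data';

print "named:     $data{cp}{123456}[0]\n";
print "reference: $data->{cp}{123456}[0]\n";

# Hypothetical helper: counts the ids stored under the 'cp' city.
sub count_ids {
    my ($d) = @_;
    return scalar keys %{ $d->{cp} };
}

# The reference is passed directly; the named hash needs a backslash.
print count_ids($data), ' ', count_ids( \%data ), "\n";
```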
Re: (Ovid) Re: Confused about complex data structures. by r.joseph (Hermit) on Jan 18, 2001 at 05:24 UTC
I can't thank you enough; this is exactly what I was looking for. Thanks, everyone, for your help and the quick response. Oh, and don't let that discourage anyone from posting more: I am still fuzzy in this area and will always welcome more help!

R.Joseph
Re: (Ovid) Re: Confused about complex data structures. by marius (Hermit) on Jan 18, 2001 at 21:22 UTC
Favorite thing #101 about our little Monastery (wow, Monastery is a really odd-looking word now that I've typed it a few times.. ah, you're still reading, moving along..) is finding new and neat uses of modules. I've seen Data::Dumper used before, but never to visualize arrays of arrays, hashes of hashes, and so on (you get the idea!). Now that's cool!

-marius
Re: Confused about complex data structures. by saucepan (Scribe) on Jan 18, 2001 at 05:13 UTC
Your code appears to do what you think it does. Whether this is the best approach depends on what you are going to do with the structure once it's in memory. The layout you've selected is great if you'll often need to work with all the records for a particular city, for example.

To see what your data structure looks like, you can use Data::Dumper to inspect the structure after you've built it:

```perl
use Data::Dumper;
print Dumper(\%city_data);
```

or use the perl debugger (which I can't help you with, being of the printf() school of debugging, myself). :)

As for printing everything in one of your arrays, you've already got this licked: `print @{ $city_data{$city}{$id} }` will output the contents of the file "${city}_${id}".
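Building on saucepan's answer, here is a sketch for printing every array in the structure rather than just one: it walks the cities, then the ids, in sorted order. It assumes the lines were chomped when read (as in Ovid's version), so it adds each newline back; the `print_city_data` name and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the structure built in this thread.
my %city_data = (
    cp => {
        123456 => [ 'this is some data', 'this is a test' ],
        123457 => [ 'this is the second file', 'this is another line' ],
    },
);

# Print every stored line, grouped by city and id, to the given handle.
sub print_city_data {
    my ( $out, $data ) = @_;
    for my $city ( sort keys %$data ) {
        for my $id ( sort keys %{ $data->{$city} } ) {
            # ${city} needs braces so Perl doesn't look for a $city_ variable.
            print {$out} "== ${city}_$id ==\n";
            # The lines were chomped, so put the newline back.
            print {$out} "$_\n" for @{ $data->{$city}{$id} };
        }
    }
}

print_city_data( \*STDOUT, \%city_data );
```

Passing the output handle as an argument means the same routine can write to STDOUT, a log file, or an in-memory string.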
(tye)Re: Confused about complex data structures. by tye (Sage) on Jan 18, 2001 at 05:07 UTC
Personally, I suggest you run your script under `perl -d`, step through your code, and occasionally enter the command `x \%city_data` to see what you've built. I'd say the worst part of your current code is the open(): the die message should include $filename and $!, and you should at least open `"< $filename"` so that special characters in $filename are less problematic (with a plain `open FH, $filename`, a name beginning with `>` would be opened for writing instead of reading). I don't see any other serious problems.

- tye (but my friends call me "Tye")
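tye's advice about open() can be taken one step further with the three-argument form, which keeps the mode separate from the file name. A minimal sketch: `read_lines` is a hypothetical helper name, and the temporary file whose name ends in a space is contrived to show a name that two-argument open would alter, since that form strips leading and trailing whitespace from the file name.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: read a file into a list of chomped lines.
# With three-argument open the mode ('<') is passed separately, so a
# leading '>' or surrounding whitespace in $filename is taken literally.
sub read_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die "Can't open '$filename' for reading: $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# Demo file whose name ends in a space (contrived for illustration).
my $dir  = tempdir( CLEANUP => 1 );
my $name = File::Spec->catfile( $dir, 'cp_123456 ' );
open my $out, '>', $name or die "Can't write '$name': $!";
print {$out} "this is some data\n";
close $out;

my @lines = read_lines($name);
print "$lines[0]\n";
```

Quoting the name inside the die message also makes stray whitespace visible when an open fails.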
Re: Confused about complex data structures. by arturo (Vicar) on Jan 18, 2001 at 05:05 UTC
Hm, what's in the files? How is the information structured NOW? Perhaps your data structure isn't the best one to use, but we won't know unless we know more about what kind of data the files are holding. Or so I gather from reading your description!

You might want to check out RE: Data::Dumper (Adam: Sample Usage) for some info about how to store deeply nested data structures, or, my own favorite, Storable and its `freeze`/`thaw` pair.

I hope this helps!

Philosophy can be made out of anything. Or less -- Jerry A. Fodor
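A sketch of the Storable route arturo mentions, using a hypothetical `%city_data` sample. `freeze`/`thaw` serialize to a string in memory and rebuild a deep copy; `nstore`/`retrieve` do the same through a file (the `n` variant writes in network byte order, so the file is portable between machines). The temporary file name is an assumption for the demo.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

my %city_data = (
    cp => { 123456 => [ 'this is some data', 'this is a test' ] },
);

# In memory: serialize the whole nested structure, then rebuild it.
my $frozen = freeze( \%city_data );
my $copy   = thaw($frozen);
print "$copy->{cp}{123456}[1]\n";

# On disk: the same structure written to a file and read back later.
my $file = File::Spec->catfile( tempdir( CLEANUP => 1 ), 'city_data.sto' );
nstore( \%city_data, $file ) or die "Can't store to $file";
my $restored = retrieve($file);
print scalar @{ $restored->{cp}{123456} }, " lines restored\n";
```

Because `thaw` returns a deep copy, changing `$copy` leaves `%city_data` untouched.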
Re: Re: Confused about complex data structures. by r.joseph (Hermit) on Jan 18, 2001 at 05:17 UTC
Actually, the data is simply stored line by line. It is data for apartment buildings (I am writing a property management system), so a typical data file might look something like this:

```
555 Robertson Blvd., Beverly Hills
A nice fixer-upper
200 units
1500 sq. ft.
$1500/wk.
01/01/2001
Bob Joe
(310) 333-4444
```

So, as you can see, I just want to grab all the data into an array somehow. I will know that, say, index 3 of that array is actually the size of the unit (1500 sq. ft.) or whatever; I just need the data. I hope I am making more sense. As I said, I am pretty confused at this point.

Thanks!
R.Joseph
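Storing each file's lines in an array works, but then you have to remember that index 3 means the size. A hash slice can attach names to the positions once, as they are read. This is a sketch, not part of r.joseph's code: the field names in `@FIELDS` are hypothetical labels for the lines in the sample above, and the line-count check is one simple form of the validation Ovid asked about.

```perl
use strict;
use warnings;

# Hypothetical field names, in the order the lines appear in each file.
my @FIELDS = qw(address description units size rent date contact phone);

# Lines as they might come back from reading one file (already chomped).
my @lines = (
    '555 Robertson Blvd., Beverly Hills',
    'A nice fixer-upper',
    '200 units',
    '1500 sq. ft.',
    '$1500/wk.',
    '01/01/2001',
    'Bob Joe',
    '(310) 333-4444',
);

# Basic validation: each file should have exactly one line per field.
warn "Expected ", scalar @FIELDS, " lines, got ", scalar @lines, "\n"
    unless @lines == @FIELDS;

# Hash slice: pair each field name with the line at the same index.
my %unit;
@unit{@FIELDS} = @lines;

print "Size: $unit{size}\n";    # the same value as $lines[3]
```

A reference to `%unit` could then be stored as `$city_data{$city}{$id}` in place of the array.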
Re: Re: Re: Confused about complex data structures. by arturo (Vicar) on Jan 18, 2001 at 05:25 UTC
Here's what I might do in this situation to avoid ever-deeper nested data: store each file as a scalar (this works best if you know the data won't get too large). You could use a plain hash, where the keys are the filenames and the values are the contents of the files, stored as scalars, and then pull out the information you want when printing the data. Here's some code:

```perl
#!/usr/bin/perl -w
# ...
# I assume @files holds the list of filenames
my %data;
foreach (@files) {
    # this will allow us to store the whole file as a string!
    {
        local $/ = undef;    # slurp mode, limited to this inner block
        open my $fh, '<', $_ or die "Couldn't open $_: $!\n";
        $data{$_} = <$fh>;
        close $fh;
    }
}

# to print:
foreach (sort keys %data) {
    my ($city, $id) = split /_/, $_;
    print "City: $city, ID: $id\n";
    print "====\n\n";
    print $data{$_}, "=====\n";
}
```

This representation might not be maximally efficient for searching, but if the data set's not too huge it won't be a worry. If your data set were to get huge, then look into an RDBMS; MySQL is available under the GPL =)
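Pulling a single piece of information back out of a slurped file, as arturo suggests, could look like this sketch. It assumes the line layout r.joseph described (the size on the fourth line); the sample `%data` entry is hypothetical and trimmed to four lines, and `%size_of` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical %data as built above: file name => whole file as a string.
my %data = (
    'cp_123456' => "555 Robertson Blvd., Beverly Hills\nA nice fixer-upper\n"
                 . "200 units\n1500 sq. ft.\n",
);

my %size_of;
for my $file ( sort keys %data ) {
    my ( $city, $id ) = split /_/, $file;
    # Split the stored string back into lines only when a field is needed;
    # the list slice [3] picks the fourth line (the size in this layout).
    my $size = ( split /\n/, $data{$file} )[3];
    $size_of{$file} = $size;
    print "City: $city, ID: $id, size: $size\n";
}
```

The trade-off is the one arturo names: the data stays flat and simple, but every lookup re-splits the string, so this suits small data sets.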