r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a program that searches through a directory, and finds files with names that look like this:
xx_000000
where the 'xx' is a two-letter city abbreviation and the number is just a six-digit ID number. What I then want to do is open the file, and store all the data in an array-of-a-hash-of-a-hash - I think. Honestly, I have a very difficult time with these types of structures. Here is what I am doing:
    my %city_data = ();
    my ($city, $id) = split(/_/,$filename);
    open(THIS,"$filename") or die;
    my @tmp = <THIS>;
    close(THIS);
    push @{ $city_data{$city}{$id} }, @tmp;
Now, it seems like this should work correctly, but I can't tell if it does. First of all, is my approach correct? Secondly, if so, how would I print out everything that is in the @{$city_data{$city}{$id}} array? I know that this may be a simple question for most, and I have read everything I could find on PM and elsewhere about this subject, but I am still lost. Could someone please try to explain this to me, in detail if you wouldn't mind, so that hopefully the next time I have a problem like this I am not as lost as I am now? Thanks so much to all the gracious helpers out there (and to all who aren't so gracious, thanks anyway :-).

R.Joseph

(Ovid) Re: Confused about complex data structures.
by Ovid (Cardinal) on Jan 18, 2001 at 05:14 UTC
    What you are doing is sound. To test this, I created two files called cp_123456 and cp_123457. I then modified your snippet to use my files and I use Data::Dumper to view the resulting data structure:
        use strict;
        use warnings;
        use Data::Dumper;

        my %city_data = ();
        foreach my $filename ( 'cp_123456', 'cp_123457' ) {
            my ( $city, $id ) = split /_/, $filename;
            open THIS, "<$filename" or die "Can't open $filename for reading: $!";
            chomp ( my @tmp = <THIS> );
            close THIS;
            push @{ $city_data{$city}{$id} }, @tmp;
        }
        print Dumper( \%city_data );
    The output was the following:
        $VAR1 = {
                  'cp' => {
                            '123456' => [
                                          'this is some data',
                                          'this is a test'
                                        ],
                            '123457' => [
                                          'this is the second file',
                                          'this is another line',
                                          'this is the third line'
                                        ]
                          }
                };
    Whenever I want to create a complex data structure, I have to decide what's the best way to get at the data. One important issue is to avoid iterating, if possible. As these structures get larger, iteration can kill your performance.

    After I have the basic idea of the structure laid out, I use Data::Dumper to output a subset of the structure so I can see that the results are what I expect. Using the debugger and entering the command x \%city_data has essentially the same effect.

    However, I find that complex data structures in Perl are, for me, similar to regexes in that at times I tend to use them inappropriately. Often, as the amount or complexity of the data increases, a complex data structure can become unmanageable. Using a database to handle that data can solve some serious headaches if you're concerned about scalability issues.

    Another issue to ask here is, what do you do if you wind up with two files with the same name? If someone accidentally copies a file to another folder that you also happen to read from, do you have duplicate data? You may wish to test for this.
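
    One way to test for that, as a minimal sketch: check whether the city/ID pair is already in the hash before pushing. The add_record helper and the sample lines are made up for illustration and are not part of the original snippet.

```perl
use strict;
use warnings;

my %city_data;

# Hypothetical helper: store one file's lines, refusing to add the
# same city/ID pair a second time.
sub add_record {
    my ( $city, $id, @lines ) = @_;

    # Test the outer key first so the check itself does not
    # autovivify an empty entry for an unknown city.
    if ( exists( $city_data{$city} ) && exists( $city_data{$city}{$id} ) ) {
        warn "Duplicate record for ${city}_$id, skipping\n";
        return 0;
    }
    push @{ $city_data{$city}{$id} }, @lines;
    return 1;
}

my $first  = add_record( 'cp', '123456', 'this is some data' );
my $second = add_record( 'cp', '123456', 'same name, other folder' );
print "first=$first second=$second\n";
```

    Whether a duplicate should be skipped, merged, or treated as fatal depends on the application; skipping with a warning is just the simplest choice to show.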

    Also, is there any data validation? I realize that you just posted a snippet, but don't forget to test for this. Plus, if there is any chance that another process will be accessing the files while you are reading them, consider using flock.

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      I fully agree with what you wrote above. I just prefer to have references for my whole complex data. So instead of using a named hash like %data, I prefer using $data = {};. Maybe it's because I dislike using stuff like ${ $data{$foo} }. $data->{$foo} just looks better and I am aware of what I am doing. But that's pretty much personal preference.
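
      For comparison, here is a small sketch of the same city/ID structure held in a single hash reference. The variable names and sample lines are illustrative only.

```perl
use strict;
use warnings;

# Same shape as %city_data, but held in one reference.
my $data = {};

# Intermediate hashes and the array are autovivified on first use.
push @{ $data->{cp}{'123456'} }, 'this is some data', 'this is a test';

# Arrow syntax reads left to right: hash -> city -> id -> element.
my $first_line = $data->{cp}{'123456'}[0];
my $line_count = scalar @{ $data->{cp}{'123456'} };

print "$first_line ($line_count lines)\n";
```

      A reference like this can also be passed to a subroutine or to Dumper($data) directly, without the backslash that a named hash needs.
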
      I can't thank you enough - this is exactly what I was looking for. Thanks everyone for your help and the quick response - oh, and don't be discouraged to continue posting, I am still fuzzy in this area and will always welcome more help!

      R.Joseph
      Favorite thing #101 about our little Monastery (wow, Monastery is a really odd looking word now that I've typed it a few times.. ah, you're still reading, moving along..) is finding new and neat uses of modules. I've seen Data::Dumper used before in stuff, but never to visualize !(Arrays)/(Hashes)/$1 of $1/$2 of $2/you get the idea!. Now that's cool!

      -marius
Re: Confused about complex data structures.
by saucepan (Scribe) on Jan 18, 2001 at 05:13 UTC
    Your code appears to do what you think it does. Whether this is the best approach or not depends on what you are going to do with the structure once it's in memory. The layout you've selected is great if you'll often need to work with all the records for a particular city, for example.

    To see what your data structure looks like, you can use Data::Dumper to inspect the structure after you've built it:

        use Data::Dumper;
        print Dumper(\%city_data);
    or use the perl debugger (which I can't help you with, being of the printf() school of debugging, myself). :)

    As for printing everything in one of your arrays, you've already got this licked: print @{ $city_data{$city}{$id} } will output the content of the file "${city}_${id}".
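
    To print every array in the structure rather than just one, two nested loops over the keys are enough. This is a minimal sketch with hard-coded sample data standing in for the files; the lines keep their trailing newlines, as they would when read with <THIS> and not chomped.

```perl
use strict;
use warnings;

# Sample data in the same shape the original snippet builds.
my %city_data = (
    cp => {
        '123456' => [ "this is some data\n", "this is a test\n" ],
        '123457' => [ "this is the second file\n" ],
    },
);

my $report = '';
for my $city ( sort keys %city_data ) {              # outer hash: cities
    for my $id ( sort keys %{ $city_data{$city} } ) {    # inner hash: IDs
        $report .= "== ${city}_$id ==\n";
        $report .= $_ for @{ $city_data{$city}{$id} };   # array: file lines
    }
}
print $report;
```

    The sort calls are only there to make the output order predictable; hash keys otherwise come back in no particular order.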

(tye)Re: Confused about complex data structures.
by tye (Sage) on Jan 18, 2001 at 05:07 UTC

    Personally, I suggest you run your script under "perl -d" and step through your code and occasionally enter the command "x \%city_data" to see what you've done.

    I'd say the worst part of your current code is the open() because the die message should include $filename and $! and you should at least open "< $filename" so that special characters in $filename are less problematic.
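
    A sketch of an open() along those lines. Note that it swaps in the three-argument form of open with a lexical filehandle (available since Perl 5.6) rather than the "< $filename" string suggested above; the temporary directory and file contents are invented so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway file so the example runs anywhere.
my $dir      = tempdir( CLEANUP => 1 );
my $filename = "$dir/cp_123456";
open my $out, '>', $filename or die "Can't write $filename: $!";
print {$out} "this is some data\nthis is a test\n";
close $out or die "Can't close $filename: $!";

# The mode is a separate argument, so characters such as '>' or '|'
# in $filename are never interpreted as part of the mode.
open my $in, '<', $filename or die "Can't open $filename for reading: $!";
my @tmp = <$in>;
close $in;

# On failure, the message names the file and includes $! for the reason.
my $ok    = open my $missing, '<', "$dir/no_such_file";
my $error = $ok ? '' : "Can't open $dir/no_such_file for reading: $!";

print scalar(@tmp), " lines read\n";
print "$error\n" if $error;
```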

    I don't see any other serious problems.

            - tye (but my friends call me "Tye")
Re: Confused about complex data structures.
by arturo (Vicar) on Jan 18, 2001 at 05:05 UTC

    Hm, what's in the files? How is the information structured NOW? Perhaps your data structure isn't the best one to use, but we won't know unless we know more about what kind of data the files are holding.

    Or so I gather from reading your description!

    You might want to check out RE: Data::Dumper (Adam: Sample Usage) for some info about how to store deeply nested data structures, or I like Storable and its freeze / thaw combination.
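
    As a rough illustration of the freeze / thaw pair, here is a minimal round trip through Storable. The sample data is made up, and a real program would write the frozen string to a file or use Storable's store and retrieve functions.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %city_data = (
    cp => { '123456' => [ 'this is some data', 'this is a test' ] },
);

my $frozen = freeze( \%city_data );    # serialize to a byte string
my $copy   = thaw($frozen);            # rebuild an independent deep copy

# Changing the copy leaves the original untouched.
push @{ $copy->{cp}{'123456'} }, 'added to the copy only';

print scalar( @{ $city_data{cp}{'123456'} } ), " lines in the original\n";
print scalar( @{ $copy->{cp}{'123456'} } ),    " lines in the copy\n";
```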

    I hope this helps!

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

      Actually, the data is simply stored line by line. It is data for apartment buildings (I am writing a property management system), so a typical data file might look something like this:
          555 Robertson Blvd., Beverly Hills
          A nice fixer-uper
          200 units
          1500 sq. ft.
          $1500/wk.
          01/01/2001
          Bob Joe
          (310) 333-4444
      so, as you can see, I just want to grab all the data into an array somehow. I will know that, say, that array's index 3 is actually the size (1500 sq. ft.) of the unit or whatever, I just need the data.

      I hope that I am making more sense...as I said, I am pretty confused at this point - Thanks!

      R.Joseph

        Here's what I might do in this situation to avoid ever-deeper nested data: store each file as a scalar (works best if you know the data won't get *too* large). You could use a straight hash, where the keys are the filenames and the values are the contents of the files, stored as scalars. You could, when printing the data, get out the info you wanted. Here's some code:

            #!/usr/bin/perl -w
            # ...
            # I assume @files holds the list of filenames
            my %data;
            foreach (@files) {
                # this will allow us to store the whole file as a string!
                {
                    local $/ = undef;    # undefined separator: read the whole file at once
                    open FILE, $_ or die "Couldn't open $_: $!\n";
                    $data{$_} = <FILE>;
                    close FILE;
                }
            }

            # to print :
            foreach (sort keys %data) {
                my ($city, $id) = split /_/, $_;
                print "City: $city, ID : $id\n";
                print "====\n\n";
                print $data{$_}, "=====\n";
            }

        This way of representing might not be maximally efficient for searching, but if the data set's not *too* huge it won't be a worry. If your data set were to get *huge*, then look into an RDBMS; MySQL is available under the GPL =)
