flemi_p has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am new to Perl and have been writing a text-based log file scanner. It has been fairly straightforward to produce something quickly until now, when I have a requirement to build a more than rudimentary structure from the log data. I am at a decision point as to whether to continue with Perl or to complete this in a compiled language.

The problem I have is that I am attempting to build up a structure at runtime using hashes. The log itself is implemented in my Perl script as a hash whose keys are from the root column below. I want each root key value to be a hash of parent keys associated with the root key. Current keys have a similar hierarchical relationship with parent values and have a hash with duration and function name keys associated with them. I need to get at the duration and function name for any root, parent, current key. I need to navigate down the tree from the root to get at these values and want to read the file line by line at runtime and build the structure simultaneously.

I could write this in C++ or C or any compiled language where there is explicit memory allocation, variable initialisation and I could use pointers/references to make it perform etc. I would prefer to complete it in Perl, if possible. The data below is not the real data but is fairly representative of the type of thing I can expect. The keys especially are much longer strings and can contain both alphas and numerics.
@lines = (
    #  ?,    root, parent, current, duration, function name
    #  0,    1,    2,      3,       4,        5
    ["1",  "2",  "3",  "4",   "100",  "A"],
    ["10", "20", "30", "40",  "200",  "B"],
    ["11", "21", "31", "41",  "300",  "C"],
    ["12", "22", "32", "42",  "400",  "D"],
    ["13", "23", "33", "43",  "500",  "E"],
    ["13", "23", "33", "53",  "600",  "F"],
    ["13", "23", "33", "63",  "700",  "G"],
    ["13", "23", "34", "73",  "800",  "H"],
    ["13", "23", "34", "83",  "900",  "I"],
    ["13", "24", "35", "93",  "1000", "J"],
    ["13", "24", "36", "103", "1100", "K"],
);
I am using the ActiveState version of Perl on win2K but the program will eventually run on Solaris. A previous version I have developed in this way is currently running.

Am I using the right tool for this in Perl? I have some sample test code and output, would it be of any use if I posted this too?
Can someone offer some help, or point me in the direction of some help?

Thanks
Pat

Edit by castaway, added code tags

Re: Hashes
by Zaxo (Archbishop) on Jun 22, 2004 at 11:04 UTC

    I think something like this is what you want,

    use Data::Dumper;

    my %kid;
    while (<DATA>) {
        chomp;
        my @line = split /,/;
        $kid{$line[1]}{$line[2]}{$line[3]} = {
            duration      => $line[4],
            function_name => $line[5]
        };
    }
    print Dumper(\%kid);

    __DATA__
    1,2,3,4,100,A
    10,20,30,40,200,B
    11,21,31,41,300,C
    12,22,32,42,400,D
    13,23,33,43,500,E
    13,23,33,53,600,F
    13,23,33,63,700,G
    13,23,34,73,800,H
    13,23,34,83,900,I
    13,24,35,93,1000,J
    13,24,36,103,1100,K
    That dumps the resulting nested hash with Data::Dumper. Is that what you were after?

    Deep hashes like this are not the easiest thing to work with. Care must be taken to check existence of a key at each level while looking things up. From the feel of this sample problem, it's possible that Graph will make a more comfortable representation of the data.
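To illustrate the point about checking each level, here is a minimal sketch, assuming the same root -> parent -> current layout as the code above. The `lookup` helper and the trimmed-down `%kid` are hypothetical names used only for this example. Testing `exists` one level at a time matters because a single deep `exists` test would autovivify the intermediate hashes for keys that were never in the log.

```perl
use strict;
use warnings;

# A small slice of the structure: root -> parent -> current -> record.
my %kid = (
    23 => {
        33 => {
            43 => { duration => 500, function_name => 'E' },
            53 => { duration => 600, function_name => 'F' },
        },
        34 => {
            73 => { duration => 800, function_name => 'H' },
        },
    },
);

# Hypothetical helper: return the record for root/parent/current,
# or nothing if any level is missing. Each level is tested on its own
# so that no intermediate hash is created as a side effect.
sub lookup {
    my ($tree, $root, $parent, $current) = @_;
    return unless exists $tree->{$root};
    return unless exists $tree->{$root}{$parent};
    return unless exists $tree->{$root}{$parent}{$current};
    return $tree->{$root}{$parent}{$current};
}

my $hit  = lookup(\%kid, 23, 34, 73);
my $miss = lookup(\%kid, 99, 1, 2);

print "$hit->{function_name} took $hit->{duration}\n" if $hit;
print "no entry for 99/1/2\n" unless $miss;

# Walk the whole tree from the roots down (string sort, since the
# real keys may contain letters as well as digits).
my @seen;
for my $root (sort keys %kid) {
    for my $parent (sort keys %{ $kid{$root} }) {
        for my $current (sort keys %{ $kid{$root}{$parent} }) {
            my $rec = $kid{$root}{$parent}{$current};
            push @seen, "$root/$parent/$current";
            print "$root/$parent/$current: ",
                  "$rec->{function_name} $rec->{duration}\n";
        }
    }
}
```

After the failed lookup, `exists $kid{99}` is still false, which is the behaviour the level-by-level checks are there to protect.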

    After Compline,
    Zaxo

      Hi Zaxo,

      Many thanks for your time and effort deciphering my post and for posting what, on the face of it, looks like an excellent reply. This looks like exactly what I want; I must give it a try. I kinda thought this problem would either be very hard/impossible to solve (very unlikely!) or have a concise, elegant solution like the one you've posted.

      I intend to have a play and may come back with some more questions, if that's ok?

      Thanks
      Pat
Re: Hashes
by tachyon (Chancellor) on Jun 22, 2004 at 11:08 UTC
    Perl is ideal. It cares not about numeric, string, memory allocation. Provided you can accept that it will burn perhaps as much as 4-10x more memory than optimised C it rocks. It is also very fast on this sort of task (often beating FAQ - Fair Average Quality - C).

    Your description is a little distracted and does not really make explicit how you want to access the data. Making a few assumptions, perhaps this is close to what you want. I am using a hash of array refs, each of which contains one or more hash refs, to cope with your duplicate '?' thingy, i.e. in the struct there is a key '13' that contains an array ref. This array ref contains a series of hashes that hold the data pertaining to the '13' == ? thingy(s). Call it a linked list. Literally < 10 minutes in Perl, hours in C/C++. It will be fast but it will burn memory. If you have the memory, no worries. If not, 1GB =~ 3 hours coding time in business case terms.
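The structure described here (a hash of array refs, each holding one or more hash refs) could be sketched roughly as follows. This is not the code from the original reply, only a small illustration of that shape. The field names `root`, `parent`, `current`, `duration` and `function_name`, and the variable `%struct`, are assumptions taken from the column comments in the question.

```perl
use strict;
use warnings;

# A few rows from the question: ?, root, parent, current, duration, function name
my @lines = (
    [ "12", "22", "32", "42", "400", "D" ],
    [ "13", "23", "33", "43", "500", "E" ],
    [ "13", "23", "33", "53", "600", "F" ],
    [ "13", "23", "34", "73", "800", "H" ],
);

my %struct;
for my $row (@lines) {
    my ($id, $root, $parent, $current, $duration, $function) = @$row;

    # push creates the array ref the first time a given '?' value is
    # seen, so rows that share column 0 pile up in the same list.
    push @{ $struct{$id} }, {
        root          => $root,
        parent        => $parent,
        current       => $current,
        duration      => $duration,
        function_name => $function,
    };
}

# Everything recorded under '?' == 13, in file order.
for my $rec (@{ $struct{13} }) {
    print join("/", @{$rec}{qw(root parent current)}),
          " $rec->{function_name} $rec->{duration}\n";
}
```

Because the records under one key are kept in an array, file order is preserved, at the cost of a linear scan to find a particular root/parent/current combination.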

Re: Hashes
by husker (Chaplain) on Jun 22, 2004 at 15:05 UTC
    I've written a fairly hairy FlexLM log file parser myself, and hash-of-hashes is definitely the tool to crack that nut. Perl is definitely up to the task!

    This program was my first real foray into Perl also and the script (and my Perl knowledge) has been slowly evolving over the last 3 years.

Re: Hashes
by graff (Chancellor) on Jun 23, 2004 at 05:35 UTC
    I need to get at the duration and function name for any root, parent, current key.

    If I understand you correctly, you are starting with a 2-D array (a flat table) such that in each row of 6 columns (0-5): column 0 is meaningless, columns 1-3 combine to make up a primary key, and columns 4 and 5 are the data of interest for each primary key.

    If all rows are unique in terms of the tuples formed by columns 1-3, then you really only need a simple hash, using the concatenation of these three fields as the hash key -- e.g., building on Zaxo's example:

    my %table;
    while (<DATA>) {
        chomp;
        my @line = split /,/;
        my $key = join ",", @line[1..3];
        if ( exists( $table{$key} )) {
            warn "oops -- duplicate key $key... is there a problem with that?\n";
            next;  # just in case there's a problem with that
        }
        $table{$key} = join ",", @line[4,5];
    }

    # later on, if you really need the individual columns from the
    # hash key or the hash value, just use "split /,/" as needed

    # <update>
    # note that you can easily search for groups of hash keys:

    my @root23keys = sort grep /^23,/, keys %table;
    my @r23p34keys = sort grep /^23,34,/, keys %table;

    # and so on... </update>

    __DATA__
    1,2,3,4,100,A
    10,20,30,40,200,B
    11,21,31,41,300,C
    12,22,32,42,400,D
    13,23,33,43,500,E
    13,23,33,53,600,F
    13,23,33,63,700,G
    13,23,34,73,800,H
    13,23,34,83,900,I
    13,24,35,93,1000,J
    13,24,36,103,1100,K
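Getting values back out of the flat table might look like the sketch below. It assumes `%table` has already been filled as in the loop above (a few entries are written out by hand here so the example stands alone) and that no field ever contains a comma, since the comma is the separator in both keys and values.

```perl
use strict;
use warnings;

# "root,parent,current" => "duration,function name"
my %table = (
    "23,33,43" => "500,E",
    "23,34,73" => "800,H",
    "23,34,83" => "900,I",
    "24,35,93" => "1000,J",
);

# Direct lookup for one root/parent/current combination.
my $key = join ",", 23, 34, 73;
my ($duration, $function);
if ( exists $table{$key} ) {
    ($duration, $function) = split /,/, $table{$key};
    print "$key: $function took $duration\n";
}

# All records under root 23, parent 34.
my @found;
for my $k ( sort grep { /^23,34,/ } keys %table ) {
    my ($root, $parent, $current) = split /,/, $k;
    my ($dur, $func)              = split /,/, $table{$k};
    push @found, $func;
    print "root $root, parent $parent, current $current: $func $dur\n";
}
```

The trailing comma in the `grep` pattern is what keeps root 23 from also matching a root such as 230.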
      Hi graff,

      Thanks for your time. Looked at and played with the code and it concisely and simply does what I want.

      Thanks
      Pat