in reply to How to reference to array keys?

I tried to run the OP's code but there were some fatal errors. I re-wrote the code and I hope my explanation of it will help the OP.

The original code takes the approach of reading all of the data into a single variable, then splitting it out again to an array, then there are all sorts of subroutines to get rid of this, get rid of that...

I don't understand what the end objective really is, but from reading the OP's code, this whole thing has to do with record_id's. So I decided upon a hash structure keyed to record_id's.

There is a fundamental difference in the parsing approach. Rather than read the whole thing in and then go searching around for stuff to delete, I read each line and decide what to keep. Deciding what to keep is different than deciding what to throw away.

The file format illustrates something to avoid if you are designing a log file format: what a log line means depends on state set by a previous line. There are of course reasons to have complex records in a log file, but very often a de-normalized, flat, "every line speaks for itself" format is best.

Anyway, the data is read line by line. If a line is important, something happens. Everything that is not important is ignored. Only two types of lines matter: FILENAME lines and data lines. The lines starting with FILENAME tell us how to interpret the 4th field of subsequent data lines. So if a FILENAME line is seen, this bit of state is saved. A list slice is used to get just the 4th thing on the line.

It appears that all of the data lines start with a '/' so I picked that to check against. If a data line is seen, then I get the 2nd column (the record number) and use that in conjunction with the state variable that tells us whether this is the owner or waiting. Then that line gets saved in the data structure.

So that's it! Two if statements do the whole job! There is no "oh, this is a special case at the beginning so we throw away the first 10 lines", or "this funny line with ':' in column 1 is what ends the data". If the data format were not dependent upon the previous FILENAME line, then there would be only one if statement.

I did not parse each line to the nth degree. Often that's not necessary, and here it would just overcomplicate what is already a complex data structure, a HoHoA (hash of hashes of arrays). There is a simple sub to split a line out into a hash. I return the hash as a flattened list for simplicity.

I guess it will become apparent what kind of reports are needed with an update from the OP. I show one report. The __DATA__ segment and Data::Dump output are long, so they are in a readmore section.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(pp);  #Data::Dumper is also great!

my %recordState;  #recordid=>OWNER|WAITING=>line

my $OwnerOrWaiting="";
while (<DATA>) {
    s/\s+$//;   # delete trailing whitespace
    if (/^FILENAME/) {
        $OwnerOrWaiting = (split)[3];
    }
    if (m|^/|) {
        my ($record_id) = (split)[1];
        push @{$recordState{$record_id}{$OwnerOrWaiting}}, $_;
    }
}

sub get_fields {
    my $line = shift;
    my %fields;
    @fields{qw(FILENAME RECORD_ID M USER UNBR UNO TTY TIME DATE)}
        = split(' ', $line, 9);
    return %fields;
}

print "The record numbers are:\n";
print "\t$_\n" foreach (keys %recordState);

# I don't know what kind of queries you want, but
# for example print some basic data for any queue's that
# have 2 or more people waiting...
foreach my $recNum ( keys %recordState) {
    if ( @{$recordState{$recNum}{WAITING}} >= 2) {
        print "Hey, there are at least 2 guys in this crowd!\n";
        foreach my $line ( @{$recordState{$recNum}{WAITING}} ) {
            my %temp = get_fields($line);
            #this next is a hash slice...
            print "@temp{'RECORD_ID','USER','TIME','DATE'}\n";
        }
    }
}

print "Dumping data...\n";
foreach my $recHashref ( values %recordState) {
    print "OWNER   $_\n",  for @{$recHashref->{OWNER}};
    print "WAITING $_\n",  for @{$recHashref->{WAITING}};
}
print pp(\%recordState);

=OUTPUTS

The record numbers are:
        001!L!311895
        001!10274882
        00151120273
Hey, there are at least 2 guys in this crowd!
00151120273 jmorg 13:48:32 Jul 20
00151120273 gdavi 13:54:22 Jul 20
Dumping data...
OWNER   /prod-data/J 001!L!311895 X jmorg 2015244 134 s/109 13:48:32 Jul 20
WAITING /prod-data/J 001!L!311895 X jmorg 5713932 191 ts/46 14:01:42 Jul 20
OWNER   /prod-datahi 001!10274882 X rfuse 3354796 61 ts/43 13:39:02 Jul 20
WAITING /prod-datahi 001!10274882 X jmorg 3584038 247 ts/49 13:39:22 Jul 20
OWNER   /prod-data/J 00151120273 X jmorg 3584038 247 ts/49 13:38:12 Jul 20
WAITING /prod-data/J 00151120273 X jmorg 2015244 134 s/109 13:48:32 Jul 20
WAITING /prod-data/J 00151120273 X gdavi 1359996 62 ts/20 13:54:22 Jul 20

=cut

=And rest of the story, pp() output and __DATA__

{
  "001!10274882" => {
    OWNER => [
      "/prod-datahi 001!10274882 X rfuse 3354796 61 ts/43 13:39:02 Jul 20",
    ],
    WAITING => [
      "/prod-datahi 001!10274882 X jmorg 3584038 247 ts/49 13:39:22 Jul 20",
    ],
  },
  "001!L!311895" => {
    OWNER => [
      "/prod-data/J 001!L!311895 X jmorg 2015244 134 s/109 13:48:32 Jul 20",
    ],
    WAITING => [
      "/prod-data/J 001!L!311895 X jmorg 5713932 191 ts/46 14:01:42 Jul 20",
    ],
  },
  "00151120273" => {
    OWNER => [
      "/prod-data/J 00151120273 X jmorg 3584038 247 ts/49 13:38:12 Jul 20",
    ],
    WAITING => [
      "/prod-data/J 00151120273 X jmorg 2015244 134 s/109 13:48:32 Jul 20",
      "/prod-data/J 00151120273 X gdavi 1359996 62 ts/20 13:54:22 Jul 20",
    ],
  },
}

=cut

__DATA__
UniData Release 7.2 Build: (3786)
(c) Copyright Rocket Software, Inc. 1988-2009.
All rights reserved.

Current UniData home is /usr/udthome/.
Current working directory is /usr/local/rfs/udt.

:TERM ,0
:UDT.OPTIONS 20 ON
:LOGTO /ud/JWP
:LIST.QUEUE
FILENAME RECORD_ID M OWNER UNBR UNO TTY TIME DATE
/prod-data/J 00151120273 X jmorg 3584038 247 ts/49 13:38:12 Jul 20
--------------------------------------------------------------------------
FILENAME RECORD_ID M WAITING UNBR UNO TTY TIME DATE
/prod-data/J 00151120273 X jmorg 2015244 134 s/109 13:48:32 Jul 20
/prod-data/J 00151120273 X gdavi 1359996 62 ts/20 13:54:22 Jul 20

FILENAME RECORD_ID M OWNER UNBR UNO TTY TIME DATE
/prod-data/J 001!L!311895 X jmorg 2015244 134 s/109 13:48:32 Jul 20
--------------------------------------------------------------------------
FILENAME RECORD_ID M WAITING UNBR UNO TTY TIME DATE
/prod-data/J 001!L!311895 X jmorg 5713932 191 ts/46 14:01:42 Jul 20

FILENAME RECORD_ID M OWNER UNBR UNO TTY TIME DATE
/prod-datahi 001!10274882 X rfuse 3354796 61 ts/43 13:39:02 Jul 20
--------------------------------------------------------------------------
FILENAME RECORD_ID M WAITING UNBR UNO TTY TIME DATE
/prod-datahi 001!10274882 X jmorg 3584038 247 ts/49 13:39:22 Jul 20

Replies are listed 'Best First'.
Re^2: How to reference to array keys?
by mmartin (Monk) on Jul 27, 2011 at 14:33 UTC

    Hey Marshall, sorry to double post, but you don't have to answer my previous question about that if statement. It does work; it's just that everything below that if statement looks like it's quoted (the text is pink, which means a quote in gedit). If I am thinking correctly it just checks for lines that start with a "/". Right?
    I wasn't able to get it working because I didn't have the Data::Dump package, but I do now and the output looks pretty good. Nice work!

    Sorry for the newbieism but what does it mean when you have the lines with the array then curly braces with it like this line below?
    push @{$recordState{$record_id}{$OwnerOrWaiting}}, $_;
    Could you explain that to me?

    Thanks, Matt

        Good info! Thanks, I'm studying it now...


        Thanks, Matt

        Alright, so I have been reading the perl docs stuff online about references for most of the day.
        Most of it is for much easier stuff than Marshall had written in his example. On lines where Marshall uses references to a hash of hashes of arrays (I think???), could someone explain them to me? I am not getting it.

        For Example:
        push @{$recordState{$record_id}{$OwnerOrWaiting}}, $_;
        if ( @{$recordState{$recNum}{WAITING}} >= 1)
        foreach my $line ( @{$recordState{$recNum}{WAITING}} )


        I know that, let's say:
        after $myArray = \@array; the expression @{$myArray} refers to the array "@array"
        -OR-
        after $myHash = \%hash; the expression %{$myHash} refers to the hash "%hash", I think????

        Also, square brackets after a reference refer to an element in that array. Like this, ${$myArray}[3] refers to the 4th element in the array it refers to.
        And for a hash reference, keys %hash is the same as keys %{$myHash}.
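
        For comparison, here is a minimal sketch of the basic reference syntax described above. The variable names follow the ones used in this post, and the sample values are made up purely for illustration.

```perl
use strict;
use warnings;

# Made-up sample data, only for illustration.
my @array = (10, 20, 30, 40);
my %hash  = (OWNER => 'rfuse', WAITING => 'jmorg');

my $myArray = \@array;   # reference to @array
my $myHash  = \%hash;    # reference to %hash

my @copy   = @{$myArray};            # the whole array behind the reference
my $fourth = $myArray->[3];          # 4th element; ${$myArray}[3] means the same
my $owner  = $myHash->{OWNER};       # one value; ${$myHash}{OWNER} means the same
my @names  = sort keys %{$myHash};   # keys of the hash behind the reference

print "fourth=$fourth owner=$owner keys=@names\n";
```

        The arrow form ($ref->[...] and $ref->{...}) is usually the easier one to read once structures get nested.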


        Or what would be even more helpful, if you or anyone has some time, could you take Marshall's code and document it out to where a newbie could relate (i.e. comments before any of the more difficult lines)?

        If this is too much trouble, I understand, it's a lot of work. I mean I could just take the code and probably use it as is, but I really want to learn this stuff.

        Extremely Thankful,
        Matt

      Hey Matt, extra post is no problem at all.
      You asked: If I am thinking correctly it just checks for lines that start with a "/". Right? Yes, right!
      I see you have a link to some more info on references, I'll hit a few highlights specific to this code.

      %recordState is a hash with basically two levels of keys. In a complex data structure, everything is a reference until you get to the very last dimension where the data is. The value of $recordState{$record_id} is a reference to another hash. The key to that hash is $OwnerOrWaiting, which is set to either "OWNER" or "WAITING", and the value is a reference to an array of input lines. Look at the output of Data::Dump and see the "key => value" pairs.
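
      The layering can be checked with ref(), as in this rough sketch. The 'line A', 'line B' and 'line C' strings are hypothetical stand-ins for real input lines, and $inner and $waiting are names introduced only for this illustration.

```perl
use strict;
use warnings;

my %recordState;   # record_id => { OWNER|WAITING => [ lines ] }

# Hypothetical, shortened strings standing in for real input lines.
push @{ $recordState{'00151120273'}{OWNER} },   'line A';
push @{ $recordState{'00151120273'}{WAITING} }, 'line B';
push @{ $recordState{'00151120273'}{WAITING} }, 'line C';

my $inner   = $recordState{'00151120273'};   # first level: a hash reference
my $waiting = $inner->{WAITING};             # second level: an array reference

print "first level holds a ",  ref($inner),   " reference\n";
print "second level holds a ", ref($waiting), " reference\n";
print "lines waiting: ", scalar(@{$waiting}), "\n";
```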

      Also take a look at my dump loop, foreach my $recHashref ( values %recordState). There I didn't even use the $record_id key at all! I just get the values of the first dimension hash, which are references to the 2nd dimension hashes. In the line
      print "WAITING $_\n",  for @{$recHashref->{WAITING}};
      $recHashref is dereferenced and the WAITING key is accessed. The value of that key is a reference to an array of lines. I say that the whole thing, $recHashref->{WAITING}, is an array reference that I want to dereference by enclosing it in another set of curly braces and putting an @ in front. Each line gets printed out. Note that I used the exact same code pattern for OWNER as for WAITING. There is only one OWNER and that could have been:
      print "OWNER   $recHashref->{OWNER}[0]\n";
      which would mean give me the 0th line in the array pointed to by $recHashref->{OWNER}. There is a certain value in uniformity and so I did not handle the case of just one OWNER separately.
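
      Side by side, the two access styles look like this. It is only a sketch: the one-record hash and its strings are invented for the example, and @report is a hypothetical helper used to collect the output.

```perl
use strict;
use warnings;

# Invented one-record structure in the same shape as %recordState.
my %recordState = (
    '001!10274882' => {
        OWNER   => ['owner line'],
        WAITING => ['waiting line 1', 'waiting line 2'],
    },
);

my @report;
foreach my $recHashref (values %recordState) {
    # Direct index: take element 0 of the array behind {OWNER}.
    push @report, "OWNER   $recHashref->{OWNER}[0]";
    # Loop form: dereference the whole array behind {WAITING}.
    push @report, "WAITING $_" for @{ $recHashref->{WAITING} };
}
print "$_\n" for @report;
```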

      Maybe I confused you even more, but if this was remotely understandable, then the answer to your question above is that the value of $recordState{$record_id}{$OwnerOrWaiting} is a reference to an array, which is dereferenced with the @ so that the current line can be pushed onto it.
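
      To make that push less magical, here is a sketch that first spells out by hand what Perl creates automatically (autovivification), then uses the one-line form. The sample key and the line strings are hypothetical, and $inner and $aref exist only for this illustration.

```perl
use strict;
use warnings;

my %recordState;
my ($record_id, $OwnerOrWaiting) = ('00151120273', 'WAITING');   # hypothetical values

# Long form: create each missing level by hand, then push.
$recordState{$record_id} = {} unless exists $recordState{$record_id};
my $inner = $recordState{$record_id};                   # hash reference
$inner->{$OwnerOrWaiting} = [] unless exists $inner->{$OwnerOrWaiting};
my $aref = $inner->{$OwnerOrWaiting};                   # array reference
push @{$aref}, 'first hypothetical line';

# Short form: Perl creates any missing hash and array on its own.
push @{ $recordState{$record_id}{$OwnerOrWaiting} }, 'second hypothetical line';

print scalar(@{ $recordState{$record_id}{$OwnerOrWaiting} }), " lines stored\n";
```

      Both pushes land in the same array, which is why the program only needs the short form.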

      It could very well be that the data structure can be simplified once I read more about what you are needing as an end result.

        Hey Marshall,

        Thank you very much for taking the time to explain things to me. I will start going over all the info you gave me, and I will reply back in a little while after I gain some headway with this.


        Thanks, Matt


        Hey Marshall,

        I've definitely made some headway with the new simplified version that you gave me. But I just have one question about it at the moment.

        1. For the section in the code where it dumps the raw records,
        foreach my $line (@records) {
            my ($colnumber1, @rest) = split(' ', $line);
            printf "%-8s @rest\n", $colnumber1;
        }
        how does the first line in the loop (the one with the 'split' function) work?
        I get that you're creating and assigning 2 new variables, but how does it work setting one split function equal to 2 different variables (and 2 different types of variables)? By doing 2 print statements inside that loop, one for each variable, I can see that:
        $colnumber1 = either 'OWNER' -or- 'WAITING'
        @rest = one whole line of data not including 'OWNER' or 'WAITING'
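
        A small sketch of what that assignment does, assuming the records look like a label followed by the data fields; the sample line here is assembled by hand for illustration and is not taken from the simplified program. split returns a list, the scalar on the left takes the first element, and the array takes everything that is left.

```perl
use strict;
use warnings;

# Hand-assembled sample record: a label, then the data fields.
my $line = 'WAITING /prod-data/J 00151120273 X gdavi 1359996 62 ts/20 13:54:22 Jul 20';

# List assignment: first element to the scalar, all remaining elements to the array.
my ($colnumber1, @rest) = split(' ', $line);

printf "%-8s @rest\n", $colnumber1;        # "@rest" in a string joins the elements with spaces
print "first field in \@rest: $rest[0]\n";
print "number of fields in \@rest: ", scalar(@rest), "\n";
```

        So @rest is not one string but a list of fields; it only looks like a whole line because interpolating "@rest" joins the elements with spaces.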


        UPDATE
        Came across one other question for the moment.
        In the same file, what does the line if (!$current_owner) check for? Is it testing whether $current_owner has no value?
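
        The program being asked about is not shown in this thread, so the following is only a sketch of how Perl evaluates !$value in general. The variable name is borrowed from the question and the candidate values are hypothetical. In Perl, undef, the empty string, the number 0 and the string '0' are all false, so the test is broader than "has no value".

```perl
use strict;
use warnings;

# Hypothetical candidate values; $current_owner is just the name from the question.
my @results;
for my $current_owner (undef, '', 0, '0', 'jmorg') {
    my $shown   = defined $current_owner ? "'$current_owner'" : 'undef';
    my $verdict = !$current_owner ? 'no owner yet' : 'owner is set';
    push @results, "$shown: $verdict";
}
print "$_\n" for @results;
```

        If only "never assigned" should count, a test with defined would be the narrower check.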


        Thanks,
        Matt


Re^2: How to reference to array keys?
by mmartin (Monk) on Jul 27, 2011 at 14:12 UTC

    Hey Marshall,

    I copy/pasted in your code but from this line -->(line 20) if (m|^/|)... From that line on down it looks like you are missing a quotation mark or a backslash in the if.
    Should it be more like this? Or something like this?
    if (m|/^/|)

    Thanks, Matt
      No, the code is fine like it is. What is happening is that I changed the regex delimiter from the default '/' to '|'. When you do that, there has to be an explicit "m" in front of it. "if (/^\//)" would have the same effect. Since the '/' means the regex starts/ends, I would have to escape the '/' that matches the beginning of the line, and we wind up with a bunch of "leaning toothpicks". The aesthetics of these leaning toothpicks didn't look very good to me so I changed it. Sometimes you will see if (m{^/}) because another way is to use curly braces.
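
      A quick way to convince yourself that the three spellings behave the same is a sketch like this one. The sample lines are shortened from the posted data, and @verdicts is a hypothetical helper for collecting the results.

```perl
use strict;
use warnings;

# Shortened sample lines: one data line, one header line, one command line.
my @lines = ('/prod-data/J 00151120273 X jmorg', 'FILENAME RECORD_ID M OWNER', ':LIST.QUEUE');

my @verdicts;
for my $line (@lines) {
    my $slashes = ($line =~ /^\//) ? 1 : 0;   # default delimiter, '/' must be escaped
    my $pipes   = ($line =~ m|^/|) ? 1 : 0;   # '|' as delimiter, explicit m required
    my $braces  = ($line =~ m{^/}) ? 1 : 0;   # curly braces as delimiters
    push @verdicts, "$slashes$pipes$braces";
    print "$slashes $pipes $braces  $line\n";
}
```

      Every line gets the same answer from all three forms; only the data line matches.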

      Sounds like you have a text editor that is not so smart, and it's leading you astray with wrong highlighting.