Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have some data I need to parse, and I am not too sure of the best data structure to use to store it. I have a file, several thousand lines long, in the following format (hostname, location, server fault type):

Server1:london:network_interface
Server1:london:diskspace
Server1:london:diskpace
Server1:london:kernel
Server2:paris:diskspace
Server3:new_york:Kernel
Server3:new_york:diskspace
Server3:new_york:diskspace
Server3:new_york:kernel

I am not interested in the middle column (location) and need to create a report that lists server name, type of fault, and a count of each type of fault per server. Report would output like this:

Server1 network_interface 1
Server1 diskspace 2
Server1 kernel 1
Server2 diskpace 1
Server3 kernel 2
Server3 diskspace 2

The problem is that I was thinking of reading in the file and storing it in a hash, with hostname as the key and fault type as the value. I can't do this, because the file contains duplicate hostname entries, so hostname can't be the key. I have the same issue if I reverse this and use fault type as the key, since fault types are duplicated too. I would really appreciate some help on how to parse this data and output it in the report format shown above. Kind regards

Re: help with data structure to use and how to implement it
by 2teez (Vicar) on Sep 25, 2014 at 04:29 UTC

    Hi,
    Of course you can use hash like so:

    use warnings;
    use strict;
    use Data::Dumper;

    my %data;
    while (<DATA>) {
        my ($server_name, $fault) = (split /:|\s+/, $_)[0, 2];
        $data{$server_name}{$fault}++;
    }
    print Dumper \%data;
    __DATA__
    Server1:london:network_interface
    Server1:london:diskspace
    Server1:london:diskpace
    Server1:london:kernel
    Server2:paris:diskspace
    Server3:new_york:Kernel
    Server3:new_york:diskspace
    Server3:new_york:diskspace
    Server3:new_york:kernel
    Output:
    $VAR1 = {
        'Server3' => {
            'kernel' => 1,
            'diskspace' => 2,
            'Kernel' => 1
        },
        'Server1' => {
            'kernel' => 1,
            'diskpace' => 1,
            'diskspace' => 1,
            'network_interface' => 1
        },
        'Server2' => {
            'diskspace' => 1
        }
    };
    The rest will then be just to print out! Over to you.
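
    For anyone who wants to see the printing step, here is a minimal sketch. It assumes the counts have already been collected into a hash of hashes shaped like the Dumper output above; the sample hash is copied from that output, and the helper name report_lines is made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Sample counts copied from the Dumper output above (typos and case kept as-is).
    my %data = (
        Server1 => { network_interface => 1, diskspace => 1, diskpace => 1, kernel => 1 },
        Server2 => { diskspace => 1 },
        Server3 => { Kernel => 1, diskspace => 2, kernel => 1 },
    );

    # report_lines is a hypothetical helper: it walks the outer hash (servers),
    # then each inner hash (fault types), and returns "server fault count" lines.
    sub report_lines {
        my ($counts) = @_;
        my @lines;
        for my $server (sort keys %$counts) {
            for my $fault (sort keys %{ $counts->{$server} }) {
                push @lines, "$server $fault $counts->{$server}{$fault}";
            }
        }
        return @lines;
    }

    print "$_\n" for report_lines(\%data);
    ```

    Both levels are sorted only to make the output order stable; hash keys otherwise come back in no particular order.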

    Update: Please note that 'kernel' is not the same as 'Kernel', nor is 'diskpace' the same as 'diskspace', so each is counted under its own key.
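
    If those variants are meant to be the same fault (the report in the question suggests so, since it shows Server1 diskspace 2 and Server3 kernel 2), one possible approach is to normalize the fault name before counting. This is only a sketch: the %alias table and the normalize_fault helper are hypothetical, and any misspelling has to be listed by hand.

    ```perl
    use strict;
    use warnings;

    # Hypothetical alias table: known misspellings mapped to the intended name.
    my %alias = ( diskpace => 'diskspace' );

    # normalize_fault is a hypothetical helper: lower-case the name,
    # then replace it if it is a listed misspelling.
    sub normalize_fault {
        my ($fault) = @_;
        $fault = lc $fault;
        return exists $alias{$fault} ? $alias{$fault} : $fault;
    }

    my %data;
    for my $line (qw(
        Server1:london:diskspace
        Server1:london:diskpace
        Server3:new_york:Kernel
        Server3:new_york:kernel
    )) {
        # Keep fields 0 (hostname) and 2 (fault type); skip the location.
        my ($server, $fault) = (split /:/, $line)[0, 2];
        $data{$server}{ normalize_fault($fault) }++;
    }

    for my $server (sort keys %data) {
        for my $fault (sort keys %{ $data{$server} }) {
            print "$server $fault $data{$server}{$fault}\n";
        }
    }
    ```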

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me
Re: help with data structure to use and how to implement it
by McA (Priest) on Sep 25, 2014 at 04:31 UTC

    Hi,

    IMHO you're heading in the right direction: just add another level of hashref indirection and you're done:

    my %REPORT = (
        'Server1' => {
            'network_interface' => 1,
            'diskspace'         => 2,
        },
    );

    You should get the idea.
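
    In case the extra level of indirection is unfamiliar, here is a small sketch of reading values back out of such a structure. It reuses the %REPORT example above; the variable names $n, @faults and $seen are just for illustration.

    ```perl
    use strict;
    use warnings;

    my %REPORT = (
        'Server1' => {
            'network_interface' => 1,
            'diskspace'         => 2,
        },
    );

    # A single count: the first key selects the server, the second the fault type.
    my $n = $REPORT{'Server1'}{'diskspace'};

    # Each outer value is a hash reference; dereference it to list the fault types.
    my @faults = sort keys %{ $REPORT{'Server1'} };

    # Test the outer key with exists before looking deeper, so that checking
    # for an unknown server does not create an empty entry for it.
    my $seen = exists $REPORT{'Server2'} ? 1 : 0;

    print "Server1 diskspace: $n\n";
    print "Server1 faults: @faults\n";
    print "Server2 seen: $seen\n";
    ```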

    Best regards
    McA

      Thanks a lot, guys, that's perfect. That's exactly how to do it. I've been scratching my head for a while trying to work out the best data structure to hold this data in.