annie06 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a text file which shows the output of many commands. From this output I'm trying to capture the number of volumes on each host, along with the size of each volume, so that I can then work out the capacity of each host. Here is a sample of the text file:
host_output
volume_name host_name
vol1 host1
vol2 host1
vol3 host1
vol4 host2
vol5 host2
vol2 host2
volume_output
volume_name size
vol1 10g
vol2 20g
vol3 30g
vol4 30g
vol5 20g
(Of course, these are just simple examples.) This info, along with a lot of other info, is in one file in a format similar to what I show above. How can I parse through this? I tried building a hash that lists all the volumes for each host and then looking up the size of each volume, but I'm not getting anywhere with it. Any ideas on the best way to approach this? Thanks much!

Re: hashes with multiple keys
by CountZero (Bishop) on Feb 27, 2009 at 21:07 UTC
    It will not work with this data as it stands. For example, in the volume_output table, which host does vol2 refer to? There are two vol2 entries (one on host1 and one on host2). Are they different volumes or the same vol2?

    If you want to solve your problem with a hash (or, more likely, a hash of hashes), your keys must be unique.
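    A minimal sketch of the point about unique keys, using the host/volume pairs from the sample (the hash names %host_of and %vols_on are illustrative): a flat hash keyed on the volume name alone loses one of the two vol2 entries, while a hash of hashes keyed on host and then volume keeps both.

```perl
use strict;
use warnings;

# Volume/host pairs as they appear in the host_output section of the sample.
my @pairs = (
    [ vol1 => 'host1' ], [ vol2 => 'host1' ], [ vol3 => 'host1' ],
    [ vol4 => 'host2' ], [ vol5 => 'host2' ], [ vol2 => 'host2' ],
);

# Flat hash keyed on volume name alone: the second vol2 overwrites the first.
my %host_of;
$host_of{ $_->[0] } = $_->[1] for @pairs;

# Hash of hashes keyed on host, then volume: every pair gets its own slot.
my %vols_on;
$vols_on{ $_->[1] }{ $_->[0] } = 1 for @pairs;

# Count the (host, volume) pairs stored in the hash of hashes.
my $pair_count = 0;
$pair_count += keys %{ $vols_on{$_} } for keys %vols_on;

printf "flat hash: %d entries, vol2 => %s\n", scalar(keys %host_of), $host_of{vol2};
printf "hash of hashes: %d entries\n", $pair_count;
```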

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      That is the problem: the same volume can exist on multiple hosts. Any ideas on which direction to go, since a hash won't work?
        Do you mean "a volume with the same name can exist on different hosts"?

        CountZero


Re: hashes with multiple keys
by kennethk (Abbot) on Feb 27, 2009 at 21:18 UTC

    Given that your host_output data shows a many-to-many mapping, the choice on that front would seem to be a hash of arrays (perllol), keyed on whichever value you'll want as your primary key (I'm guessing host). The second half is easier, as your volume sizes can be stored in a simple hash. You could calculate a host's total size something like this:

    use strict;
    use warnings;

    # host name => list of volumes on that host
    my %volume = (
        host1 => ['vol1', 'vol2', 'vol3'],
        host2 => ['vol4', 'vol5', 'vol2'],
    );

    # volume name => size
    my %size = (
        vol1 => 10,
        vol2 => 20,
        vol3 => 30,
        vol4 => 30,
        vol5 => 20,
    );

    my $host = 'host1';
    my $total_size = 0;
    foreach my $vol (@{$volume{$host}}) {
        $total_size += $size{$vol};
    }
    print "$host has $total_size at its disposal.\n";

    Update: Forgot a doc tag in my perllol link, so linked to the wrong documentation. Oops and fixed.

      Thanks, but how can I do that without hard-coding the volume names and sizes? My output is hundreds of lines long, and I want to be able to run the script against other files with similar output but different values.
        Based on your initial post, I assume you are already comfortable with file I/O. In order to generate your volume hash, you just need to start by putting an anonymous array in each hash entry and then populate it with a series of pushes. There are some reasonable examples in perllol.
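    A minimal sketch of that idea, assuming the host_output lines have already been separated out of the real file (here they sit in a hypothetical $host_section string, read through an in-memory filehandle). Note that push on @{ $volume{$host} } creates the anonymous array automatically (autovivification) the first time a host is seen, so it does not have to be set up beforehand.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the host_output section of the real file;
# in practice these lines would come from reading the file itself.
my $host_section = <<'END';
volume_name host_name
vol1 host1
vol2 host1
vol3 host1
vol4 host2
vol5 host2
vol2 host2
END

open my $fh, '<', \$host_section or die "Cannot open in-memory file: $!";
my %volume;    # host name => array ref of volume names
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^volume_name\b/;    # skip the column header
    my ($vol, $host) = split ' ', $line;
    next unless defined $host;            # skip blank or short lines
    # push autovivifies $volume{$host} as an anonymous array on first use
    push @{ $volume{$host} }, $vol;
}
close $fh;

print "$_: @{ $volume{$_} }\n" for sort keys %volume;
```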
        To be clear, I meant that I want my Perl script to step through the output and build this list for me.
Re: hashes with multiple keys
by artist (Parson) on Feb 27, 2009 at 21:24 UTC
    I guess you want output like the following:
    host1:
    vol1: 10g
    vol2: 20g
    vol3: 30g
    total: 60g
    
    host2:
    vol4: 30g
    vol5: 20g
    vol2: 20g
    total: 70g
    
    
    You should have a data structure like this:
    $VOLUMES = { vol1 => '10', vol2 => '20', ... };
    $HOSTS   = { host1 => [qw(vol1 vol2 vol3)], host2 => [qw(vol4 vol5 vol2)], ... };
    Iterate over each host in $HOSTS (in sorted order), then over each of its volumes, get each volume's size from the other hash, and total them for the capacity.
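    A sketch of that loop, with $VOLUMES and $HOSTS filled in from the sample in the question and the sizes stored as plain numbers of gigabytes (an assumption; the sample only shows the g suffix). It prints the report in the layout above and warns about any volume that has no recorded size.

```perl
use strict;
use warnings;

# Sizes in GB, and host => volumes, taken from the sample in the question.
my $VOLUMES = { vol1 => 10, vol2 => 20, vol3 => 30, vol4 => 30, vol5 => 20 };
my $HOSTS   = { host1 => [qw(vol1 vol2 vol3)], host2 => [qw(vol4 vol5 vol2)] };

my %total;    # host name => summed capacity in GB
for my $host (sort keys %$HOSTS) {
    print "$host:\n";
    for my $vol (@{ $HOSTS->{$host} }) {
        my $size = $VOLUMES->{$vol};
        if (!defined $size) {    # volume listed for a host but missing from the size table
            warn "No size recorded for $vol\n";
            next;
        }
        print "$vol: ${size}g\n";
        $total{$host} += $size;
    }
    printf "total: %dg\n\n", $total{$host} // 0;
}
```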
    --Artist
Re: hashes with multiple keys
by ig (Vicar) on Feb 28, 2009 at 00:38 UTC
    How can I parse through this?

    There are many ways you could parse your file. Here is one way:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $section = "unknown";
    my %volumes;    # volumes for each host, keyed by host name
    my %capacities; # capacity of each volume, keyed by volume name

    while(my $line = <DATA>) {
        chomp($line);
        if ( $line =~ m/^host_output$/) {
            $section = "host_output";
            my $discard = <DATA>;   # discard the header line
        }
        elsif ( $line =~ m/^volume_output$/) {
            $section = "volume_output";
            my $discard = <DATA>;   # discard the header line
        }
        elsif ( $line =~ m/^$/) {
            $section = "unknown";
        }
        elsif ($section eq "host_output") {
            my ($volume, $host) = split(/\s+/,$line);
            push(@{$volumes{$host}}, $volume);
        }
        elsif ( $section eq "volume_output" ) {
            my ($volume, $capacity) = split(/\s+/,$line);
            $capacities{$volume} = $capacity;
        }
    }

    foreach my $host (sort keys %volumes) {
        foreach my $volume (@{$volumes{$host}}) {
            print "$host: $volume: $capacities{$volume}\n";
        }
    }

    __DATA__
    host_output
    volume_name host_name
    vol1 host1
    vol2 host1
    vol3 host1
    vol4 host2
    vol5 host2
    vol2 host2
    volume_output
    volume_name size
    vol1 10g
    vol2 20g
    vol3 30g
    vol4 30g
    vol5 20g

    Which produces:

    host1: vol1: 10g
    host1: vol2: 20g
    host1: vol3: 30g
    host2: vol4: 30g
    host2: vol5: 20g
    host2: vol2: 20g

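    This example makes several assumptions about your data, which you should consider carefully: for instance, that each section starts with a line containing only host_output or volume_output, that the line after it is a column header, that fields are separated by whitespace, and that a blank line ends a section.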

    Update: removed extraneous "next;"
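    The loop above prints each volume's size but does not add them up, and the sizes are strings such as 10g. Below is a sketch of summing them per host from the same %volumes and %capacities hashes (filled in here from the sample). The size_in_gb helper is hypothetical: only the g suffix appears in the sample, so the handling of m and t, and the factor of 1024 between units, are assumptions to check against your real output.

```perl
use strict;
use warnings;

# The two hashes the parsing loop above builds, filled in from the sample.
my %volumes    = (host1 => [qw(vol1 vol2 vol3)], host2 => [qw(vol4 vol5 vol2)]);
my %capacities = (vol1 => '10g', vol2 => '20g', vol3 => '30g', vol4 => '30g', vol5 => '20g');

# Hypothetical helper: convert sizes like "10g", "512m" or "2t" to gigabytes.
# Only "g" appears in the sample; m, t and the factor 1024 are assumptions.
sub size_in_gb {
    my ($size) = @_;
    my %factor = (m => 1 / 1024, g => 1, t => 1024);
    my ($num, $unit) = $size =~ /^\s*([\d.]+)\s*([mgt])?\s*$/i
        or die "Unrecognised size '$size'\n";
    return $num * $factor{ lc($unit // 'g') };    # no suffix: assume GB
}

my %host_total;    # host name => capacity in GB
for my $host (sort keys %volumes) {
    $host_total{$host} += size_in_gb($capacities{$_}) for @{ $volumes{$host} };
    print "$host: $host_total{$host}g\n";
}
```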

Re: hashes with multiple keys
by codeacrobat (Chaplain) on Feb 28, 2009 at 00:52 UTC
    A nested hash approach:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $mode;
    my %vol;
    my %host;
    while(<DATA>){
        last if /^__END__/;    # stop before the sample output appended below
        next if /^\s*$/;
        next if /^volume_name/;
        if (/^host_output/){ $mode = "host"; next; }
        if (/^volume_output/){ $mode = "volume"; next; }
        my ($k,$v) = split /\s+/;
        if ($mode eq 'host'){
            $vol{$k}{host} = $v;
        }
        elsif ($mode eq 'volume') {
            $vol{$k}{size} = $v;
        }
    }
    for my $k ( keys %vol ) {
        $host{ $vol{$k}{host} }{$k} = $vol{$k}{size};
    }
    for my $k ( keys %host ) {
        no warnings;    # sizes like "10g" are not numeric; Perl uses the leading number
        $host{$k}{total_size} += $_ for values %{$host{$k}};
    }
    print Dumper (\%host);
    __DATA__
    host_output
    volume_name host_name
    vol1 host1
    vol2 host1
    vol3 host1
    vol4 host2
    vol5 host2
    vol2 host2
    volume_output
    volume_name size
    vol1 10g
    vol2 20g
    vol3 30g
    vol4 30g
    vol5 20g
    __END__
    $VAR1 = {
              'host2' => {
                           'vol2' => '20g',
                           'vol4' => '30g',
                           'vol5' => '20g',
                           'total_size' => '70'
                         },
              'host1' => {
                           'vol1' => '10g',
                           'total_size' => '40',
                           'vol3' => '30g'
                         }
            };
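    In the output above, host1 has no vol2 and a total of 40 rather than 60: %vol is keyed on the volume name alone, so the second vol2 line (on host2) overwrites the first, which is the collision CountZero pointed out. Below is a sketch of one way round it, keying on the host first while keeping the same shape of result (including the total_size entry). It reads the sample from a string so it is self-contained, and stripping a trailing unit letter before summing assumes every size uses the same unit.

```perl
use strict;
use warnings;
use Data::Dumper;

# The sample from the question, supplied as a string instead of __DATA__.
my $text = <<'END';
host_output
volume_name host_name
vol1 host1
vol2 host1
vol3 host1
vol4 host2
vol5 host2
vol2 host2
volume_output
volume_name size
vol1 10g
vol2 20g
vol3 30g
vol4 30g
vol5 20g
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ($mode, %host, %size);
while (<$fh>) {
    next if /^\s*$/ or /^volume_name/;
    if (/^host_output/)   { $mode = 'host';   next }
    if (/^volume_output/) { $mode = 'volume'; next }
    my ($k, $v) = split;
    next unless defined $mode and defined $v;
    if ($mode eq 'host') {
        # Key on host first, so vol2 can be recorded under both hosts.
        $host{$v}{$k} = undef;
    }
    elsif ($mode eq 'volume') {
        $size{$k} = $v;
    }
}
close $fh;

# Sizes come after the host section, so attach them once both are read.
for my $h (keys %host) {
    for my $k (keys %{ $host{$h} }) {    # key list is taken before total_size is added
        $host{$h}{$k} = $size{$k};
        (my $num = $size{$k}) =~ s/[a-z]+$//i;    # "20g" -> "20"; assumes one unit throughout
        $host{$h}{total_size} += $num;
    }
}

print Dumper(\%host);
```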

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
Re: hashes with multiple keys
by locked_user sundialsvc4 (Abbot) on Feb 28, 2009 at 03:42 UTC

    One strategy might be to first extract the data into a flat file, then sort that file. Identical keys then end up adjacent.
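    A sketch of the sorted-flat-file idea, assuming an extraction step has already written one "host volume size" record per volume (the records below are hypothetical, built from the sample, with sizes in gigabytes). After sorting, each host's records are adjacent, so a single pass can total each group and emit it when the host name changes (a "control break"), without building a hash of every host. With a real file, the sort step could just as well be done by an external sort utility.

```perl
use strict;
use warnings;

# Hypothetical flat records from an extraction step: "host volume size_in_gb".
my @records = (
    "host2 vol4 30", "host1 vol1 10", "host2 vol2 20",
    "host1 vol3 30", "host2 vol5 20", "host1 vol2 20",
);

my @totals;                      # [host, total] pairs, in sorted host order
my ($current, $sum) = (undef, 0);
for my $rec (sort @records) {    # sorting makes each host's records adjacent
    my ($host, undef, $size) = split ' ', $rec;
    if (defined $current && $host ne $current) {
        push @totals, [ $current, $sum ];    # host changed: close the previous group
        $sum = 0;
    }
    $current = $host;
    $sum += $size;
}
push @totals, [ $current, $sum ] if defined $current;    # close the last group

print "$_->[0]: $_->[1]g\n" for @totals;
```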