manav_gupta has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, my apologies for a somewhat trivial question, but I'm unable to get my head around the vagaries of split. Here's my data sample:
Summary 51.58.214.48/dw109998bsw45 -> (1*Cisco_Power_Supply) (1*Cisco_CPU_Unit) (7*IETF_IF) (1*Cisco_Fan_Unit) (1*1213_Device) (2*Cisco_Memory_Pool)
Summary 51.58.220.21/dw108432bsw25 -> (6*Cisco_Power_Supply) (1*Cisco_CPU_Unit) (333*IETF_IF) (6*Cisco_Fan_Unit) (1*1213_Device) (2*Cisco_Memory_Pool)
I need to produce a total for each of the items (such as Cisco_Power_Supply, etc). So I decided to use split to extract each item, and its count:
my @tmpFields = split (/\(?(\d+)\*(\w+)\)\s+/, $line);
But that gives me:
i = 0 , Summary 51.58.214.48/dw109998bsw45 ->
i = 1 , 1
i = 2 , Cisco_Power_Supply
i = 3 ,
i = 4 , 1
i = 5 , Cisco_CPU_Unit
i = 6 ,
i = 7 , 7
i = 8 , IETF_IF
i = 9 ,
i = 10 , 1
i = 11 , Cisco_Fan_Unit
i = 12 ,
i = 13 , 1
i = 14 , 1213_Device
i = 15 ,
i = 16 , 2
i = 17 , Cisco_Memory_Pool
I then get rid of array elements that are "empty" (contain nothing but a space, or begin with spaces) using:
my @tfields = grep {!/\s+/} @tmpFields;
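A quick way to see what that filter really does, as a sketch on a made-up list (`@sample` and `@kept` are hypothetical names, not from my script): `/\s+/` matches any element that contains whitespace anywhere, so the leading "Summary ... ->" chunk and the single-space separators are dropped, but a truly empty string would be kept.

```perl
use strict;
use warnings;

# Hypothetical sample resembling the split output above,
# plus one genuinely empty string at the end.
my @sample = (
    'Summary 51.58.214.48/dw109998bsw45 -> ',
    '1', 'Cisco_Power_Supply', ' ',
    '1', 'Cisco_CPU_Unit', '',
);

# Keep only the elements that contain no whitespace at all.
my @kept = grep { !/\s+/ } @sample;

# The empty string survives: /\s+/ needs at least one whitespace character.
print scalar(@kept), " elements kept: ", join('|', @kept), "\n";
```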
At this point, the array ends up with an even number of elements:
print "array length: " , scalar(@tfields), "\n", Dumper (@tfields);
and the output:
array length: 12
$VAR1 = '1';
$VAR2 = 'Cisco_Power_Supply';
$VAR3 = '1';
$VAR4 = 'Cisco_CPU_Unit';
$VAR5 = '7';
$VAR6 = 'IETF_IF';
$VAR7 = '1';
$VAR8 = 'Cisco_Fan_Unit';
$VAR9 = '1';
$VAR10 = '1213_Device';
$VAR11 = '2';
$VAR12 = 'Cisco_Memory_Pool';
However, when I convert it to a hash, I lose some elements:
my %hFields = @tfields;
So how do I convert it to a hash and not lose any array elements? Of course, I'm sure this whole code snippet can be improved as well - so please could you help?
next if ($line !~ /^Summary/);
my @tmpFields = split (/\(?(\d+)\*(\w+)\)/, $line);
my @tfields = grep {!/\s+/} @tmpFields;
my %hFields = @tfields;

Replies are listed 'Best First'.
Re: split snafus
by GrandFather (Saint) on Jun 17, 2008 at 11:58 UTC

    I'd use a regex to pull out the fields you are interested in then use split to chop em up. With a little cunning you can leverage map and a for (as a statement modifier) to get everything done in a fairly compact, but lucid way. Consider:

    use warnings;
    use strict;

    my $data = <<'DATA';
    Summary 51.58.214.48/dw109998bsw45 -> (1*Cisco_Power_Supply) (1*Cisco_CPU_Unit) (7*IETF_IF) (1*Cisco_Fan_Unit) (1*1213_Device) (2*Cisco_Memory_Pool)
    Summary 51.58.220.21/dw108432bsw25 -> (6*Cisco_Power_Supply) (1*Cisco_CPU_Unit) (333*IETF_IF) (6*Cisco_Fan_Unit) (1*1213_Device) (2*Cisco_Memory_Pool)
    DATA

    my %totals;

    $totals{$_->[1]} += $_->[0] for map {[split '\*']} $data =~ /\(([^)]*)\)/g;
    print "$_: $totals{$_}\n" for sort keys %totals;

    Prints:

    1213_Device: 2
    Cisco_CPU_Unit: 2
    Cisco_Fan_Unit: 7
    Cisco_Memory_Pool: 4
    Cisco_Power_Supply: 7
    IETF_IF: 340

    Note the trick of putting the split inside a [] pair to create a two element array for each item the regex generates. The chunk to the left of the for builds a hash of totals keyed by item type.
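
    The same pipeline can be unrolled into named steps, which may make the trick easier to follow. This is only a sketch: the intermediate arrays `@items` and `@pairs` are names introduced here for illustration, and the data is a shortened version of the first sample line.

```perl
use strict;
use warnings;

# Shortened version of the first sample line from the question.
my $data = 'Summary 51.58.214.48/dw109998bsw45 -> '
         . '(1*Cisco_Power_Supply) (1*Cisco_CPU_Unit) (7*IETF_IF)';

my %totals;

# Step 1: a /g match in list context returns the text captured
# inside each (...) pair, e.g. '1*Cisco_Power_Supply'.
my @items = $data =~ /\(([^)]*)\)/g;

# Step 2: split each item on '*' and wrap the two pieces in an
# anonymous array, e.g. [1, 'Cisco_Power_Supply'].
my @pairs = map { [ split /\*/ ] } @items;

# Step 3: add each count to the running total for its item name.
$totals{ $_->[1] } += $_->[0] for @pairs;

print "$_: $totals{$_}\n" for sort keys %totals;
```

    Because the totals are accumulated with +=, feeding further lines through the same three steps keeps adding to the existing counts.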


    Perl is environmentally friendly - it saves trees
      geez. That'd take me a long time to get my head around it... Thanks! There's enough there for me to explore and learn :-)

        It should be more readable to do it like this:

        my %hFields;
        while ( $line =~ / \( (\d+) \* ([^)]+) \) /gx ) {
            $hFields{ $2 } += $1;
        }
Re: split snafus
by moritz (Cardinal) on Jun 17, 2008 at 11:34 UTC
    You have information loss because the first item of a pair is considered the key, and you have the key 1 multiple times.
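
    A minimal sketch of that loss, using the count/name pairs from the Dumper output in the question (the printed summary lines are only for illustration):

```perl
use strict;
use warnings;

# Count/name pairs as they appear in @tfields for the first sample line.
my @tfields = (
    1, 'Cisco_Power_Supply', 1, 'Cisco_CPU_Unit', 7, 'IETF_IF',
    1, 'Cisco_Fan_Unit',     1, '1213_Device',    2, 'Cisco_Memory_Pool',
);

# List assignment reads the elements as key, value, key, value, ...
# A repeated key overwrites the earlier value, so only the last survives.
my %hFields = @tfields;

printf "%d pairs in the list, %d keys in the hash\n",
    @tfields / 2, scalar keys %hFields;
print "key 1 => $hFields{1}\n";
```

    Four of the six pairs share the key 1, so the hash ends up with only the keys 1, 2 and 7.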

    What do you want the result to be? A list of items with key 1? Or the hash the other way round?

    Perhaps my %hFields = reverse @tfields; is what you want?
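
    A small sketch of what reverse does here, on a shortened copy of the pairs from the question:

```perl
use strict;
use warnings;

# Shortened count/name list in the same order the split produces.
my @tfields = (1, 'Cisco_Power_Supply', 1, 'Cisco_CPU_Unit', 7, 'IETF_IF');

# reverse flips the whole list, so each name now comes before its count
# and the names (unique within one line) become the hash keys.
my %hFields = reverse @tfields;

print "$_ => $hFields{$_}\n" for sort keys %hFields;
```

    This keeps every item of a single line. It does not add anything up across lines; for that the counts would still have to be accumulated with += into a second hash.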

      Sorry, I want the total for each element. With the above code and the following snippet:
      while (($value,$key) = each %hFields) {
          if (defined($rHash->{$key})) {
              $rHash->{$key}->{'count'}+= $value;
          } else {
              $rHash->{$key}->{'count'} = $value;
          }
          $total+= $value;
      }
      }
      print "Total = $total\n", Dumper ($rHash);
      I get:
      Total = 146047
      $VAR1 = {
                '1213_Device' => { 'count' => 1181 },
                'RTTMon_Common_Probe' => { 'count' => '2' },
                'Cisco_Temperature_Sensor' => { 'count' => 24 },
                'Cisco_Memory_Pool' => { 'count' => 2598 },
                'Cisco_Voltage_Sensor' => { 'count' => 496 },
                'Cisco_CBQoS_Class' => { 'count' => 6250 },
                'RTTMon_Jitter_Probe' => { 'count' => 175 },
                'Cisco_CPU_Unit' => { 'count' => 116 },
                'Cisco_Power_Supply' => { 'count' => '1' },
                'Cisco_CBQoS_Match' => { 'count' => 9068 },
                'Cisco_CBQoS_RED' => { 'count' => 3470 },
                'IETF_IF' => { 'count' => 119169 },
                'Cisco_Fan_Unit' => { 'count' => 3497 }
              };