Urbs has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I don't thoroughly understand arrays and hashes, and even The Camel is difficult to figure out for this fairly simple task. If you can help, I would GREATLY appreciate it.

I have a script that chops thousands of records into the sections that I need to evaluate weekly. Currently I am writing a server name and data sizes to a CSV file. I have a few hundred unique servers, with hundreds, if not thousands, of lines for each server with varying data sizes. I would like to combine the data sizes for each unique server into one line per server.

For example, I am now producing:

    server1,4,2,2
    server1,6,2,2
    server1,4,1,1
    server2,10,1,2
    server2,1,1,1

I would like this to be:

    server1,14,5,5
    server2,11,2,3

I am almost certain that I have to make a hash with the server name as the key but I'm thinking that there may be an easier way. Any help would be appreciated.

Urbs

Re: Hash/Array help
by merlyn (Sage) on Jul 30, 2009 at 15:50 UTC
    The Camel is difficult to figure out
    That's why we wrote the Llama, which I would strongly recommend at your level of understanding.

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Re: Hash/Array help
by ikegami (Patriarch) on Jul 30, 2009 at 15:33 UTC

    Please use <c>...</c> tags around computer text (code, data, etc). To mark paragraphs, just start them with <p>.

    I am almost certain that I have to make a hash with the server name as the key

    Indeed. Hashes are great for grouping.

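    my %servers;
    while (<>) {
        chomp;
        my ($server_id, @fields) = split /,/;
        # One array of running totals per server, created the first time it is seen.
        my $server = $servers{$server_id} ||= [0, 0, 0];
        $server->[$_] += $fields[$_] for 0..$#fields;
    }

    # Sorting the keys keeps the output order stable between runs.
    for my $server_id (sort keys %servers) {
        my $server = $servers{$server_id};
        print(join(',', $server_id, @$server), "\n");
    }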

    Update: Fixed incorrect var names.

Re: Hash/Array help
by kennethk (Abbot) on Jul 30, 2009 at 15:36 UTC
    Welcome to the Perl community. You might find it helpful to read How do I post a question effectively? to learn a bit more about formatting in the Monastery and what we like to see in a question.

    If you work with Perl for any significant amount of time, you'll find that the hash is frequently the easiest way to do things, and I think once you get comfortable using them your facility with the language will increase dramatically.

    In this case, I would suggest using a hash of lists in order to hold your data - a basic intro to nested structures in Perl can be found at perllol. You'll probably also want to read up on references at perlref and perlreftut. Since you are working with CSV files, it's probably also a good idea to use Text::CSV to handle them and avoid unnecessary headaches. I can't tell whether you already do, since you haven't posted any code, which is something we usually like to see.

    To give you a basic idea of how to use a hash of lists, I've put together the following code with your example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %servers;
    while (<DATA>) {
        chomp;
        my @line = split /\,/;
        my $name = shift @line;
        if (exists $servers{$name}) {
            # Server already seen: add each column to its running total.
            foreach my $index (0..$#line) {
                $servers{$name}[$index] += $line[$index];
            }
        } else {
            # First line for this server: store a copy of its columns.
            $servers{$name} = [@line];
        }
    }

    # Sorting the keys keeps the output order stable between runs.
    foreach my $key (sort keys %servers) {
        print join(',',$key,@{$servers{$key}}), "\n";
    }

    __DATA__
    server1,4,2,2
    server1,6,2,2
    server1,4,1,1
    server2,10,1,2
    server2,1,1,1
      I notice you're always escaping characters that don't need escaping in regex patterns. It's harder to read.
        When I first started learning regexes, one of the books I was reading suggested doing that as future proofing - just because a punctuation character has no meaning in a regular expression now does not mean it won't in the future, and that would be a bug I'd hate to try and track down. I do agree that the result is that my expressions suffer from leaning toothpick syndrome. I've cut back a bit, but something about that darned comma just begs to be escaped.
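
For anyone following the escaping discussion, here is a minimal sketch (not taken from either reply) that splits one line of the sample data with both patterns. The variable names `$line`, `@plain` and `@escaped` are made up for illustration. A comma is not a regex metacharacter, so the escaped and unescaped forms should behave the same here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line from the sample data in the question.
my $line = 'server1,4,2,2';

# Both patterns match a literal comma; the backslash is harmless but optional.
my @plain   = split /,/,  $line;
my @escaped = split /\,/, $line;

print "plain:   @plain\n";
print "escaped: @escaped\n";
print "same result\n" if "@plain" eq "@escaped";
```

Which form to use is a readability choice: the unescaped pattern is shorter, while the escaped one makes the "this is a literal" intent explicit.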