Phydro has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a txt file into arrays based on the first "column", but can't seem to quite figure out how to search through the files, find the same pattern, and push into an array. Ex of file:
243_405 35 23 13
243_405 46 21 15
241_333 65 32 20
241_333 52 44 11
So, I would now want to push each line of 243_405 into @line1 , and 241_333 into @line2... Any words of wisdom for this seeker?

Replies are listed 'Best First'.
Re: pushing similar lines into arrays
by etcshadow (Priest) on Mar 18, 2004 at 04:57 UTC
    Use a hash to hold your arrays:
    while (<>) {
        my ($key,$rest) = split;
        $hash{$key} ||= [];
        push(@{$hash{$key}}, $_);
    }
    This'll give you a hash like:
    (
        243_405 => [ "243_405 35 23 13", "243_405 46 21 15" ],
        241_333 => [ "241_333 65 32 20", "241_333 52 44 11" ]
    )
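For anyone who wants to try this without a data file, here is a minimal self-contained sketch of the same hash-of-arrays idea. The in-memory filehandle and the names `$input`, `@line1` and `@line2` are my own choices for illustration (the latter two borrowed from the question), not part of the reply above.

```perl
use strict;
use warnings;

# Sample input copied from the question; a real script would read a file or <>.
my $input = <<'END';
243_405 35 23 13
243_405 46 21 15
241_333 65 32 20
241_333 52 44 11
END

my %hash;
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    my ($key) = split ' ', $line;      # first whitespace-separated field
    push @{ $hash{$key} }, $line;      # $line still ends in "\n" (no chomp)
}
close $fh;

# The arrays the question calls @line1 and @line2 are reached through the hash.
my @line1 = @{ $hash{'243_405'} };
my @line2 = @{ $hash{'241_333'} };
print "243_405 has ", scalar(@line1), " lines\n";
print "241_333 has ", scalar(@line2), " lines\n";
```

Note that the keys are quoted when looked up: a bare 243_405 in Perl source is a numeric literal (243405), so it would not match the string read from the file.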
    ------------ :Wq Not an editor command: Wq
      Just a few observations that might clean up your implementation.

      • Don't forget to chomp.

      • No need to do the $hash{$key} ||= []; thing. Yes, it creates a key pointing to an anonymous array. But so does your push function in the next line. Even with use warnings; you don't need the ||= thing.

      • I don't see any advantage to pushing $_ into the anonymous array, while discarding $rest. I would think the way to go would be to use the three argument version of split so that you can limit split's output to two items; one containing the key, and one containing everything else. Your current code puts the key in $key, the first value in $rest, tosses out all the rest of the values, and then ultimately tosses out $rest too, only to push everything into the anon-array. Kinda weird, IMHO.
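The three points above can be sketched on a single sample line. This is only an illustration; the variable names `@fields`, `$rest` and `%hash` are assumptions made for the example.

```perl
use strict;
use warnings;

my $line = "243_405 35 23 13\n";
chomp $line;                               # drop the trailing newline first

# Plain split: every whitespace-separated field becomes its own element.
my @fields = split ' ', $line;             # ("243_405", "35", "23", "13")

# Three-argument split with a limit of 2: the key, then everything else intact.
my ($key, $rest) = split /\s+/, $line, 2;  # ("243_405", "35 23 13")

# push autovivifies the array reference: $hash{$key} does not exist yet,
# and no warning is raised even under "use warnings".
my %hash;
push @{ $hash{$key} }, $rest;

print scalar(@fields), " fields; key=$key; rest=$rest\n";
print "stored: $hash{$key}[0]\n";
```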

      See later in this thread for what I believe is an implementation closer to the OP's needs.


      Dave

        Eh, whatever. Those points are really just style issues. There are almost certainly some tiny performance ramifications to the way the split is done... but hardly an issue in such an example (really... very tiny ramifications).

        As for chomping the line... he said he wanted the line. He didn't say anything about the newline, so I didn't assume anything about the newline.

        As for the unnecessary ||= []: yes, I know that's unnecessary, but I do it anyways, always. It's just personal style. In anything other than a one-liner (where economy of characters is important) I avoid the implicit autovivification of undef into anonymous references, just because (I believe) it's clearer to someone reading the code if you are explicit. They don't have to wonder if that autovivification was an accident or not.

        Finally, for the splitting... (again personal style), I find the 3-arg form of split to be ugly. Split and join are very pure and beautiful functional notions (string <=> list), and unless there's a compelling reason to mess with split's default behavior, I don't. Again, taking the unwanted stuff in and explicitly discarding it is also for readability and clarity of intent. If anything, looking at it now, I should have said my ($key, @rest) = split; to be clear that it was a list of arbitrary length and I was only interested in the first item.
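A rough sketch of the style described here, with the explicit `||= []` and the `my ($key, @rest)` form of the split. The two sample lines are taken from the question; the loop over a literal list is an assumption made so the snippet runs on its own.

```perl
use strict;
use warnings;

my %hash;
for my $line ("243_405 35 23 13\n", "241_333 65 32 20\n") {
    # First field is the key; the remaining fields land in @rest and are
    # deliberately left unused, to show that only the key matters here.
    my ($key, @rest) = split ' ', $line;

    $hash{$key} ||= [];            # explicit initialisation instead of autovivification
    push @{ $hash{$key} }, $line;  # keep the whole line, newline included
}

print join(", ", sort keys %hash), "\n";
```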

        Anyway, no need to get into an argument over style. I just wanted to make clear that I didn't do that because I failed to understand... it was merely how I like to do it. ++ to you for your attention to detail, though.

        ------------ :Wq Not an editor command: Wq
Re: pushing similar lines into arrays
by davido (Cardinal) on Mar 18, 2004 at 06:15 UTC
    I know I'm a little late, but I like these little 'perl-doodles'. Here's how I'd do it.

    use strict;
    use warnings;

    my %hash;
    while ( <DATA> ) {
        chomp;
        my ( $key, $value ) = split /\s+/, $_, 2;
        push @{$hash{$key}}, $value;
    }
    print "$_: @{$hash{$_}}\n" foreach keys %hash;

    __DATA__
    243_405 35 23 13
    243_405 46 21 15
    241_333 65 32 20
    241_333 52 44 11

    ...tested, of course.


    Dave

Re: pushing similar lines into arrays
by jweed (Chaplain) on Mar 18, 2004 at 05:02 UTC
    Well, this smells like homework, but I'd do it like this:
    use strict;

    my $head;
    my %hash;
    while (<>) {
        $head = (split)[0];
        push @{$hash{$head}}, $_;
    }
    Update: Sigh. I'm late.



    Code is (almost) always untested.
    http://www.justicepoetic.net/
      Thanks jweed, I will actually take that as a compliment (smells like homework). Actually, doing a project for work, and am but a wee yung'un in the Perl world...
Re: pushing similar lines into arrays
by graff (Chancellor) on Mar 18, 2004 at 05:03 UTC
    You'll probably want to use a hash of arrays, keyed by the initial string -- something like this:
    my %lines;
    while (<>) {
        my ($key) = ( /^(\S+)/ );  # captures first token, assigns that to $key
        push @{$lines{$key}}, $_;  # (update: fixed missing "}")
    }
    for (sort keys %lines) {
        my $nlines = scalar @{$lines{$_}};
        print "$nlines lines keyed by $_ :\n";
        print " First line is: $lines{$_}[0]";
        print " Last line is: $lines{$_}[$nlines-1]";
    }