arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Monks

I have a data file in the form:

    key1,valueA
    key2,valueB
    key2,valueC
    key3,valueD

I want to translate it to:

    key1,valueA
    key2,valueB,valueC
    key3,valueD

I'm sure this is easy in Perl, but I can't seem to manage it. It also seems like a prime candidate for awk, but I have the same problem there. Can anyone help?

Thanks

____________
Arun

•Re: Reformat Text File
by merlyn (Sage) on Oct 05, 2004 at 14:16 UTC
      This assumes that the keys are in sorted order to begin with and that it's easy to re-sort them. Instead, you might want to preserve the original order as much as possible.
      my %result;
      my @keys;
      while (<>) {
          chomp;
          my ($k, $v) = split /,/;
          push @keys, $k if (!exists $result{$k});
          push @{$result{$k}}, $v;
      }
      for (@keys) {
          print join(",", $_, @{$result{$_}}), "\n";
      }
        > This assumes that the keys are in sorted order

        What makes you say that?

Re: Reformat Text File
by tmoertel (Chaplain) on Oct 05, 2004 at 15:11 UTC
    Your example data file suggests that the input will always be sorted by key. If this is true, you have a more efficient option than the one suggested by most who answered your question.

    Rather than reading the entire input file into memory (e.g., as hash of lists) and then emitting your output, you can emit output in passing, as soon as each output line is determined. This approach has the advantage of requiring very little memory, which is important if your input files can be large.

    Here's one possible implementation, which uses autosplitting and other handy command-line flags (see perlrun):

    #!/usr/bin/perl -lanF,

    # if the current key (in $F[0]) is not the same as the
    # last key we saw, print out the merged output line for
    # the last key and then start a new merged output
    # line for the current key

    if ($last_key ne $F[0]) {
        print $merged if $merged;
        $merged = $_;
        $last_key = $F[0];
    }

    # otherwise, the current key is the same as the last,
    # and so we can merge this line's value portion (in
    # $F[1]) with the previous

    else {
        $merged .= ",$F[1]";
    }

    # when we reach the end of the file, we must print
    # the final merged output line

    END { print $merged if $merged }

    Because of the command-line switches we used, the body of the code will be run for each line of input, and the following variables will be set up for us automatically:
    $_ = the entire input line, with linefeed stripped
    $F[0] = the key portion of the line
    $F[1] = the value portion of the line
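
For readers who have not met these switches, here is a rough sketch of the same logic written as an explicit loop. The sub name merge_sorted and its filehandle arguments are made up for illustration, and the comments only approximate what perlrun says -l, -a, -F and -n do. Like the script above, it assumes the input is already grouped by key.

```perl
use strict;
use warnings;

# Hypothetical helper: merges consecutive lines that share a key.
# Assumes the input is already grouped (sorted) by key.
sub merge_sorted {
    my ($in, $out) = @_;
    local $\ = "\n";                  # like -l: print appends a newline
    my ($last_key, $merged);
    while (my $line = <$in>) {        # like -n: loop over the input lines
        chomp $line;                  # like -l: strip the line ending
        my @F = split /,/, $line;     # like -a with -F,: split on commas
        if (!defined $last_key or $last_key ne $F[0]) {
            print {$out} $merged if defined $merged;
            $merged   = $line;
            $last_key = $F[0];
        }
        else {
            $merged .= ",$F[1]";
        }
    }
    # plays the role of the END block: flush the final merged line
    print {$out} $merged if defined $merged;
}

# Usage with the sample data from the question, read from a string.
my $input = "key1,valueA\nkey2,valueB\nkey2,valueC\nkey3,valueD\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";
merge_sorted($in, \*STDOUT);
```

Only one merged line is held in memory at a time, which is the point of the streaming approach.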
    Hope this helps.

    Cheers,
    Tom

Re: Reformat Text File
by dragonchild (Archbishop) on Oct 05, 2004 at 14:19 UTC
    open( my $fh, '<', $infile ) or die "Cannot open '$infile' for reading: $!\n";

    my %data;
    while ( defined( $_ = <$fh> ) ) {
        chomp;
        # split takes the pattern first, then the string; the limit of 2
        # keeps any further commas inside the value
        my @line = split( /,/, $_, 2 );
        push @{$data{$line[0]}}, $line[1];
    }
    close( $fh );

    open( $fh, '>', $outfile ) or die "Cannot open '$outfile' for writing: $!\n";
    foreach my $k ( sort keys %data ) {
        print $fh join( ',', $k, @{$data{$k}} ), $/;
    }
    close( $fh );

    Now, if this is homework, I wouldn't turn that in - it'll be obvious you got help. If it's not homework, take the time to figure out what I did. The key is push @{$data{$line[0]}}, $line[1];.
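
To see why that push line does the work, it helps to know that Perl creates the array reference the first time a key is used (autovivification), so nothing has to be initialised per key. A small sketch with made-up records; the sub name group_pairs is hypothetical and not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: groups "key,value" strings into a hash of arrays.
sub group_pairs {
    my @records = @_;
    my %data;
    for my $record (@records) {
        my ($key, $value) = split /,/, $record, 2;
        # $data{$key} does not exist the first time a key is seen;
        # dereferencing it as an array makes Perl create [] automatically.
        push @{ $data{$key} }, $value;
    }
    return \%data;
}

my $data = group_pairs('key1,valueA', 'key2,valueB', 'key2,valueC');
print join(',', $_, @{ $data->{$_} }), "\n" for sort keys %$data;
```

With these three records, key2 ends up holding a two-element array and key1 a one-element array.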

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Reformat Text File
by Happy-the-monk (Canon) on Oct 05, 2004 at 14:21 UTC

    You would use a hash of arrays, as shown in perldsc:

    my %hash;
    while ( <FILE> ) {
        chomp;
        my ( $key, $value ) = split /,/, $_;
        # treats hashvalue as reference to an array:
        push @{ $hash{ $key } } , $value;
    }
    foreach my $key ( keys %hash ) {
        # same dereferencing as above.
        print "$key," , join( "," => @{ $hash{ $key } } ) , "\n";
    }

    Cheers, Sören

Re: Reformat Text File
by ikegami (Patriarch) on Oct 05, 2004 at 14:30 UTC

    Here's a more memory efficient solution that only works if the keys are already sorted:

    {
        my $last;
        my @list;
        local $, = ',';
        local $\ = $/;
        while (<>) {
            chomp;
            my ($key, $val) = split($,, $_, 2);
            print $last, splice(@list) if ($.!=1 && $key ne $last);
            $last = $key;
            push(@list, $val);
        }
        print $last, @list if (@list);
    }

    I like the symmetry of $/ and $, being used for both input and output.

Re: Reformat Text File
by Jasper (Chaplain) on Oct 05, 2004 at 14:34 UTC
    I'm slightly bored today, and I've made a few assumptions about the format of your text.
    1 while s/^(\w+),(.*)\n\1,(\w+)$/$1,$2,$3/m;
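
The substitution works on $_ and matches across a newline, so it presumably expects the whole file in one string rather than one line at a time. A sketch under that assumption, plus the assumptions that keys and values contain only \w characters and that lines with equal keys are adjacent; merge_with_regex is a hypothetical wrapper, not something from the reply.

```perl
use strict;
use warnings;

# Hypothetical wrapper: applies the substitution to a whole file's text.
sub merge_with_regex {
    my ($text) = @_;
    # Each pass joins one pair of adjacent lines that start with the
    # same key; repeat until no such pair is left.
    1 while $text =~ s/^(\w+),(.*)\n\1,(\w+)$/$1,$2,$3/m;
    return $text;
}

print merge_with_regex("key1,valueA\nkey2,valueB\nkey2,valueC\nkey3,valueD\n");
```

On the command line, slurp mode (perl -0777 -pe with the same substitution) should load the file into $_ in one piece.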