Your example data file suggests that the input will always be sorted by key. If this is true, you have a more efficient option than the one suggested by most who answered your question.

Rather than reading the entire input file into memory (e.g., as a hash of lists) and then emitting your output, you can emit output in passing, as soon as each output line is determined. This approach requires very little memory because it holds only one merged line at a time, which matters if your input files can be large.

Here's one possible implementation, which uses autosplitting and other handy command-line flags (see perlrun):

    #!/usr/bin/perl -lanF,

    # if the current key (in $F[0]) is not the same as the
    # last key we saw, print out the merged output line for
    # the last key and then start a new merged output
    # line for the current key

    if ($last_key ne $F[0]) {
        print $merged if $merged;
        $merged   = $_;
        $last_key = $F[0];
    }

    # otherwise, the current key is the same as the last,
    # and so we can merge this line's value portion (in
    # $F[1]) with the previous

    else {
        $merged .= ",$F[1]";
    }

    # when we reach the end of the file, we must print
    # the final merged output line

    END { print $merged if $merged }

Because of the command-line switches we used, the body of the code will be run for each line of input, and the following variables will be set up for us automatically:

$_ = the entire input line, with linefeed stripped

$F[0] = the key portion of the line

$F[1] = the value portion of the line
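
If the switches feel too magical, here is a rough sketch of the same logic with the implicit loop written out by hand. The subroutine name merge_sorted_lines, the filehandle arguments, and the sample data are my own inventions for illustration. It also assumes, as above, that the input is sorted by key and that each line is a simple "key,value" pair; skipping lines without both fields is an extra assumption of mine, not something the script above does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): reads
# "key,value" lines from $in, assumed sorted by key, and writes
# one merged line per key to $out.
sub merge_sorted_lines {
    my ($in, $out) = @_;
    my ($last_key, $merged);

    while (my $line = <$in>) {        # -n: loop over the input lines
        chomp $line;                  # -l: strip the line ending
        my @F = split /,/, $line;     # -a -F,: split the line on commas
        next unless @F >= 2;          # assumption: skip lines lacking a value

        if (!defined $last_key || $last_key ne $F[0]) {
            # new key: flush the previous merged line, start a new one
            print {$out} "$merged\n" if defined $merged;
            $merged   = $line;
            $last_key = $F[0];
        }
        else {
            # same key as before: append this line's value
            $merged .= ",$F[1]";
        }
    }

    # end of input: flush the final merged line (the END block's job)
    print {$out} "$merged\n" if defined $merged;
    return;
}

# Made-up sample data, read through an in-memory filehandle
my $sample = "a,1\na,2\nb,3\n";
open my $sample_fh, '<', \$sample or die "cannot open sample: $!";
merge_sorted_lines($sample_fh, \*STDOUT);
# prints:
#   a,1,2
#   b,3
```

To process real data, you could pass \*STDIN instead of the in-memory handle. Only $last_key and $merged survive from one line to the next, so memory use stays flat no matter how long the input is.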

Hope this helps.

Cheers,
Tom

Tom Moertel : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder

In reply to Re: Reformat Text File by tmoertel
in thread Reformat Text File by arunhorne
