
(Stepping back a step: you might see whether the standard *NIX join command would do the job, as this kind of thing is right up its alley. But if you need to do more involved processing later, you probably want to stick with Perl.)

If your "second" file is small enough, you could just read and parse it first, building a hash (%value_for_id) that maps each "id" to its corresponding value (presuming that's a 1-to-1 mapping; more on that later). Then you'd read through the "main" file and append/insert the value column by looking it up in that hash. If it's really, really big, you could use a database-backed hash (DB_File, GDBM_File) so the data is stored in a file rather than having to keep everything in RAM.
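As a rough sketch of the flat-hash approach (not tested against your real data): the sample rows, the tab-separated layout with the id in the first column, and the sub names load_values and merge_rows are all assumptions for illustration. In-memory filehandles stand in for your two files.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample data standing in for the two tab-separated files;
# the layout (id in the first column) is an assumption.
my $second_file = "AVP78042\t0.29731\nAVP78031\t0.8675309\n";
my $main_file   = "AVP78042\tfoo\nAVP99999\tbar\n";

# Build id => value from the lookup ("second") file.
sub load_values {
    my ($fh) = @_;
    my %value_for_id;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $id, $value ) = split /\t/, $line;
        $value_for_id{$id} = $value;
    }
    return \%value_for_id;
}

# Append the looked-up value (or "-" when the id is unknown) to each main row.
sub merge_rows {
    my ( $fh, $value_for_id ) = @_;
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @row = split /\t/, $line;
        my $id  = $row[0];
        push @out, join( "\t", @row, $value_for_id->{$id} // "-" );
    }
    return @out;
}

open my $second_fh, '<', \$second_file or die "Cannot open in-memory data: $!";
open my $main_fh,   '<', \$main_file   or die "Cannot open in-memory data: $!";

my $value_for_id = load_values($second_fh);
my @merged       = merge_rows( $main_fh, $value_for_id );
say for @merged;
```

Rows in the main file whose id has no match get "-" in the new column here; swap in whatever placeholder suits your data.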

If your mapping isn't 1-to-1 (so for id "AVP78042,AVP78031" there were (say) three values (0.29731, 0.8675309, 0.2112)) you need to figure out what that means. One possible solution would be to output three copies of the line (one with each value). In that case you'd do something similar but build a Hash Of Arrayrefs (similar to what you're doing in your sample) rather than just a flat hash. While you're processing the main file you'd iterate over the matching values with something like:

## presume @row has the current (main) row and we're adding value as last column
## replace with (e.g.) splice or whatever as appropriate
for my $value ( @{ $values_for_id{ $id } // [ "-" ] } ) {
  say join( qq{\t}, @row, $value );
}
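
For completeness, here is one way the hash of arrayrefs feeding that loop might get built. This is only a sketch: the sample lines, the tab separator, and the contents of @row are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical lookup data: the same id appears on several lines.
my $second_file = "AVP78042,AVP78031\t0.29731\n"
                . "AVP78042,AVP78031\t0.8675309\n"
                . "AVP78042,AVP78031\t0.2112\n";

open my $fh, '<', \$second_file or die "Cannot open in-memory data: $!";

my %values_for_id;
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $id, $value ) = split /\t/, $line;
    # push autovivifies the arrayref the first time an id is seen
    push @{ $values_for_id{$id} }, $value;
}

# One (made-up) main row; emit one output line per matching value.
my @row      = ( "AVP78042,AVP78031", "some", "columns" );
my $id       = $row[0];
my @expanded = map { join( qq{\t}, @row, $_ ) } @{ $values_for_id{$id} // ["-"] };
say for @expanded;
```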

Also you may want to look at Text::CSV_XS for parsing your files.
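
A minimal sketch of what that might look like, assuming your files are tab-separated and that Text::CSV_XS is installed from CPAN (it is not a core module). The sample data and header line are made up.

```perl
use strict;
use warnings;
use feature 'say';
use Text::CSV_XS;

# Hypothetical tab-separated sample; a real script would open the file instead.
my $data = "id\tvalue\nAVP78042\t0.29731\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# sep_char switches the parser from commas to tabs; auto_diag reports parse errors.
my $csv = Text::CSV_XS->new( { binary => 1, sep_char => "\t", auto_diag => 1 } );

my @rows;
while ( my $row = $csv->getline($fh) ) {
    push @rows, $row;    # each $row is an arrayref of fields
}
say join( "|", @$_ ) for @rows;
```

The win over a bare split is that quoted fields and embedded separators get handled for you.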

Edit: Derp, I completely missed where you gave the numbers of lines; I thought you were talking about much bigger files. The read-it-into-a-hash-first approach should be more than fine and shouldn't stress the available memory on any modern box.

The cake is a lie.

In reply to Re: Is it possible to merge data frames with different amounts of rows? by Fletch
in thread Is it possible to merge data frames with different amounts of rows? by Mauri1313
