
Hello Monks!

Here is the scenario of the file that I have to work with. Every week, I get a copy of seven files, one file for each day of the week. What I need to do is combine the files into one file, keeping them in order (the oldest file first and the newest file last). I got this part working. The next piece that I have to do is to search for duplicate records, but with a twist.

For lack of a better term, the 'primary key' for the record is characters 9-13 in the row. So if any information after the 13th character is updated, the record needs to be updated. But wait, it gets better. Say a record was updated on a Monday and then again on a Friday. When the files are combined, I need to keep only the newest version, which would be the record inserted on Friday.

An example would be this:

This would be on line 10, so it would be from earlier in the week:

542642 19779 SAMMYs 17TH ST

On line 1500 this would be listed:
542642 19779 SAMMYs Sesame ST

So what I would like is the SAMMYs at 17TH ST gone, and only the listing at Sesame ST kept. The 19779 is what lets you know that it's the same store.

So here is where I’m at now. I searched through previous monk posts and found some really good stuff on finding and removing duplicate elements in an array.

http://www.perlmonks.org/?node_id=280484

Which got me to

http://perldoc.perl.org/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array%3F

So here is what I did. I read the file into an array, reversed the array (so that instead of oldest first, it was newest first), then removed the duplicates. Then I re-reversed the array, putting the oldest first again.

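    # Read every line of the combined file (oldest record first).
    open(my $in, '<', 'file.txt') or die "can't open file.txt: $!\n";
    my @FileInfo = <$in>;
    close($in);

    # Reverse so the newest record comes first, drop exact duplicate lines,
    # then reverse again to restore oldest-first order.
    my @newFile = reverse(@FileInfo);
    my %seen = ();
    my @unique = grep { ! $seen{ $_ }++ } @newFile;
    @newFile = reverse(@unique);

    # Write the result back over the same file.
    open(my $out, '>', 'file.txt') or die "can't write file.txt: $!\n";
    print $out @newFile;
    close($out);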

My problem is that I can't find a good example that does a hash/grep based on a 'primary key'.
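
To show what I mean by a keyed version, here is a rough sketch of the same reverse/grep idea with the hash indexed by the key instead of the whole line. I am not claiming this is the right way to do it. The helper name dedupe_by_key and the $key_of callback are made up for illustration, and the second sample row is invented test data. In the sample rows the store id happens to be the second whitespace-separated field, so the sketch pulls it out with split; for the real fixed-width file a substr on characters 9-13 would presumably replace it.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the newest line for each key.
# $key_of is a code ref that extracts the 'primary key' from one line.
sub dedupe_by_key {
    my ($key_of, @lines) = @_;
    my %seen;
    # Walk the lines newest-first, so the first line seen for a key
    # is the newest one; later (older) lines with that key are dropped.
    my @kept = grep { !$seen{ $key_of->($_) }++ } reverse @lines;
    # Restore oldest-first order for the survivors.
    return reverse @kept;
}

# Assumption: in these sample rows the key is the second
# whitespace-separated field. For the real fixed-width layout,
# sub { substr($_[0], 8, 5) } would take characters 9-13 instead.
my $key_of = sub { (split ' ', $_[0])[1] };

my @lines = (
    "542642 19779 SAMMYs 17TH ST\n",
    "542643 20001 BERTs MAIN ST\n",      # invented row, for illustration only
    "542642 19779 SAMMYs Sesame ST\n",
);

print dedupe_by_key($key_of, @lines);
```

With these three rows, the 17TH ST line should be dropped and the Sesame ST line kept at its later position in the file.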

I think I'm close, but really need the help and assistance from more well versed monks than I to make this truly work right.

thanks!
Dave

In reply to how to remove similiar duplicate elements in a file/array by Dave_PA
