nurulnad:

Whenever you want to detect duplicates and/or unique values, one thing that should come to mind is the hash, since each key can appear in it only once. So in your case, you can just build a key for each incoming record. If the key does not exist in the hash, you process the record; otherwise you ignore it. Then be sure to enter the key into the hash.

Here's a quick[1] modification to your program to use a hash to detect and eliminate duplicate values:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Records separated by blank line
    $/ = "\r\n\r\n";

    # Records we've seen before
    my %records_seen;

    while (my $line = <DATA>) {
        # Get list of key fields for record
        my @key_fields = (split /\s+/, $line)[ -2, -8, -14, -5, -11, -17 ];

        # Create composite key for record
        my $key = join("|",@key_fields);

        # Process the record if we haven't seen it before
        if (! exists $records_seen{$key}) {
            print $line;
        }

        # Remember that we've processed the record
        $records_seen{$key} = $line;
    }

    __DATA__
    A 83 GLU A 90 GLU^?
    A 163 ARG A 83 ARG^?
    A 222 ARG A 5 ARG^?

    A 229 ALA A 115 ALA~?
    A 257 ALA A 118 ALA~?
    A 328 ASP A 95 ASP~?

    A 83 GLU A 90 GLU^?
    A 163 ARG A 83 ARG^?
    A 222 ARG A 5 ARG^?

    A 83 GLU B 90 GLU^?
    A 163 ARG B 83 ARG^?
    A 222 ARG B 5 ARG^?

Running this gives us:

    Roboticus@Roboticus-PC /robo/Desktop
    $ perl 856427.pl
    A 83 GLU A 90 GLU^?
    A 163 ARG A 83 ARG^?
    A 222 ARG A 5 ARG^?

    A 229 ALA A 115 ALA~?
    A 257 ALA A 118 ALA~?
    A 328 ASP A 95 ASP~?

    Roboticus@Roboticus-PC /robo/Desktop
    $
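Note that the key is built only from the six numeric fields, which is why the last record (chain B instead of A) is also treated as a duplicate. For comparison, here is a compact sketch of the same idea using the common `$seen{$key}++` idiom with `grep`. The helper names `record_key` and `unique_records` are made up for illustration, and the sample records are a simplified copy of the data above held in an array rather than read from a file, so treat it as a sketch rather than a drop-in replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the composite key from the same six
# field positions used in the program above.
sub record_key {
    my ($record) = @_;
    return join '|', (split /\s+/, $record)[ -2, -8, -14, -5, -11, -17 ];
}

# Hypothetical helper: keep only the first record seen for each key.
# $seen{$key}++ is false the first time a key appears and true afterwards.
sub unique_records {
    my %seen;
    return grep { !$seen{ record_key($_) }++ } @_;
}

# Simplified sample data: three-line records, the third repeating the first.
my @records = (
    "A 83 GLU A 90 GLU\nA 163 ARG A 83 ARG\nA 222 ARG A 5 ARG\n",
    "A 229 ALA A 115 ALA\nA 257 ALA A 118 ALA\nA 328 ASP A 95 ASP\n",
    "A 83 GLU A 90 GLU\nA 163 ARG A 83 ARG\nA 222 ARG A 5 ARG\n",
);

print for unique_records(@records);
```

The post-increment does the "check, then remember" steps in a single expression, so there is no separate assignment to forget.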

Here are some assorted notes on your code, to explain why there are so many differences between your code and mine:

...roboticus

Update: Added this footnote:

[1] Apparently not *very* quick, as baxy77bax posted a similar reply some 30 minutes earlier, while I was still composing this node...

Update: Corrected "If the key exists in the hash" to "If the key does not exist in the hash" in the first paragraph.


In reply to Re: delete redundant data by roboticus
in thread delete redundant data by nurulnad
