So what you are basically doing is:

1. Checking if column 2 has been seen already - if so, next line
2. Else, do some processing on the row and record the value of column 2 as seen

This is a pretty common thing to do and can be super fast. As already pointed out, the easiest win is to use a hash for maintaining the record of what col 2 values have been seen. That will get your check nearer O(1) than O(N).

20k isn't a lot - this is all you should have to do. If you find yourself dealing with a LOT of records (say half a million) you can get really cheap use of multiple cpu/cores (assuming you have them) by writing two scripts - 1 to strip out all the lines with duplicated col 2 values, the second can thus skip that step. Then pipe the output of one script to the input of the other and Unix will run the two processes in parallel for you. Assuming you are using a Unix OS that is. Something like:

cat the_file.txt | remove_duplicates.pl | process_data.pl

In reply to Re: Needed Performance improvement in reading and fetching from a file by aufflick
in thread Needed Performance improvement in reading and fetching from a file by harishnuti

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.