The problem appears to come in two parts: extracting the "UTR" from each record, and testing whether it is one of the "sent UTR"s.

You say this is critical to the performance, so I'm assuming that this test is going to filter out a lot of records which don't need further processing.

Extracting the "UTR" is an SMOC, and the only question is what Perl will do quickest. I found (see below) that good old-fashioned index/substr did the trick -- that small part of the puzzle runs ~9 times faster than split. (If the first field is fixed length, you could do better still.) Note that for records you do want to process you'll need to do the split as well -- so the actual saving depends on what proportion of records are being filtered out.
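On the fixed-length remark: a minimal sketch, assuming the first field is always 19 characters wide (it is in the sample records, but that is an assumption about the real data), so the "UTR" always starts at offset 20. The name utr_fixed is made up for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: field 0 is always 19 characters, so the first '~' sits at
# offset 19 and the "UTR" starts at offset 20. Check this against real data.
use constant UTR_START => 20;

# Return the second '~'-delimited field, skipping the search for the first '~'.
sub utr_fixed {
    my ($record) = @_;
    my $end = index($record, '~', UTR_START);
    return if $end < 0;    # malformed record: no closing '~' found
    return substr($record, UTR_START, $end - UTR_START);
}

my $line = '0906928472847292INR~UTRIR8709990166~ 700000~INR~20080623~RC425484~';
print utr_fixed($line), "\n";    # prints UTRIR8709990166
```

This saves one index call per record, so expect a small gain at best; benchmark it before relying on it.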

Testing whether the "UTR" is one of the "sent UTR"s requires some sort of search/match. The grep in the code does a linear search along @sentUTRs. What's more, it processes every entry even if there's been a match already. I suggest a hash would be a better choice.
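To illustrate the difference, here is a small sketch of the two approaches for exact matches. The names %sent, already_sent and already_sent_scan are made up, and the two list entries are simply borrowed from the test data.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @sentUTRs = ('UTRIN9080980866', 'Q08709798905745');    # sample values only

# Build the lookup table once, outside the record loop.
my %sent;
@sent{@sentUTRs} = ();

# Exact match via the hash: one probe, however long the list is.
sub already_sent {
    my ($utr) = @_;
    return exists $sent{$utr} ? 1 : 0;
}

# If the list has to stay an array, first() at least stops at the first hit,
# unlike grep, which always walks the whole list.
sub already_sent_scan {
    my ($utr) = @_;
    return defined( first { $_ eq $utr } @sentUTRs ) ? 1 : 0;
}

print already_sent('UTRIN9080980866'), already_sent('UTRIR8709990166'), "\n";    # prints 10
```

Note that exists is needed here because the hash values are undef; testing $sent{$utr} directly would always be false.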

CAVEAT: I have just realised that the test grep($r =~ /$_/, @sentUTRs) is, of course, not grep($r eq $_, @sentUTRs) -- if partial matches are essential, then a hash won't cut it :-(
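If partial matches really are needed, one possible alternative (not benchmarked here, so treat it as a sketch) is to fold all the "sent UTR"s into a single precompiled alternation and match each record's "UTR" once. The names $sent_re and partially_sent and the list entries are hypothetical. One difference from the original grep: quotemeta makes each entry match literally, whereas /$_/ treats it as a pattern.

```perl
use strict;
use warnings;

my @sentUTRs = ('UTRIN908', 'Q0870');    # hypothetical partial keys

# Escape each entry so it matches as a literal substring, then join with '|'.
my $alternation = join '|', map { quotemeta($_) } @sentUTRs;

# An empty list would give an empty pattern, which matches everything,
# so fall back to (?!), which never matches.
my $sent_re = @sentUTRs ? qr/$alternation/ : qr/(?!)/;

# True if any "sent UTR" occurs anywhere inside $utr.
sub partially_sent {
    my ($utr) = @_;
    return $utr =~ $sent_re ? 1 : 0;
}

print partially_sent('UTRIN9080980866'), partially_sent('UTRIR8709990166'), "\n";    # prints 10
```

The regex is compiled once rather than once per list entry per record, and the match stops at the first alternative that succeeds.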

Code below. Using index/substr and a hash ran ~16 times faster on my artificial test. Benchmark output (edited for clarity):

Benchmark: timing 400000 iterations
     Split  : 33.39 usr +  0.01 sys = 33.40 CPU @  11976.05/s
     Regex_b:  5.56 usr +  0.00 sys =  5.56 CPU @  71942.45/s
     Regex_a:  5.33 usr +  0.01 sys =  5.34 CPU @  74906.37/s
     Index  :  3.87 usr +  0.00 sys =  3.87 CPU @ 103359.17/s
Benchmark: timing 200000 iterations
  Split Grep: 55.82 usr +  0.02 sys = 55.84 CPU @   3581.66/s
  Index Grep: 39.86 usr +  0.01 sys = 39.87 CPU @   5016.30/s
  Split Hash: 18.12 usr +  0.01 sys = 18.13 CPU @  11031.44/s
  Index Hash:  3.30 usr +  0.00 sys =  3.30 CPU @  60606.06/s
YMMV.

As you'd expect, fiddling with the coding to optimize the extraction of the "UTR" makes only a modest difference. Changing the algorithm for searching the "sentUTRs" makes a rather bigger difference.

Update: added the essential exists to the hash lookups, and updated the benchmark timings.


#!/usr/bin/perl

use strict ;
use warnings ;

use Benchmark () ;

# Gather in the data

my @input = <DATA> ;

# Extracting the 'UTR'

print "Testing the 'UTR' extraction\n" ;
for (@input) {
  my $r_s = by_split() ;
  my $r_a = by_regex_a() ;
  my $r_b = by_regex_b() ;
  my $r_i = by_index() ;

  my $s = "" ;
  if ($r_a ne $r_s) { $s .= " BUT \$r_a='$r_a'" ; } ;
  if ($r_b ne $r_s) { $s .= " BUT \$r_b='$r_b'" ; } ;
  if ($r_i ne $r_s) { $s .= " BUT \$r_i='$r_i'" ; } ;
  print " $r_s", ($s ? $s : " OK"), "\n" ;
} ;

Benchmark::timethese(400000, {
  'Split '  => sub { by_split()   for (@input) ; },
  'Regex_a' => sub { by_regex_a() for (@input) ; },
  'Regex_b' => sub { by_regex_b() for (@input) ; },
  'Index '  => sub { by_index()   for (@input) ; },
});

sub by_split {
  my @data = split(/~/, $_) ;
  return $data[1] ;
} ;

sub by_regex_a {
  m/~(.*?)~/ ;
  return $1 ;
} ;

sub by_regex_b {
  m/~([^~]*)~/ ;
  return $1 ;
} ;

sub by_index {
  my $i = index($_, '~') + 1 ;
  return substr($_, $i, index($_, '~', $i) - $i) ;
} ;

# Testing for existing 'UTR'

my @received = map by_split(), @input ;
my @sentUTRs = ('ffsdahgdf', 'hjgfsdfghgaghsfd', $received[3],
                'ppuiwdwsc', '4155dvcs7', $received[1]) ;

my %sentUTRs ;
@sentUTRs{@sentUTRs} = undef ;

Benchmark::timethese(200000, {
  'Split Grep' => sub {
      for (@input) {
        my $r = by_split() ;
        next if grep($r =~ /$_/, @sentUTRs) ;
        $r .= $r ;
      } ;
    },
  'Split Hash' => sub {
      for (@input) {
        my $r = by_split() ;
        next if exists $sentUTRs{$r} ;
        $r .= $r ;
      } ;
    },
  'Index Grep' => sub {
      for (@input) {
        my $r = by_index() ;
        next if grep($r =~ /$_/, @sentUTRs) ;
        $r .= $r ;
      } ;
    },
  'Index Hash' => sub {
      for (@input) {
        my $r = by_index() ;
        next if exists $sentUTRs{$r} ;
        $r .= $r ;
      } ;
    },
});

__DATA__
0906928472847292INR~UTRIR8709990166~ 700000~INR~20080623~RC425484~IFSCSEND001 ~Remiter Details ~1000007 ~TEST RTGS TRF7 ~ ~ ~ ~RTGS~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906472983472834HJR~UTRIN9080980866~ 1222706~INR~20080623~NI209960~AMEX0888888 ~FRAGNOS EXPRESS - TRS CARD S DIVISI~4578962 ~/BNF/9822644928 ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IO CL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906568946748922INR~ZP HLHLKJ87 ~ 1437865.95~INR~20080623~NI209969~HSBC0560002 ~MOTOSPECT UNILEVER LIMITED ~1234567 ~/INFO/ATTN: ~//REF 1104210 PLEASE FIND THE DET ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906506749056822INR~Q08709798905745~ 5960.74~INR~20080623~NI209987~ ~SDV AIR LINK REVOS LIMITED ~458ss453 ~ ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906503389054302INR~UTRI790898U0166~ 2414~INR~20080623~NI209976~ ~FRAGNOS EXPRESS - TRS CARD S DIVISI~ ~/BNF/9826805798 ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1

In reply to Re: Needed Performance improvement in reading and fetching from a file by gone2015
in thread Needed Performance improvement in reading and fetching from a file by harishnuti
