Dear Monks,
I need your expert advice on the code fragment below (a portion extracted from my main program to show where I am hitting a performance problem).
I suspect there is a faster and more efficient way to do this in Perl.
I am reading a large file line by line (a small portion of it is under __DATA__ below), splitting each line, and taking the second field from every line.
Each line starts with 09, as shown below, and the fields are separated by the ~ character.

#!/usr/bin/perl
use strict;
use warnings;
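my @refnos = ();
my @sentUTRs = ();    # previously sent reference numbers, read from referenceno.txt (loading code not shown)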

print "Reading UTR Payment numbers \n";

while(<DATA>){
        chomp;
        my @data = split(/~/,$_);
        push(@refnos,$data[1]);
        # The grep below is causing the slowdown
        next if (grep($data[1] =~ /$_/, @sentUTRs)); 
        # file referenceno.txt will have previously sent $data[1] entries and that file will be read into array @sentUTRs
        # if $data[1] matches any entry in array @sentUTRs, then we will skip further processing
        # otherwise, consider this line to do further processing
        # update $data[1] in file referenceno.txt
        # referenceno.txt might have 5K or more entries, which are taken into @sentUTRs for comparison with $data[1] in every iteration.
}

print "Following are the fetched reference no's \n";
foreach (@refnos){
        print $_,"\n";
}       
        
__DATA__
0906928472847292INR~UTRIR8709990166~     700000~INR~20080623~RC425484~IFSCSEND001                       ~Remiter Details ~1000007   ~TEST RTGS TRF7                     ~                                   ~                                   ~ ~RTGS~REVOSN OIL CORPORATION   ~IOCL  ~09065010889~0906501088900122INR~         7~         1~ 1
0906472983472834HJR~UTRIN9080980866~    1222706~INR~20080623~NI209960~AMEX0888888                       ~FRAGNOS EXPRESS - TRS CARD S DIVISI~4578962   ~/BNF/9822644928                    ~                                   ~                                   ~ ~NEFT~REVOSN OIL CORPORATION   ~IO    CL  ~09065010889~0906501088900122INR~         7~         1~ 1
0906568946748922INR~ZP HLHLKJ87 ~ 1437865.95~INR~20080623~NI209969~HSBC0560002                       ~MOTOSPECT UNILEVER LIMITED ~1234567   ~/INFO/ATTN:                        ~//REF 1104210 PLEASE FIND THE DET  ~                                   ~ ~NEFT~REVOSN OIL CORPORATION   ~IOCL  ~09065010889~0906501088900122INR~         7~         1~ 1
0906506749056822INR~Q08709798905745~    5960.74~INR~20080623~NI209987~                                  ~SDV AIR LINK REVOS LIMITED ~458ss453  ~                                   ~                                   ~                                   ~ ~NEFT~REVOSN OIL CORPORATION   ~IOCL  ~09065010889~0906501088900122INR~         7~         1~ 1
0906503389054302INR~UTRI790898U0166~       2414~INR~20080623~NI209976~                                  ~FRAGNOS EXPRESS - TRS CARD S DIVISI~          ~/BNF/9826805798                    ~                                   ~                                   ~ ~NEFT~REVOSN OIL CORPORATION   ~IOCL  ~09065010889~0906501088900122INR~         7~         1~ 1

I have around 20K or more records, and reading each line, splitting it, and getting the second field is a real performance issue for me.
Since I only ever use the second field, what is the point of splitting the whole line and ignoring all the other fields? Is there a way to build a regular expression that extracts just the second field from each line?
Excuse me if I am wrong, it is just a thought :o-)
Please suggest.
Update: I have updated my question. The problem is caused by the grep, i.e. when I grep an array containing 5K to 10K words, the overall run takes a long time.
Is there another way to search the file efficiently instead of reading everything into an array (in memory)?
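
On the idea of not splitting the whole line: one option is to give split a limit so it stops after the second field, and another is to capture the field with an anchored regex. Below is a minimal sketch, assuming the value wanted is always the second ~-separated field. The helper names second_field and second_field_re are made up for illustration and are not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the second ~-separated field of a line.
sub second_field {
    my ($line) = @_;
    # A limit of 3 makes split stop once the first two fields are found;
    # the rest of the line is left unsplit in a third element we ignore.
    my (undef, $ref) = split /~/, $line, 3;
    return $ref;    # undef if the line has no ~ at all
}

# Hypothetical helper: the same thing with an anchored regex.
sub second_field_re {
    my ($line) = @_;
    # Skip everything up to the first ~, then capture up to the next ~.
    return $line =~ /^[^~]*~([^~]*)/ ? $1 : undef;
}

my $sample = '0906928472847292INR~UTRIR8709990166~     700000~INR~20080623';
print second_field($sample), "\n";       # prints UTRIR8709990166
print second_field_re($sample), "\n";    # prints UTRIR8709990166
```

Whether either variant is measurably faster than a plain split on 20K lines would need to be checked with the Benchmark module on the real data.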
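
For the grep bottleneck, a common approach is to load the sent reference numbers into a hash once and test membership with exists, which avoids scanning 5K to 10K entries for every input line. Below is a rough sketch, assuming an exact match on the reference number is what is wanted (the original grep does a regex match, which would also match substrings). The sample values in @sentUTRs and @lines are stand-ins for referenceno.txt and the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the entries read from referenceno.txt (sample values only).
my @sentUTRs = ('UTRIR8709990166', 'Q08709798905745');

# Build the lookup hash once, before the loop.
my %sent = map { $_ => 1 } @sentUTRs;

# Stand-in for the lines of the large input file (truncated samples).
my @lines = (
    '0906928472847292INR~UTRIR8709990166~     700000~INR',
    '0906472983472834HJR~UTRIN9080980866~    1222706~INR',
);

my @new_refnos;
for my $line (@lines) {
    my (undef, $ref) = split /~/, $line, 3;
    next unless defined $ref;
    next if exists $sent{$ref};    # hash lookup instead of scanning an array
    $sent{$ref} = 1;               # remember it so a repeat is skipped too
    push @new_refnos, $ref;        # further processing would go here
}

print "$_\n" for @new_refnos;      # prints UTRIN9080980866
```

A hash of 5K to 10K short strings is small, so keeping it in memory should not be a concern at this size.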

In reply to Needed Performance improvement in reading and fetching from a file by harishnuti
