in reply to Re^2: Needed Performance improvement in reading and fetching from a file
in thread Needed Performance improvement in reading and fetching from a file
There are many options that may help solve your problem. For a start, if it is the same file that gets re-read every 15 minutes, you can record (possibly in a configuration file) how far you got last time and continue from that point on the next run: no searching required at all!
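A minimal sketch of that "remember where you got to" idea, assuming the data file only ever grows by appending and that the offset is kept in a small state file. The sub name `read_new_lines` and both file arguments are hypothetical, not anything from your code:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines added to $data_file since the last
# call, using a byte offset remembered in $state_file.
sub read_new_lines {
    my ($data_file, $state_file) = @_;

    # Recover the offset saved by the previous run (0 on the first run).
    my $offset = 0;
    if (open my $state, '<', $state_file) {
        my $saved = <$state>;
        close $state;
        $offset = $1 if defined $saved && $saved =~ /^(\d+)/;
    }

    open my $in, '<', $data_file or die "Can't open $data_file: $!";

    # If the file is now shorter than the saved offset it was probably
    # replaced, so start again from the top.
    my $size = (stat $in)[7];
    $offset = 0 if $offset > $size;

    seek $in, $offset, 0 or die "Can't seek in $data_file: $!";   # 0 = from start
    my @lines = <$in>;
    $offset = tell $in;
    close $in;

    # Remember where we stopped for the next run.
    open my $state, '>', $state_file or die "Can't write $state_file: $!";
    print {$state} "$offset\n";
    close $state or die "Can't close $state_file: $!";

    chomp @lines;
    return @lines;
}
```

You would call it as something like `my @new = read_new_lines('payments.dat', 'payments.pos');` on each run. Note that this sketch does not guard against the writer being part way through a line when you read.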
The standard fix for your immediate problem is to store your payment numbers as the keys of a hash, then use a hash lookup for your match check. Looking up a key in a hash takes roughly constant time no matter how many keys are stored, so it is very fast. Consider:
use strict;
use warnings;

# Previously seen payment numbers, one per line (stands in for a file).
my $old = <<OLD;
UTRIR8709990166
ZPHLHLKJ87
OLD

my %oldPayments;

# Load the known keys; old entries get an undef value.
open my $payments, '<', \$old or die "Can't read old payments: $!";
%oldPayments = map {$_ => undef} grep {chomp; length} <$payments>;
close $payments;

print "Reading UTR Payment numbers \n";

while (<DATA>) {
    chomp;
    my @data = split (/~/, $_);
    # Normalise the key: upper case with all white space removed.
    (my $utr = uc $data[1]) =~ s/\s+//g;
    next if exists $oldPayments{$utr};

    # New entries get a defined value so they can be picked out later.
    $oldPayments{$utr} = $data[1];
    print "Payment $utr received of $data[2]\n";
}

# Write the full key list back, one per line with a trailing newline.
open $payments, '>', \$old or die "Can't write old payments: $!";
print $payments join "\n", (sort keys %oldPayments), '';
close $payments;

print "New payments are:\n ";
print join "\n ", grep {defined $oldPayments{$_}} sort keys %oldPayments;

__DATA__
0906928472847292INR~UTRIR8709990166~ 700000~INR~20080623~RC425484~IFSCSEND001 ~Remiter Details ~1000007 ~TEST RTGS TRF7 ~ ~ ~ ~RTGS~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906472983472834HJR~UTRIN9080980866~ 1222706~INR~20080623~NI209960~AMEX0888888 ~FRAGNOS EXPRESS - TRS CARD S DIVISI~4578962 ~/BNF/9822644928 ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IO CL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906568946748922INR~ZP HLHLKJ87 ~ 1437865.95~INR~20080623~NI209969~HSBC0560002 ~MOTOSPECT UNILEVER LIMITED ~1234567 ~/INFO/ATTN: ~//REF 1104210 PLEASE FIND THE DET ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906506749056822INR~Q08709798905745~ 5960.74~INR~20080623~NI209987~ ~SDV AIR LINK REVOS LIMITED ~458ss453 ~ ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
0906503389054302INR~UTRI790898U0166~ 2414~INR~20080623~NI209976~ ~FRAGNOS EXPRESS - TRS CARD S DIVISI~ ~/BNF/9826805798 ~ ~ ~ ~NEFT~REVOSN OIL CORPORATION ~IOCL ~09065010889~0906501088900122INR~ 7~ 1~ 1
Prints:
Reading UTR Payment numbers
Payment UTRIN9080980866 received of 1222706
Payment Q08709798905745 received of 5960.74
Payment UTRI790898U0166 received of 2414
New payments are:
 Q08709798905745
 UTRI790898U0166
 UTRIN9080980866
I've used a variable as a file so the example doesn't need a disk based file, but in practice you would of course read and write a real file on disk.
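For the disk based version, a rough sketch of the load and save steps might look like this. The sub names `load_seen` and `save_seen` and the one-key-per-line file layout are my assumptions, chosen to match the in-memory example:

```perl
use strict;
use warnings;

# Hypothetical helper: load previously seen keys from $file into a hash.
# Returns a reference to an empty hash if the file does not exist yet.
sub load_seen {
    my ($file) = @_;
    my %seen;
    return \%seen unless -e $file;

    open my $in, '<', $file or die "Can't read $file: $!";
    while (my $key = <$in>) {
        chomp $key;
        $seen{$key} = undef if length $key;   # skip blank lines
    }
    close $in;
    return \%seen;
}

# Hypothetical helper: write every key back to $file, one per line.
sub save_seen {
    my ($file, $seen) = @_;
    open my $out, '>', $file or die "Can't write $file: $!";
    print {$out} "$_\n" for sort keys %$seen;
    close $out or die "Can't close $file: $!";
}
```

Call `load_seen` once before the read loop and `save_seen` once after it, so the file is touched only twice per run however many records you process.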
However, if your data set gets very large (millions of entries perhaps) you should seriously consider using a database instead of a flat file if at all possible.
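As a first step in that direction, here is a sketch that keeps the keys in a disk backed hash using the core SDBM_File module, so the whole key list never has to be loaded into memory. This is a stand-in I've picked because it needs nothing beyond a standard Perl install; a real database (via DBI, for example) would be the more robust choice. The sub names and the file path are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;

# Hypothetical helper: tie a hash to an SDBM file at $path (created if
# missing) and return a reference to it.
sub open_seen_db {
    my ($path) = @_;
    tie(my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Can't open SDBM file $path: $!";
    return \%db;
}

# Hypothetical helper: true the first time a key is seen, false afterwards.
# The key is stored on disk as a side effect.
sub is_new_payment {
    my ($db, $key) = @_;
    return 0 if exists $db->{$key};
    $db->{$key} = 1;
    return 1;
}
```

In the read loop you would replace the `exists` check and the assignment with `next unless is_new_payment($db, $utr);`, and call `untie %$db` when done. Be aware that SDBM limits the combined size of each key and value to roughly 1 KB, which is fine for short payment numbers.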