comment on

Good afternoon,

I have to 'standardize' 1 million-ish records with demographic data that sit (alas) in SQL server.

The 'old-way' processes 6000 records/per hour on a good day.

Because I know Perl is the right way to go, but think that SQL server is a cow, I'm using this baby-Perl code within an activeX script within a DTS package to work with the list of words to find and replace (250 lines. example: ADMINISTRATION=ADMIN) and text file of demographics (1000000 lines example: C00006 DR. TOM SMITH VETERANS ADMINISTRATION BUFFALO NEW YORK) outside of the db environment. It's still too slow, though.

Target speed: 1 million records in an hour!


#***********************
# Perl ActiveX Script
#***********************
sub Main()
{
use Win32; #allows for MsgBox functionality
use OLE;
$file1 = "c:/foo/sub_list.txt"; 
$file2 = "c:/foo/cust_names_city.txt";
$file3 = ">c:/foo/cust_names_city_new.txt";

open (SUBLIST, $file1) || die MsgBox("Can't open file"); 
open (CUSTLIST, $file2) || die MsgBox("Can't open file");
open (NEWCUSTLIST, $file3) || die MsgBox("Can't open file");
@pairs = <SUBLIST>;
@names = <CUSTLIST>;

foreach $pair (@pairs) 
{
($key,$value) = split (/=/, $pair);
  chomp ($value);
 
   foreach $name (@names) 
   {
    $name =~s /\s$key/$value/ig;
    @newrecords = (@newrecords,"$name");
   }
}

foreach $newrecord (@newrecords)
{
 print NEWCUSTLIST $newrecord;
}
close (SUBLIST);
close (CUSTLIST);
close (NEWCUSTLIST);
 return 0;     # DTSTaskExecResult_Success;
}

******************************************
[download]

This humble and frustrated initiate thanks you in advance for your help!

guopan@earthlink.net

In reply to Simply Too Slow Find and Replace by guopan

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.