tyndyll has asked for the wisdom of the Perl Monks concerning the following question:
Hello everyone,
At the minute I'm trying to assemble a script that will take a DBF file and resort it according to the values in one column, then adding a counter to count those values (e.g. 1_3, 2_3, 3_3). This is my first "real" Perl programming experience so any help would be appreciated.
In this file there are a number of addresses, and these addresses are assigned a Mailsort code for posting. What I wish to do is to organise these codes into the least number first (i.e. where COUNT(CODE) is lowest). This is in a DBF file and there are about 25000 records in the file.
What I have so far is inputing the file, followed by extracting and counting the Mailsrot codes (sub extMSCode) and returning this in a hash. My problem now revolves around extracting the data out of the original file and inserting it into a new file in order. I am using XBase and DBD:Xbase (or maybe they're the same, I have no idea any more :( ) for this. The main problem is that it is INCREDIBLY slow (running for about 5mins yields about 60 - 70 addresses in the new file). I can dump the data out quicker using only the XBase module to write a count and then sorting in Excel.
Is this perhaps the best way to go or am I missing something. Is it slow mainly because its a large file and not a relational database? Any information would be appreciated
Thanks
Tyndyll
#!/usr/bin/perl -w use strict; use XBase; use DBI; #function prototypes sub extMSCode($); print "Opening Original file...\n"; my $dbf = new XBase "input.dbf" or die XBase->errstr; print "Creating New Table...\n"; my $new_dbf = XBase->create("name" => "output.dbf", "field_names" => [$dbf->field_names, "COUN +TED","SEQUENCE"], "field_types" => [$dbf->field_types, "C"," +N"], "field_lengths" => [$dbf->field_lengths,20 +,10], "field_decimals" => [$dbf->field_decimals, +undef,0] ) or die $dbf->errstr; my $dbh = DBI->connect("DBI:XBase:") or die $DBI::errstr; print "Extracting Mailsort Codes...\n"; my %ms_code = extMSCode($dbh); print "Writing data...\n\n"; my ($select_sql, $insert_sql); my $loop; my @fields = $dbf->field_names; foreach my $key (sort{$ms_code{$a}<=>$ms_code{$b}} keys(%ms_code)){ #Sort by Count $select_sql = "select * from input.dbf where MS_SELCODE = $key ORD +ER BY MS_SELCODE"; $insert_sql = qq{INSERT INTO output.dbf (".join(",",@fields).") VA +LUES ('?)}; my $sth_select = $dbh->prepare($select_sql); my $sth_insert = $dbh->prepare($insert_sql); $sth_select->execute(); while(my @data = $sth_select->fetchrow_array()){ my $temp = join("\',\'", @data); $sth_insert->bind_param(1,join("\',\'", @data)); $sth_insert->execute(); print $loop++." \n"; } } $dbh->disconnect; # Function Declarations and Code sub extMSCode($){ # Extracts Mailsort Codes and returns in a hash # with a count per code my $code_index = $dbh->prepare("select MS_SELCODE from input.dbf o +rder by MS_SELCODE") or die $dbh->errstr(); $code_index->execute() or die $code_index->errstr(); my %ms_code = (); while(my @data = $code_index->fetchrow_array()){ if(defined($ms_code{$data["MS_SELCODE"]})){ $ms_code{$data["MS_SELCODE"]}++; }else{ $ms_code{$data["MS_SELCODE"]} = 1; } } return %ms_code; }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Searching and sorting a large DBF file
by bart (Canon) on May 13, 2004 at 18:26 UTC | |
by tyndyll (Novice) on May 13, 2004 at 18:55 UTC | |
by exussum0 (Vicar) on May 13, 2004 at 20:03 UTC |