Hello everyone,

At the minute I'm trying to assemble a script that will take a DBF file and resort it according to the values in one column, then adding a counter to count those values (e.g. 1_3, 2_3, 3_3). This is my first "real" Perl programming experience so any help would be appreciated.

In this file there are a number of addresses, and these addresses are assigned a Mailsort code for posting. What I wish to do is to organise these codes into the least number first (i.e. where COUNT(CODE) is lowest). This is in a DBF file and there are about 25000 records in the file.

What I have so far is inputing the file, followed by extracting and counting the Mailsrot codes (sub extMSCode) and returning this in a hash. My problem now revolves around extracting the data out of the original file and inserting it into a new file in order. I am using XBase and DBD:Xbase (or maybe they're the same, I have no idea any more :( ) for this. The main problem is that it is INCREDIBLY slow (running for about 5mins yields about 60 - 70 addresses in the new file). I can dump the data out quicker using only the XBase module to write a count and then sorting in Excel.

Is this perhaps the best way to go or am I missing something. Is it slow mainly because its a large file and not a relational database? Any information would be appreciated

Thanks

Tyndyll

#!/usr/bin/perl -w use strict; use XBase; use DBI; #function prototypes sub extMSCode($); print "Opening Original file...\n"; my $dbf = new XBase "input.dbf" or die XBase->errstr; print "Creating New Table...\n"; my $new_dbf = XBase->create("name" => "output.dbf", "field_names" => [$dbf->field_names, "COUN +TED","SEQUENCE"], "field_types" => [$dbf->field_types, "C"," +N"], "field_lengths" => [$dbf->field_lengths,20 +,10], "field_decimals" => [$dbf->field_decimals, +undef,0] ) or die $dbf->errstr; my $dbh = DBI->connect("DBI:XBase:") or die $DBI::errstr; print "Extracting Mailsort Codes...\n"; my %ms_code = extMSCode($dbh); print "Writing data...\n\n"; my ($select_sql, $insert_sql); my $loop; my @fields = $dbf->field_names; foreach my $key (sort{$ms_code{$a}<=>$ms_code{$b}} keys(%ms_code)){ #Sort by Count $select_sql = "select * from input.dbf where MS_SELCODE = $key ORD +ER BY MS_SELCODE"; $insert_sql = qq{INSERT INTO output.dbf (".join(",",@fields).") VA +LUES ('?)}; my $sth_select = $dbh->prepare($select_sql); my $sth_insert = $dbh->prepare($insert_sql); $sth_select->execute(); while(my @data = $sth_select->fetchrow_array()){ my $temp = join("\',\'", @data); $sth_insert->bind_param(1,join("\',\'", @data)); $sth_insert->execute(); print $loop++." \n"; } } $dbh->disconnect; # Function Declarations and Code sub extMSCode($){ # Extracts Mailsort Codes and returns in a hash # with a count per code my $code_index = $dbh->prepare("select MS_SELCODE from input.dbf o +rder by MS_SELCODE") or die $dbh->errstr(); $code_index->execute() or die $code_index->errstr(); my %ms_code = (); while(my @data = $code_index->fetchrow_array()){ if(defined($ms_code{$data["MS_SELCODE"]})){ $ms_code{$data["MS_SELCODE"]}++; }else{ $ms_code{$data["MS_SELCODE"]} = 1; } } return %ms_code; }

In reply to Searching and sorting a large DBF file by tyndyll

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.