in reply to Search Efficiency

If I'm reading right, you're doing a linear search through the text file each time looking for a specific $userid.

If the file is sorted, and you're looking for values in the same order, then you can use seek and tell to move around in the file. Something like this:

    my $last_pos = 0;               # start at the beginning
    # later....
    seek(FILE, $last_pos, 0);       # go to where I was (whence 0 = seek from start)
    while (<FILE>) {                # start reading line at a time
        if (/$userid/) {            # sure about the /o, BTW?
            $last_pos = tell(FILE); # remember this spot
            ....                    # other stuff
You can see why order is important: I'm assuming you can find the next record by advancing in the file.

If the requests are in random order, then it may be worthwhile to build your own index (we used to call these ISAM files back in the day ;-). In the beginning of your code, scan the file once, build a hash of positions. Then you can seek to any particular record. Something like this:

    my %index;
    my $pos = tell(FILE);           # position of the start of the next line
    while (<FILE>) {
        my ($userid) = split(...);  # it's in there somewhere, right?
        $index{$userid} = $pos;     # index the start of this record
        $pos = tell(FILE);
    }
    # later...
    seek(FILE, $index{$userid}, 0);
    $_ = <FILE>;
    my ($inv, $date, $amt) = split(...);
You'll have to fiddle with these to make sure you're seeking to the right spot in the file.

Now the big suggestion: forget everything I just wrote! Get yourself a relational database and get rid of all this seek/tell stuff. If you've got that much data and you're doing random reads, there's just no point in writing your own ISAM stuff.

HTH

Replies are listed 'Best First'.
Re: Re: Search Efficiency
by treebeard (Acolyte) on Jul 11, 2001 at 19:31 UTC

    First, I had read that the /o would help with respect to Perl evaluating the variable; I got the idea from the O'Reilly books. Was I correct in my assumption?

    Second, we are extracting a lot of data from an existing billing system, formatting it using Perl, and building a huge flat file that someone (not me :)) is going to load into SQL Server. (not my idea)

    We are using Perl DBI scripts to extract the data from Oracle, then we process the results using scripts based on the one above. It is not elegant, but we are short on time. Therefore our approach is:

    1. Build tables

    2. Extract tables to file system (solaris)

    3. Format files

    4. Send the buggers out.

    I know that this getting away from my original question, but should we have merged the dbi call scripts and the format scripts?

      Well, the /o says "the variable inside this regular expression will always have the same value". Believing that to be true, Perl will not recompile the RE. If you change the value of the variable, the RE won't change. This may be good or bad depending on what you're doing: good if you really don't change it, because you'll save time on RE recompiles; bad if you do change it, because your matches will appear to behave strangely. ("I changed the variable, why didn't it match?")
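      To see the pitfall concretely, here's a minimal sketch of what /o does inside a loop (on the Perls of this era, the pattern is compiled once, the first time through, and never recompiled):

      ```perl
      my $target = 'bar';
      for my $pat ('foo', 'bar') {
          if ($target =~ /$pat/o) {
              print "'$target' matched /$pat/\n";
          } else {
              print "'$target' did NOT match /$pat/\n";
          }
      }
      # Without /o you'd see one failure and then one match; with /o the
      # pattern is frozen at 'foo', so 'bar' never matches even when
      # $pat eq 'bar' on the second pass.
      ```

      If the pattern really does change, drop the /o; if you want the compile-once speedup with a changing pattern, precompile with qr// instead.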

      It's hard to answer your second question without a lot more knowledge about what you're doing. But my guess is that things would be a lot easier if you combine the query and reformatting into one program. The general rule: use the database for what it does best (data retrieval and manipulation) and Perl for what it does best (everything else ;-). There's no point in extracting data into a flat file and running back and forth over it: during the extract, get it in the form that makes it easiest to manipulate.
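      As a rough sketch of what "combine the query and the reformatting" might look like (table and column names here are made up; substitute your own DSN, credentials, and SELECT):

      ```perl
      #!/usr/bin/perl -w
      use strict;
      use DBI;

      # Hypothetical connection details -- adjust for your Oracle instance.
      my $dbh = DBI->connect('dbi:Oracle:billing', 'user', 'pass',
                             { RaiseError => 1 });

      my $sth = $dbh->prepare(q{
          SELECT userid, inv, inv_date, amt
          FROM   invoices
          ORDER  BY userid
      });
      $sth->execute;

      open OUT, "> extract.dat" or die "can't write extract.dat: $!";
      while (my ($userid, $inv, $date, $amt) = $sth->fetchrow_array) {
          # Format each row as it comes back from Oracle --
          # no intermediate flat file, no second pass.
          printf OUT "%-10s|%s|%s|%10.2f\n", $userid, $inv, $date, $amt;
      }
      close OUT;

      $sth->finish;
      $dbh->disconnect;
      ```

      The point is that the database does the retrieval and ordering, and Perl formats rows on the way out, in one program.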

      That said: if you're loading the final result into SQL Server, have you considered using DTS (built in to SS 7 and above)? There are many painful things about it (programming data transforms in VBScript -- barf), but it's hard to beat for sheer speed in server-to-server data transfers.

      HTH

        Actually, if you install ActivePerl, you can use PerlScript with DTS in any of the places where you normally would use VBScript. I've been using it quite successfully with DTS in SQL Server 2000, and it's an immense improvement over VBScript (obviously). It also appears to be available in SQL Server 7, although I haven't actually used it with that version.

        In fact, if you execute the packages from your client machine, then I believe that ActivePerl only needs to be installed on the client, not the server. However, if you want to schedule any jobs which run on the server, then it needs to be on the server, as well.

      what kind of database is this getting loaded into?
      Calm begets calm

        ms sqlserver