in reply to Loading file into memory

If you want to use arrays to save space, it can be helpful to set up constants for your indexes.

use constant ID         => 0;
use constant FIRST_NAME => 1;
use constant INITIAL    => 2;
use constant LAST_NAME  => 3;

# Processed data would look like this:
my @data = (
    #    ID       FNAME   MI LNAME
    [qw( dramaya  Daniel  R  Amaya )],
    [qw( bjsmith  Bob     J  Smith )],
    [qw( abjones  Angela  B  Jones )],
);

# Access your data:
foreach my $entry (@data) {
    my $id    = $entry->[ID];
    my $fname = $entry->[FIRST_NAME];

    # or grab several fields at once with an array slice
    # (fresh variable names here, to avoid masking the ones above):
    my ( $first, $mi, $last, $uid ) =
        @{$entry}[ FIRST_NAME, INITIAL, LAST_NAME, ID ];
}

Using constants for your array indexes is a nice aid to readability.


TGI says moo

Re^2: Loading file into memory
by walkingthecow (Friar) on Aug 05, 2008 at 20:56 UTC
    I appreciate all the help, guys! However, I think I should clarify a few things...

    First, I made a typo in my first post. Each record in the file actually looks like this:
    ID      NAME      PLK      NUM1      NUM2
    daamaya:Daniel R. Amaya,PLK,0000056789,ED97865:10:25:blah:blah

    Now, I need every field of that loaded into memory somehow. Then I want to use the $id and $name fields to search through my CSV file. If $name is found (e.g. Daniel Amaya), then I want to print the PLK, num1, and num2 fields from the CSV file to the screen (or a file). I just figure that's a faster way to use one file's fields to search another. I was trying to do it by reading the file line by line and then searching the CSV for each line, but it was super slow.

      You want to process the CSV file and put it into a database of some sort: either a DB file or SQLite. If you use SQLite, make sure you index the fields you'll be searching against. If you use a DB file (such as Berkeley DB), you'll want to think carefully about your data structure; MLDBM is also worth looking at. Either way, your query times will improve dramatically.
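
      As a rough sketch of the SQLite route, loading the big file into an indexed table with DBI might look something like this. This assumes DBD::SQLite is installed; the table name, column names, and split pattern are guesses based on your sample record, so adjust them to your real data:

      use DBI;

      # One on-disk database; AutoCommit off so the bulk load is one transaction.
      my $dbh = DBI->connect( 'dbi:SQLite:dbname=huge.db', '', '',
          { RaiseError => 1, AutoCommit => 0 } );

      $dbh->do(
          'CREATE TABLE IF NOT EXISTS customers
               (id TEXT, name TEXT, plk TEXT, num1 TEXT, num2 TEXT)'
      );

      # Index the field you'll be searching against.
      $dbh->do('CREATE INDEX IF NOT EXISTS idx_name ON customers (name)');

      my $ins = $dbh->prepare('INSERT INTO customers VALUES (?, ?, ?, ?, ?)');

      open my $huge, '<', 'huge_file.txt' or die "Can't open huge file: $!";
      while ( my $line = <$huge> ) {
          chomp $line;
          # Adjust the split for your real delimiters; your sample mixes ':' and ','.
          my @fields = split /[:,]/, $line;
          $ins->execute( @fields[ 0 .. 4 ] );
      }
      close $huge;

      $dbh->commit;    # committing once makes the bulk insert much faster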

      So your flow should look something like this:

      my $dbh       = Read_Huge_File_Into_DB($huge_file_path);
      my @customers = Process_Customer_Information_File( $dbh, $file_path );
      Print_Report( \@customers );

      sub Process_Customer_Information_File {
          my $dbh  = shift;
          my $file = shift;

          open( my $info, '<', $file ) or die "Uh oh $!";

          my @customers_found;
          while ( my $line = <$info> ) {
              my $customer_data = ParseCustomerData($line);
              my $name          = $customer_data->[NAME];    # NAME is a constant index, as above

              if ( Customer_Found( $dbh, $name ) ) {
                  push @customers_found, $customer_data;
              }
          }
          return @customers_found;
      }
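
      Read_Huge_File_Into_DB, ParseCustomerData, and Print_Report are yours to fill in. As a sketch, a Customer_Found that queries the indexed SQLite table from the sketch above might look like:

      sub Customer_Found {
          my ( $dbh, $name ) = @_;

          # Indexed lookup on name; true if any row matches.
          my ($count) = $dbh->selectrow_array(
              'SELECT COUNT(*) FROM customers WHERE name = ?',
              undef, $name,
          );
          return $count > 0;
      }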

      If it were me, I'd use SQLite.


      TGI says moo