in reply to Reducing Memory Usage

I don't know what you need to do with your file exactly, apart from sorting, but you could give Tie::File a try. With it you can access every single line in the file without loading the whole file into memory.
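
A minimal sketch of the idea (untested; 'data.txt' stands in for your file name):

    use Tie::File;

    # Tie the file to an array; lines are fetched on demand, so only
    # the records you actually touch end up in memory.
    tie my @lines, 'Tie::File', 'data.txt' or die "Cannot tie file: $!";

    print scalar(@lines), " lines\n";   # line count without slurping
    print $lines[9], "\n";              # random access to the 10th line

    untie @lines;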

ciao knoebi

Re^2: Reducing Memory Usage
by PerlingTheUK (Hermit) on Jul 16, 2004 at 07:42 UTC
    I have looked at that option briefly, but as I have had problems running the substr function, I don't believe it is a practical alternative.
    I need to compare the 3rd to 11th characters (the location) of each line with every other line; if these characters are the same, I need to compare a time (the 12th to 15th characters) and sort all of the lines according to that time. I also need to convert the time, which is in a strange format, every time I read it, so this data preparation is quite time-consuming, and I do not want to run it every single time I need the value.
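    One common way to avoid repeating that conversion is to decode each time once and sort on the cached value (a minimal, untested sketch; decode_time() is a hypothetical stand-in for the strange-format conversion, and @lines holds one group of lines sharing a location):

        # Decode each line's time field (offsets 11..14) exactly once,
        # sort on the cached values, then throw the cached keys away.
        my @sorted = map  { $_->[1] }
                     sort { $a->[0] <=> $b->[0] }
                     map  { [ decode_time(substr($_, 11, 4)), $_ ] } @lines;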
    Ciao PerlingTheUK
      The facts I've found so far:
      • every line is 80 characters exactly
      • the lines need to be grouped by the field at offsets 2..10
      • the groups need to be sorted by the field at offsets 11..14
      • this second field is a coded time that needs to be decoded

      A possible strategy would be to first 'index' the file by reading it line by line:

      (untested code follows)
      my %index;
      my $line = 0;
      while (<FILE>) {
          my ($location, $time) = /^..(.{9})(.{4})/;
          push @{ $index{$location} }, [ $time, $line ];
          $line++;
      }

      This results in a hash keyed on the 'location', with each value being a reference to an array that contains the info you need to sort the lines. This seems to be the minimum amount of information needed to determine the sort order.

      The next step is to sort the arrays by the time values you've stored, and fetch the lines in order from the file:

      (untested code again)
      for my $location (keys %index) {
          my @sorted = sort { $a->[0] <=> $b->[0] } @{ $index{$location} };
          for my $entry (@sorted) {
              seek FILE, 81 * $entry->[1], 0;   # 80 chars + newline per record
              read FILE, my $line, 80;
              print $line, "\n";
          }
      }

      This method should be very memory-efficient, I think, and not too slow either; the biggest slowdown is probably the seeking around in the file.

      This method works because we know the length of the records. If we didn't, we could use the tell function before reading each line to store the exact start position of the line in the index instead...
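
      A sketch of that variant (untested again):

      my %index;
      while (1) {
          my $pos = tell FILE;            # byte offset where the next line starts
          my $record = <FILE>;
          last unless defined $record;
          my ($location, $time) = $record =~ /^..(.{9})(.{4})/;
          push @{ $index{$location} }, [ $time, $pos ];
      }
      # Later, instead of computing 81 * line number:
      #     seek FILE, $entry->[1], 0;
      #     my $record = <FILE>;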

        Here's a better strategy:
        1. You're grouping stuff. Ok - build a temporary file that has those fields grouped. DO THIS FIRST
        2. You're sorting stuff. Ok - take the file from 1. and sort it into another temporary file (see the sketch after this list). Preferably, you would use the Unix sort command, but slurp'n'sort'n'glop works just fine, too.
        3. Now, you should be able to work with the second temporary file because it's been massaged to what you want.
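
        A minimal sketch of steps 1 and 2 (untested; the file names and decode_time() are hypothetical placeholders for your data and your time-conversion routine):

        open my $in,  '<', 'data.txt' or die "open: $!";
        open my $tmp, '>', 'data.tmp' or die "open: $!";
        while (my $record = <$in>) {
            # Prefix each record with its group key and decoded time so a
            # plain lexical sort both groups and orders it in one pass.
            my ($location, $time) = $record =~ /^..(.{9})(.{4})/;
            printf $tmp "%s %04d %s", $location, decode_time($time), $record;
        }
        close $tmp;

        # Step 2: let the Unix sort command do the heavy lifting on disk.
        system('sort', '-o', 'data.sorted', 'data.tmp') == 0
            or die "sort failed: $?";

        Strip the 15-character key prefix when you read data.sorted back, and you have your massaged file for step 3.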

        There is never a rule that says you have to work with the crapola you were given! Instead of massaging in RAM, massage on the hard-drive!

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested