deequeue has asked for the wisdom of the Perl Monks concerning the following question:

Howdy there,

I am having an issue dereferencing a series of arrays (which in this case are all different files read in line by line) inside a subroutine. I do this for several arrays sequentially by applying a subroutine several times. The third array I dereference is rather large (constructed from a 100 MB txt file), but only 25X as large as the next largest file. Unfortunately, this operation seems to take ~460 seconds, whereas the same operation for any other file takes at most 1-2 seconds, as can be seen in my results (posted below). I have to do this many times throughout my code and cannot really afford this huge lag. Is there any way I can speed this up? Is there a reason why this is hanging? It should be noted that while it is hanging, the memory usage for perl slowly crawls from ~500,000k to ~700,000k. Thanks for your help!

Relevant code:

    sub genLookupTable {    # begin subroutine genLookupTable
        # @ArrayRefs is a global variable wherein the references to the
        # arrays of interest are stored
        my $lengRefs = @ArrayRefs;
        for (my $i = 0; $i < $lengRefs; $i++) {
            print time() . " is the time before\n";
            my @lines = @{ $ArrayRefs[$i] };
            print time() . " is the time after\n";
            my $why = @lines;
            print " The length of lines for $i is $why\n";
            …

Relevant Results:


C:\Perl\bin>perl newIntermediate2.pl
The time is now 1244797376
files are opening...
files have opened and been put into arrays and those arrays have been made into references and put into an array...
The time is now 1244797384

Those arrays are being processed...

I am now in the subroutine.
1244797425 is the time before dereferencing array 1
1244797425 is the time after dereferencing array 1
The length of lines for 0 is 87544
1244797426 is the time before dereferencing array 2
1244797426 is the time after dereferencing array 2
The length of lines for 1 is 21573
1244797426 is the time before dereferencing array 3
1244797892 is the time after dereferencing array 3
The length of lines for 2 is 2250393
1244797893 is the time before dereferencing array 4
1244797893 is the time after dereferencing array 4
The length of lines for 3 is 12329
1244797893 is the time before dereferencing array 5
1244797893 is the time after dereferencing array 5
The length of lines for 4 is 83274
1244797893 is the time before dereferencing array 6
1244797893 is the time after dereferencing array 6
The length of lines for 5 is 66514
1244797893 is the time before dereferencing array 7
1244797893 is the time after dereferencing array 7
The length of lines for 6 is 7998
1244797893 is the time before dereferencing array 8
1244797893 is the time after dereferencing array 8
The length of lines for 7 is 2453

Replies are listed 'Best First'.
Re: Slow Dereferencing and Not Sure Why
by Corion (Patriarch) on Jun 12, 2009 at 07:40 UTC

    You didn't show us the relevant part of your code, as the output you show is not generated by your code.

    One thing though:

    my @lines = @{$ArrayRefs[$i]};

    This line will create a (shallow) copy of the whole array, which will consume memory and which will take time. If that's not what you want, because you're not actually modifying the stuff in @lines, then you can replace @lines by $lines (for example) and modify your remaining code to use @$lines appropriately, and you should gain at least a reduction in memory used.
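As a minimal sketch of that suggestion, the loop below works through the reference directly instead of copying each array. The contents of `@ArrayRefs` and the helper name `count_lines` are hypothetical stand-ins, since the original post does not show how the arrays are built or used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the global @ArrayRefs from the question:
# each element is a reference to an array of lines.
our @ArrayRefs = (
    [ "alpha\n", "beta\n" ],
    [ "gamma\n", "delta\n", "epsilon\n" ],
);

# Hypothetical helper: returns the number of lines in each array
# without copying any of them.
sub count_lines {
    my @counts;
    for my $lines (@ArrayRefs) {    # $lines is a reference; nothing is copied
        my $n = @$lines;            # array in scalar context gives its length
        push @counts, $n;
        # Single elements are reachable as $lines->[0]; to visit every
        # line, loop with: for my $line (@$lines) { ... }
    }
    return @counts;
}

my @counts = count_lines();
print "The length of lines for $_ is $counts[$_]\n" for 0 .. $#counts;
```

Because `$lines` only holds a reference, the cost of each iteration should not depend on how many elements the array has.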

    You might also want to take a look at the memory consumption of your code. I think every array element uses up at least about 32 bytes plus the length of the string it contains, so your machine might or might not simply run out of memory.
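A back-of-the-envelope version of that estimate, using the 2250393 elements reported in the output and the 100 MB file size from the question, might look like the sketch below. The 32-byte per-element figure is only the rough guess given above, the file size is assumed to be 100,000,000 bytes, and `estimate_copy_bytes` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: rough size of one copy of an array of strings,
# assuming a fixed per-element overhead plus the string data itself.
sub estimate_copy_bytes {
    my ($elements, $string_bytes, $overhead_per_element) = @_;
    $overhead_per_element = 32 unless defined $overhead_per_element;  # rough guess
    return $elements * $overhead_per_element + $string_bytes;
}

# 2250393 lines (from the posted output), ~100 MB of text (assumed 1e8 bytes).
my $bytes = estimate_copy_bytes(2_250_393, 100_000_000);
printf "Estimated size of the copy: %.0f MB\n", $bytes / 1_000_000;
```

Under these assumptions the copy alone comes to roughly 170 MB, which is the same order of magnitude as the ~200,000k growth in memory usage reported in the question.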

    Most of the efficiency gains will be made through the choice of an appropriate algorithm, so if you show us what your program is actually doing and how it processes the input, maybe we can find a way to make the program do less work overall instead of optimizing how quickly it does its work for a single line.

      Thanks very much for the tips. As both you and JavaFan suspected, I did not actually have to make a second copy. I was simply dereferencing because I had a block and thought I had to dereference to use it, which, if it actually were the case, would make referencing and dereferencing of minimal value.

      Thanks again for your help despite the incompleteness of the query.

Re: Slow Dereferencing and Not Sure Why
by JavaFan (Canon) on Jun 12, 2009 at 10:02 UTC
    You're copying over 2 million strings in the subroutine. Perl needs to allocate a lot of memory for that (in small chunks). Memory allocation doesn't scale very well (especially not when the process needs to swap). Do you need all the lines to be copied? Do you actually need to store each line of every file in memory first, before you do any processing?
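If the answer to the last question is no, one possible shape for line-at-a-time processing is sketched below. The helper name `process_file` is hypothetical, and counting lines merely stands in for whatever per-line work the real program does, which the original post does not show.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads a file one line at a time, so only the
# current line is held in memory rather than the whole file.
sub process_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $count++;    # placeholder for the real per-line processing
    }
    close $fh;
    return $count;
}

# Small demonstration on a temporary file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "one\ntwo\nthree\n";
close $tmp_fh;
print process_file($tmp_path), " lines processed\n";
```

Whether this fits depends on the algorithm: it only helps if each line can be handled without needing the rest of the file in memory at the same time.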