A200560 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm using Tie::File in order to manipulate big CSV files. In particular I need to insert some lines when some conditions are satisfied.

Here is my usage:
    ...
    tie @array, 'Tie::File', "$inputfile";
    ...
    for (@array){
        ...
        splice @array, $pos, 0, "$lottorec";
        ...


I have a serious performance problem with text files of about 5MB and larger. In particular, the system slows down when it performs "splice".

I don't know why I get this poor performance. I read in the documentation:

" C<push>, C<pop>, C<shift>, C<unshift>, and C<splice> cannot be deferred. When you perform one of these operations, any deferred data is written to the file and the operation is performed immediately. This may change in a future version. "

Please, can you tell me whether Tie::File is suitable for modifying huge files, and whether you know other ways to insert lines into CSV files in a smart (and simple) way?

Thanks.

Re: Tie::File performance issue
by ikegami (Patriarch) on Nov 07, 2007 at 18:34 UTC

    If I'm not mistaken, the line you quoted from the docs means Tie::File reads the rest of the file and writes it back out every time you do a splice.
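
    To get a feel for why that would hurt, here is a rough back-of-the-envelope sketch. It does not measure Tie::File itself; it only assumes, as guessed above, that every splice rewrites everything from the insertion point to the end of the file. The sub names and the numbers (50,000 lines of 100 bytes each, one insert after every 10th line) are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Assumed model (not a Tie::File measurement): inserting after line $i
    # writes the new line and rewrites every line that follows it.
    sub tail_rewrite_cost {
        my ($num_lines, $line_bytes, $every) = @_;
        my $total = 0;
        # Inserts happen front to back, after original lines $every, 2*$every, ...
        for (my $i = $every; $i <= $num_lines; $i += $every) {
            my $lines_after = $num_lines - $i;    # original lines still behind the insert point
            $total += $lines_after * $line_bytes  # tail that has to be moved
                    + $line_bytes;                # the inserted line itself
        }
        return $total;
    }

    # Single pass: every original line and every inserted line is written once.
    sub single_pass_cost {
        my ($num_lines, $line_bytes, $every) = @_;
        my $inserts = int($num_lines / $every);
        return ($num_lines + $inserts) * $line_bytes;
    }

    # Hypothetical 5MB file: 50,000 lines of 100 bytes, insert after every 10th line.
    printf "splice-style rewrites: %.0f bytes\n", tail_rewrite_cost(50_000, 100, 10);
    printf "single pass:           %.0f bytes\n", single_pass_cost(50_000, 100, 10);
    ```

    Under those assumptions the splice-style total grows roughly with the square of the file size (about 12.5GB written for this 5MB example, against about 5.5MB for one pass), which would explain the slowdown even on a file that is not particularly big.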

    5MB isn't particularly big. Since you end up reading the entire file anyway, you might be better off loading the entire file into an array, making changes to the array, then dumping the array back out if changes were made.

    Update: Here's something to get you started

    my @file;
    my $changed;
    {
        open(my $fh, '<', $fn)
            or die("Unable to open file \"$fn\": $!\n");
        while (<$fh>) {
            if (...[ need to delete this line ]...) {
                $changed = 1;
            } else {
                push @file, $_;
            }

            if (...[ need to insert a line after this line ]...) {
                push @file, ...;
                $changed = 1;
            }
        }
    }

    if ($changed) {
        open(my $fh, '>', $fn)
            or die("Unable to open file \"$fn\": $!\n");
        print $fh @file;
    }

      If you use two files, you could even avoid the memory requirement.

      my $changed;
      {
          open(my $fh_in,  '<', $fn_in ) or die;
          open(my $fh_out, '>', $fn_out) or die;
          while (<$fh_in>) {
              if (...[ need to delete this line ]...) {
                  $changed = 1;
              } else {
                  print $fh_out $_;
              }

              if (...[ need to insert a line after this line ]...) {
                  print $fh_out ...;
                  $changed = 1;
              }
          }
          close($fh_in);
          close($fh_out) or die;
      }

      if ($changed) {
          rename($fn_out, $fn_in) or die;
      } else {
          # Preserve file's mtime.
          unlink($fn_out) or warn;
      }
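
      For something you can run as-is, here is a minimal sketch of the same two-file idea wrapped in a sub. Everything named here is hypothetical: filter_file, the two code refs that stand in for the elided conditions, the ".tmp" suffix for the output file, and the three-line demo CSV.

      ```perl
      use strict;
      use warnings;
      use File::Temp qw(tempdir);

      # Hypothetical helper. $delete_if->($line) returns true to drop a line;
      # $insert_after->($line) returns a line to add after it, or undef for none.
      sub filter_file {
          my ($fn_in, $delete_if, $insert_after) = @_;
          my $fn_out  = "$fn_in.tmp";   # assumed name; same directory keeps rename on one filesystem
          my $changed = 0;

          open(my $fh_in,  '<', $fn_in)  or die "Unable to open \"$fn_in\": $!\n";
          open(my $fh_out, '>', $fn_out) or die "Unable to open \"$fn_out\": $!\n";
          while (my $line = <$fh_in>) {
              if ($delete_if->($line)) {
                  $changed = 1;
              } else {
                  print {$fh_out} $line;
              }

              my $extra = $insert_after->($line);
              if (defined $extra) {
                  print {$fh_out} $extra;
                  $changed = 1;
              }
          }
          close($fh_in);
          close($fh_out) or die "Unable to write \"$fn_out\": $!\n";

          if ($changed) {
              rename($fn_out, $fn_in) or die "Unable to rename \"$fn_out\": $!\n";
          } else {
              # Nothing changed: drop the copy so the original keeps its mtime.
              unlink($fn_out) or warn "Unable to remove \"$fn_out\": $!\n";
          }
          return $changed;
      }

      # Demo on a throwaway file: delete the "b" row, add a row after the "a" row.
      my $dir = tempdir(CLEANUP => 1);
      my $fn  = "$dir/demo.csv";
      open(my $fh, '>', $fn) or die "Unable to create \"$fn\": $!\n";
      print {$fh} "a,1\n", "b,2\n", "c,3\n";
      close($fh) or die "Unable to write \"$fn\": $!\n";

      filter_file($fn, sub { $_[0] =~ /^b,/ }, sub { $_[0] =~ /^a,/ ? "x,9\n" : undef });
      ```

      Only one line is held in memory at a time, so the file size doesn't matter much. The regexes in the demo are just placeholders; real CSV with quoted fields would want a proper parser for the conditions.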
        Thanks for your help, very useful. Sometimes it is better to implement your own methods instead of using already implemented modules (which don't scale...).

        Thanks