in reply to RegEx on 4MB file consumes of 2GB of ram before windows shuts it down (Memory Leak in 5.8.2)

Here is how you should probably do it: use a lookup hash for the substitutions, and set the input record separator so the file is read one record at a time.

#! perl

# generate a single RE/hash for replacement
open (PRIFILE, '<C:\temp.txt') or die;
my %lookup;
while (<PRIFILE>){
    chomp;
    ($Cue, $Sound, $Pri) = split ("\t");
    $lookup{$Sound} = $Pri;
}
close PRIFILE;
my $re = join '|', keys %lookup;
$re = qr/Name\s*=\s*($re)\s*;/;

open (XAPFILE, 'C:\Documents and Settings\Nick\Desktop\040408 Work Files\stranger.xap') or die;
open (OUTFILE, '>C:\stranger.xap') or die;

# set input record separator so we read a record at a time
local $/ = "Sound\n{";
while (my $record = <XAPFILE>){
    if ( $record =~ m/$re/ ) {
        my $sound = $1;
        my $delta_pri = $lookup{$sound};
        # change existing pri
        unless ( $record =~ s/Priority\s*=\s*\d+/Priority=$delta_pri/ ) {
            # could not change so need to add
            $record =~ s/$sound/$sound\nPriority=$delta_pri;\n/;
        }
    }
    print OUTFILE $record;
}
close XAPFILE;
close OUTFILE;
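
To make the record-separator trick concrete, here is a minimal sketch of what setting $/ to "Sound\n{" does to the read loop. The sample text and the helper name read_records are made up for illustration; the real .xap layout is only assumed to look roughly like this.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real .xap layout is an assumption.
my $data = "Header\nSound\n{\n  Name = a;\n}\nSound\n{\n  Name = b;\n}\n";

# Split a string into records, each ending at the separator "Sound\n{".
# read_records is an illustrative helper, not part of the code above.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "Sound\n{";    # records now end at this string, not at "\n"
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = read_records($data);
printf "%d records\n", scalar @records;
```

Each record ends with the separator itself, so the body of a Sound block arrives at the start of the following record, and joining the records back together reproduces the input unchanged. Only one record is held in memory at a time when reading from a real file in a while loop.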

cheers

tachyon

Re: Re: RegEx on 4MB file consumes of 2GB of ram before windows shuts it down (Memory Leak in 5.8.2)
by Ardemus (Beadle) on Apr 13, 2004 at 08:17 UTC
    I'm actually working on a rather complex module to handle this type of thing. I guess I gave the impression that a sound entry was the only type of data object in the file. In fact it is a self-referential nested tree with many types of objects...

    :)

    I could, however, read a line at a time until a line matches ^\s*Sound$, then make sure the next line is correct and check whether the name matches (and steal the white space). Finally I'd check the next line and either add a priority line or update it. I could write each line right back out to the output file as I go.
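
A rough sketch of that line-at-a-time idea, under several assumptions: a Sound block is laid out as "Sound", "{", "Name = x;" and an optional "Priority = n;" on consecutive lines, and the lookup hash maps sound names to new priorities. The function name update_priorities is hypothetical, and it works on an array of lines to keep the example self-contained; the same checks could be applied while reading from one filehandle and printing to another.

```perl
use strict;
use warnings;

# Rewrite lines one at a time. %$lookup maps sound name => new priority.
# Assumed layout (hypothetical): "Sound", "{", "Name = x;", optional "Priority = n;".
sub update_priorities {
    my ($lines, $lookup) = @_;
    my @out;
    my $i = 0;
    while ($i < @$lines) {
        my $line = $lines->[$i++];
        push @out, $line;
        next unless $line =~ /^\s*Sound\s*$/;

        # The next line must be the opening brace.
        next unless $i < @$lines && $lines->[$i] =~ /^\s*\{\s*$/;
        push @out, $lines->[$i++];

        # The line after that must name a sound we have a priority for.
        next unless $i < @$lines
            && $lines->[$i] =~ /^(\s*)Name\s*=\s*(.*?)\s*;\s*$/
            && exists $lookup->{$2};
        my ($indent, $pri) = ($1, $lookup->{$2});    # reuse the Name line's white space
        push @out, $lines->[$i++];

        # Update an existing Priority line, or add one with the same indent.
        if ($i < @$lines && $lines->[$i] =~ /^\s*Priority\s*=\s*\d+\s*;/) {
            $i++;    # drop the old line; the new one replaces it
        }
        push @out, "${indent}Priority = $pri;\n";
    }
    return \@out;
}
```

Lines that do not fit the expected pattern are passed through untouched, so other object types in the file are left alone.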

    That's a much better approach and much less prone to bugs (and it would work around the regex memory leak in 5.8.2).

    Thanks

      You don't parse self-referential nested trees with simple REs. You parse them with a parser (typically recursive descent) and then work over the nodes.
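
As an illustration only, a recursive descent parser for a brace-nested format might look something like the sketch below. The grammar is an assumption and not the real .xap format: a node is a word followed by a braced list of items, and an item is either a nested node or a "key = value;" pair with single-word values. The names tokenize, parse_node, expect and set_priority are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical grammar (an assumption, not the real .xap format):
#   node := WORD '{' item* '}'
#   item := node | WORD '=' WORD ';'
sub tokenize {
    my ($text) = @_;
    return [ $text =~ /([{}=;]|[^\s{}=;]+)/g ];
}

# Consume one token and die unless it is the one we wanted.
sub expect {
    my ($tokens, $want) = @_;
    my $got = shift @$tokens;
    defined $got && $got eq $want
        or die "Expected '$want', got '" . (defined $got ? $got : 'EOF') . "'\n";
}

# Parse one node, calling itself for every nested node.
sub parse_node {
    my ($tokens) = @_;
    my $type = shift @$tokens;
    defined $type or die "Unexpected end of input\n";
    expect($tokens, '{');
    my %node = (type => $type, attrs => {}, children => []);
    while (@$tokens && $tokens->[0] ne '}') {
        if (@$tokens > 1 && $tokens->[1] eq '{') {
            push @{ $node{children} }, parse_node($tokens);    # recurse
        }
        else {
            my $key = shift @$tokens;
            expect($tokens, '=');
            my $value = shift @$tokens;
            expect($tokens, ';');
            $node{attrs}{$key} = $value;
        }
    }
    expect($tokens, '}');
    return \%node;
}

# Work over the nodes: set Priority on every Sound node found in the lookup.
sub set_priority {
    my ($node, $lookup) = @_;
    my $name = $node->{attrs}{Name};
    if ($node->{type} eq 'Sound' && defined $name && exists $lookup->{$name}) {
        $node->{attrs}{Priority} = $lookup->{$name};
    }
    set_priority($_, $lookup) for @{ $node->{children} };
}
```

Once the tree is built, changing priorities is a plain walk over the nodes, and no regex ever has to match across a whole nested block.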

      Regardless of that, the code I supplied does the same as the regexes you were trying to use in your original post, except a lot more efficiently. You seem to have missed what it does. Also, you seem happy to blame a memory leak in the RE engine. Besides the fact that your REs are really rather badly written, I pointed out that you have an extra set of braces:

      while(<>) { { # surplus to requirements

      this may well be creating a closure.

      cheers

      tachyon