JPaul has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all,

I am currently doing some contract programming for a printing company which receives large (anywhere from 15 Meg to 150 Meg) text files containing thousands of text statements. My scripts perform various operations on these statement files, the simplest pulling names/addresses (at specific line/column positions per page) and outputting them to a different file.
The scripts are very straightforward (read until a page marker is found, take the name/addr from line #X, dump to the other file, carry on)... I installed a simple percentage counter to show the current progress (bytes read / total bytes in file).

My boggle is this:
As the programme progresses through the file it slows down, until by the time it reaches the very end it appears, compared with when it started the file, to really be crawling. The script does complete its task correctly, nothing is freezing, but I'm still interested in why it slows down.
I use read() to pull the file byte by byte (looking for the top-of-page hex marker, since pages are all different lengths).
Nothing is stored in memory but the temporary name/addr that it pulls from the master file, and then outputs.

Any ideas?

-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --


Re: Perl scripts slowing down towards end of file processing
by tachyon (Chancellor) on Jul 14, 2001 at 02:03 UTC

    Maverick's suggestion to set the input record separator to 0x0c will allow you to read in one record at a time and avoid all that mucking around with single characters. It will no doubt speed your code up considerably as well as simplifying it quite a lot. Let Perl work for you.

    Here is a small optimisation of your last two subs that will save a bit of time too. By using the 'x' operator rather than a while loop we add our whitespace pad in one operation. By using a regex to strip leading and trailing whitespace we save a lot of mucking about and let the regex engine do the work. This too will be much faster.

    # Pad out with trailing whitespace as necessary
    sub pad {
        my ($data, $len) = @_;
        my $pad = $len - (length $data);
        $data .= " " x $pad unless $pad < 0;
        return $data;
    }

    # Get rid of leading/trailing whitespace
    sub normalise {
        my $value = shift;
        $value =~ s/^\s+|\s+$//gm;
        return $value;
    }
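
    If you want to measure the difference yourself, here is a minimal sketch using the core Benchmark module (the subroutine names and the 342-column width are just for illustration, not from the original code):

        use strict;
        use Benchmark qw(cmpthese);

        # Loop-based padding, one space per iteration (as in the original script)
        sub pad_loop {
            my ($data, $len) = @_;
            $data .= " " while length($data) < $len;
            return $data;
        }

        # Padding with a single 'x' repetition
        sub pad_x {
            my ($data, $len) = @_;
            my $pad = $len - length $data;
            $data .= " " x $pad unless $pad < 0;
            return $data;
        }

        # Run each version for about two CPU seconds and compare the rates
        cmpthese(-2, {
            loop => sub { pad_loop("J. Random Hacker", 342) },
            x_op => sub { pad_x("J. Random Hacker", 342) },
        });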

    Hope this helps

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Perl scripts slowing down towards end of file processing
by HyperZonk (Friar) on Jul 13, 2001 at 23:58 UTC
    Well, if you are running on a Windoze box, you could be slamming the memory ... we'd have to see the code to say for sure, though. If that is the case, you may be forcing 'doze to use virtual memory (you know, hard disk) for data structure storage. On a fragmented drive, that incurs an even greater performance penalty of course.

    Can you update us with details about the platform, data structures, etc.?

    Update from HZ: It has been noted to me (thank you tye) that other systems can also penalize you for slamming memory. In any case, we need more details to help you. Please, sir, can we see your code?
      The script is running on a Linux machine with Perl 5.6.0.
      There are no complex 'data structures' to speak of, except a bunch of vars containing integers for counting, and a few holding text pulled from the statement file using substr().
      None of the text removed is over 30 bytes in length.
      Each page is separated into an array (@page = split(/\n/, $page_contents)), the @page array contains 65 elements of less than 85 bytes (changes per page), and is overwritten each time we split() a new page.

      Did I miss anything?
      JP
      -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

        That behavior does sound like what happens when you start using too much memory (a non-linear reduction in performance). So you should check the memory consumption of the process at different points in the run (for example, via the "ps" command).
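
        If it is easier than watching ps from a second terminal, here is a minimal sketch of an in-script check (Linux-specific, assuming /proc is mounted; report_mem is a hypothetical helper, not part of the original script) that could be called every few hundred pages:

            # Print this process's VmSize/VmRSS lines from /proc/$$/status
            sub report_mem {
                open(my $fh, '<', "/proc/$$/status") or return;
                while (my $line = <$fh>) {
                    print STDERR $line if $line =~ /^Vm(Size|RSS):/;
                }
                close($fh);
            }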

        Some file systems can also get much slower when they get nearly full so that is another possibility to check.

        Another possibility is that your total (virtual) memory consumption hasn't grown too high, but that you are jumping around too much in memory, such that your "working set" is too large to fit in the available RAM.

        But, yes, we are all just guessing.

        Nothing in your description rings alarm bells for me about where the memory usage would be growing.

                - tye (but my friends call me "Tye")
Re: Perl scripts slowing down towards end of file processing
by JPaul (Hermit) on Jul 14, 2001 at 00:31 UTC
    #!/usr/bin/perl -w
    use strict;

    my $records_out = 0;

    # Did they not tell us what to use?
    if (@ARGV != 2) {
        die("$0 - No file to process or output.\n");
    }

    # Get total size, in bytes, of file for percentage counter
    my @stat = stat($ARGV[0]);
    my $total_bytes = $stat[7];

    $| = 1;    # Autoflush output

    open(IN, $ARGV[0]) || die("$0 - Cannot open $ARGV[0].\n");
    open(OUT, ">$ARGV[1]");

    print "Dumping Account holder name and Account number from $ARGV[0]...\n";
    print "Outputting to $ARGV[1]\n";

    my $read_bytes = 0;
    my $page_contents = '';
    my $read_pages = 1;

    for (;;) {
        my $EOF = read(IN, my $char, 1);
        last if ($EOF == 0);
        if (substr($char, 0, 1) =~ /\x0c/) {
            my @temp = split(/\n/, $page_contents);    # Split lines up
            my $line_count = @temp;
            next if $line_count < 2;

            # Get name, address, city, state & zipcode from page
            my $name  = substr($temp[58], 5, 30);
            my $addr  = substr($temp[59], 5, 30);
            my $addr2 = substr($temp[60], 5, 30);
            my ($city, $state) = split(/\,/, substr($temp[61], 5, 20));
            my $zip   = substr($temp[61], 26, 5);

            $addr  = &normalise($addr);
            $addr2 = &normalise($addr2);
            $city  = &normalise($city);
            $state = &normalise($state);
            $zip   = &normalise($zip);

            &format($name, $addr, $addr2, $city, $state, $zip);

            $read_pages++;
            $page_contents = '';
        }
        $page_contents .= $char;    # Push char onto page

        # Do percentage display
        $read_bytes++;
        my $percent = $read_bytes / $total_bytes * 100;    # Work out percentage
        $percent = sprintf("%.2f", $percent);              # Set to 2 dp
        print "$percent%\r";
    }
    $read_pages++;

    close(IN);
    close(OUT);

    print "\n-> Read $read_pages pages, and dumped $records_out records out.\n";
    exit(0);

    # Format data into ANCOA import standard
    sub format {
        my ($name, $addr, $addr2, $city, $state, $zip) = @_;
        my $record = "1";    # Record Type
        $record .= &pad($name,  35);
        $record .= &pad($addr,  30);
        $record .= &pad($addr2, 30);
        $record .= &pad($city,  30);
        $record .= &pad($state, 2);
        $record .= &pad($zip,   5);
        $record .= &pad(" ", 342);    # Empty Optional Customer Info
        my $len = length($record);
        print OUT "$record";
        $records_out++;
    }

    # Pad out with trailing whitespace as necessary
    sub pad {
        my ($data, $len) = @_;
        while (length($data) < $len) {
            $data .= " ";
        }
        return $data;
    }

    # Get rid of leading/trailing whitespace
    sub normalise {
        my ($value) = @_;
        if (defined $value) {
            for (my $i = 0; $i <= length($value); $i++) {
                if (substr($value, $i, 1) ne " ") {
                    $value = substr($value, $i, length($value));
                    last;
                }
            }
            for (my $i = length($value); $i >= 0; $i--) {
                next if substr($value, $i, 1) eq "";
                if (substr($value, $i, 1) ne " ") {
                    $value = substr($value, 0, $i + 1);
                    last;
                }
            }
        }
        return $value;
    }

    JP

      You might want to consider changing your input loop to look something like this.
      # tell perl to make the input record separator the character with hex value '0c'
      $/ = pack('c', 0x0c);
      while (<IN>) {
          my @temp = split(/\n/, $_);
          # rest of processing
      }
      This would save you the byte-by-byte read and the string concatenation for every record. It should be quite a bit faster.
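
      For instance, the byte-by-byte loop in the script above might collapse to something like this (a sketch only, reusing the original variable names; each chunk read this way carries the 0x0c marker at its end, which chomp removes):

          $/ = "\x0c";    # each <IN> read now returns one whole page
          while (my $page = <IN>) {
              chomp $page;                     # strip the trailing 0x0c
              my @temp = split(/\n/, $page);
              next if @temp < 2;
              # ... pull the name/address fields out of @temp as before ...
              $read_pages++;
          }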

      /\/\averick

        Well hell,
        What took 11 minutes to process (byte-by-byte) now takes 16 seconds. I crap you not.

        I'd better remember that, thanks Mav.

        JP
        -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

Re: Perl scripts slowing down towards end of file processing
by cyocum (Curate) on Jul 14, 2001 at 00:06 UTC
    Are you using my variables that get garbage collected regularly? If you are storing data in something like a local variable, it stays alive while you are in a different scope and comes back into play when you return to that scope. The upshot is that you could be keeping things around that you don't mean to keep. Without seeing the code it is all idle speculation.
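
    To illustrate the scoping point, a minimal sketch (nothing here is from the original script; the sizes just mimic one page of data):

        use strict;

        sub process_page {
            my @page = ('x' x 84) x 65;    # a lexical array, about one page's worth
            # ... work with @page ...
        }    # @page's reference count drops to zero here; perl reclaims the memory

        # Calling this repeatedly should not grow the process: each call
        # reuses the space freed by the previous one.
        process_page() for 1 .. 1000;
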
      Would it matter?
      I'm using strict; there are a few small routines that would have the current vars stored, but anything stored is small, and the subs are very small and pointless too (padding fields out with spaces, removing leading/trailing spaces).

      Even so, why would having things stored out of scope make much of a difference?

      JP
      -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --