in reply to Re^2: HTML::Tidy - uses up RAM at crazy rate
in thread HTML::Tidy - uses up RAM at crazy rate

Thanks. This version runs OK. I tried it on a directory containing 77 html files, and it was pretty quick. What quantity of data are you handling? How big is the memory footprint?

Just a couple other suggestions:

Something like this:
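    ...
    # Require exactly two arguments, both existing directories
    unless ( @ARGV == 2 and -d $ARGV[0] and -d $ARGV[1] ) {
        die "Usage: $0 input/path output/path\n";
    }
    my ( $indir, $outdir ) = @ARGV;
    my @files = glob "$indir/*.html";
    die "No html files found in $indir\n" unless ( @files );
    ...
    for my $file ( @files ) {
        ...
        # Swap the leading input path for the output path; \Q...\E keeps
        # any regex metacharacters in $indir from being interpreted
        ( my $ofile = $file ) =~ s{^\Q$indir\E}{$outdir};
        open OUT, '>', $ofile or die "Cannot write $ofile: $!";
        ...
    }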

Re^4: Perl tidy - uses up RAM at crazy rate
by Anonymous Monk on Mar 14, 2016 at 04:30 UTC

    Thanks for your suggestions on format changes and code improvements. I'll work on putting them in. Regarding size, I'm running the script on about 3,000 files. It runs OK at first but gets much slower as it continues. Any ideas? Thanks!

      Iterate over the list of files, then process each one in a child process using system.
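To make the child-process idea a bit more concrete, here is a rough sketch. It assumes the per-file tidy logic has been moved into a separate worker script that takes an input and an output path; the name tidy_one.pl and the subs process_in_child and process_all are made up for illustration. The point is only that whatever memory the worker leaks goes back to the OS when it exits.

```perl
use strict;
use warnings;

# Run one file through a separate worker process.
# @worker is the command prefix, e.g. ( $^X, 'tidy_one.pl' );
# tidy_one.pl is a hypothetical script, not part of HTML::Tidy.
# Returns 1 on success, 0 on failure.
sub process_in_child {
    my ( $infile, $outfile, @worker ) = @_;

    # List form of system: no shell involved, so odd file names are safe.
    my $status = system( @worker, $infile, $outfile );

    if ( $status == -1 ) {
        warn "Could not start worker for $infile: $!\n";
        return 0;
    }
    if ( $status != 0 ) {
        warn "Worker failed on $infile (exit code " . ( $status >> 8 ) . ")\n";
        return 0;
    }
    return 1;
}

# Walk the input directory and hand each html file to its own child.
# Returns the number of files processed successfully.
sub process_all {
    my ( $indir, $outdir, @worker ) = @_;
    my $ok = 0;
    for my $file ( glob "$indir/*.html" ) {
        ( my $ofile = $file ) =~ s{^\Q$indir\E}{$outdir};
        $ok += process_in_child( $file, $ofile, @worker );
    }
    return $ok;
}

# Example call (commented out because tidy_one.pl is a placeholder):
# process_all( 'input/path', 'output/path', $^X, 'tidy_one.pl' );
```

Starting a new perl for every file costs some startup time, so this trades speed for a flat memory footprint. With 3,000 files that is probably acceptable, but I have not measured it.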
      So you've tried two versions - one with HTML::Tidy->new outside the file loop, and one with it inside the loop? Was there actually no difference in behavior?

      I notice that there's a "clear_messages" function. Have you tried calling that at the end of each iteration? (I assume you've read the man page for this module...)
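For what it's worth, this is roughly how I would try it: one tidy object created outside the loop, and clear_messages called at the end of every iteration. This is only a sketch. The sub name tidy_dir is invented, and whether the accumulated messages are really what eats your RAM is a guess on my part. The object is passed in as an argument, so the sub itself does not load HTML::Tidy.

```perl
use strict;
use warnings;

# $tidy is expected to be an HTML::Tidy object (HTML::Tidy->new),
# created once by the caller and reused for every file.
# Returns the number of files written.
sub tidy_dir {
    my ( $tidy, $indir, $outdir ) = @_;
    my $count = 0;

    for my $file ( glob "$indir/*.html" ) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        my $html = do { local $/; <$in> };    # slurp the whole file
        close $in;

        my $clean = $tidy->clean($html);

        ( my $ofile = $file ) =~ s{^\Q$indir\E}{$outdir};
        open my $out, '>', $ofile or die "Cannot write $ofile: $!";
        print {$out} $clean;
        close $out or die "Cannot close $ofile: $!";

        # Drop the messages collected for this file before moving on,
        # so the list does not keep growing across 3,000 files.
        $tidy->clear_messages;
        $count++;
    }
    return $count;
}

# Example call (needs HTML::Tidy installed):
# use HTML::Tidy;
# tidy_dir( HTML::Tidy->new, 'input/path', 'output/path' );
```

If memory still climbs with this in place, the messages are not the culprit and the leak is somewhere else.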

      I forgot to mention that I've got 8 GB of DDR3 RAM and a 1.5 GHz Intel Core i5 processor. Thanks.