Replacement for substr(Data::Dumper($x), 0, 4000)

mje has asked for the wisdom of the Perl Monks concerning the following question:

I've been debugging some code that uses a huge amount of memory on a machine where memory is short. The processes involved cause the system to swap, and performance degrades awfully. After much investigation, the problem code appears to be:

    eval {
        $db->execProc(
            $p_job_finished,
            {DieOnError => 1},
            $job->{jobid}, $job->{jobsts},
            (exists($job->{jobnative}) ? $job->{jobnative} : undef),
            (exists($job->{joberr}) ?
                 substr($job->{joberr}, 0, DB_MAX_CHRS_NOT_LONG) : undef),
            (defined($job->{jobinterr}) ?
                 substr($job->{jobinterr}, 0, DB_MAX_CHRS_NOT_LONG) : undef),
            $repost, $clientref,
            substr(Dumper($results), 0, DB_MAX_CHRS_NOT_LONG));
        1;
    };

Here $db is a wrapper around DBI that adds an execProc method, which calls a database procedure that inserts some data into a table, and DB_MAX_CHRS_NOT_LONG is 4000. When this code was first written, the $results array ref was never very large, a Dumper of it was always shorter than 4000 characters, and the substr was simply protection against trying to insert too much into the column. However, changes to the system mean the $results array ref can now be very large (sometimes as big as 16MB when dumped with Data::Dumper). When the Dumper above runs, the resident set size rises from around 100MB to 420MB, which is what causes the problem; worse, only the first 4K of the dump is used anyway.

Only a sample of the $results array ref is required (at most 4K worth), and the resulting dump is only for debugging/auditing; it never needs to be converted back into a Perl structure. The obvious fix is to stop building the dump string once it reaches 4K, but Data::Dumper does not seem to support that. I tried Data::Dump::Partial, but it is awfully slow (less than 1 dump per second, compared with Dumper at 60 per second, on the same smallish array reference). $results only contains arrays around 3 or 4 levels deep, plus simple string/number scalars. Other than rolling my own (which is a possibility), is there anything else that produces a readable string version of a partial Perl structure?
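If rolling your own is the route taken, a minimal sketch might look like the following. `limited_dump`, `_walk` and `_quote` are hypothetical names, not part of any CPAN module, and the output only loosely imitates Data::Dumper (no indentation, no `$VAR1 = ` prefix, no detection of circular references, although the character budget still stops the walk). The point is that the walk refuses to descend once `$max` characters have been written, so work and memory stay near `$max` instead of growing with the size of `$results`.

```perl
use strict;
use warnings;

# Hypothetical helper (not a CPAN module): renders nested array/hash refs
# and plain scalars as a Perl-like string, stopping once $max characters
# have been produced.
sub limited_dump {
    my ($data, $max) = @_;
    my $out = '';
    _walk($data, \$out, $max);
    return substr $out, 0, $max;
}

sub _walk {
    my ($v, $out, $max) = @_;
    return if length $$out >= $max;            # budget spent: stop descending
    if (ref $v eq 'ARRAY') {
        $$out .= '[';
        for my $i (0 .. $#$v) {
            last if length $$out >= $max;
            $$out .= ',' if $i;
            _walk($v->[$i], $out, $max);
        }
        $$out .= ']';
    }
    elsif (ref $v eq 'HASH') {
        $$out .= '{';
        my $first = 1;
        for my $k (sort keys %$v) {
            last if length $$out >= $max;
            $$out .= ',' unless $first;
            $first = 0;
            $$out .= _quote($k) . '=>';
            _walk($v->{$k}, $out, $max);
        }
        $$out .= '}';
    }
    elsif (!defined $v)                      { $$out .= 'undef' }
    elsif (ref $v)                           { $$out .= _quote("$v") }  # other refs: stringified
    elsif ($v =~ /^(?:0|-?[1-9]\d{0,8})\z/)  { $$out .= $v }            # plain integer
    else { $$out .= _quote(substr $v, 0, $max) }   # cap very long strings
}

sub _quote {
    my ($s) = @_;
    $s =~ s/([\\'])/\\$1/g;                    # escape backslashes and quotes
    return "'$s'";
}
```

In the original code this would replace the Dumper call with something like `limited_dump($results, DB_MAX_CHRS_NOT_LONG)`.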

Re: Replacement for substr(Data::Dumper($x), 0, 4000) by flexvault (Monsignor) on Jan 17, 2012 at 14:37 UTC
mje,

Maybe I'm missing something, but couldn't you just copy the beginning of the array to a new array, and then call Dumper with that array reference? Don't know what you would do if you have a hash, though.

    use strict;
    use warnings;
    use Data::Dumper;
    use Devel::Size qw(total_size);

    # Create a lot of data
    my @newdata = ();
    $newdata[10_000] = "";
    for my $i ( 0 .. 10_000 ) {
        $newdata[$i] = "New" x $i;
    }
    print "\n\t\@newdata Size: ", total_size(\@newdata), "\n\n";   ## 150_345_584 on my system

    my @shortarray = ();
    for my $k ( 0 .. $#newdata ) {
        $shortarray[$k] = $newdata[$k];
        if ( total_size(\@shortarray) > 400 ) { last; }   ## use Maxchars
    }
    print Dumper( \@shortarray );
    1;

That way you keep the essence of the script!

Good Luck!

"Well done is better than well said." - Benjamin Franklin
Re^2: Replacement for substr(Data::Dumper($x), 0, 4000) by mje (Curate) on Jan 17, 2012 at 16:03 UTC
The data is basically

    [[a,b],[c,d,e,...]]

where c, d, e etc. could contain further array references, usually no deeper than 3, although the spec for this data does not restrict it to 3. I can copy the first array ref and then take the first N of c, d, e etc., where hopefully the result when Dumped is not longer than 4K. However, it is unlikely I can find an N that gives just over 4K (which would minimise the work in Dumper), as c, d, e can contain other scalars of varying lengths. Ideally I would pick the N that gave just over 4K when Dumped. Even so, although this could leave the dumped string shorter than 4K when N was too small, it is still a viable solution: we only need the data to be accurate up to the truncation point, and the limit is only 4K because that is the column size; the more the better, though.
Re^3: Replacement for substr(Data::Dumper($x), 0, 4000) by flexvault (Monsignor) on Jan 17, 2012 at 16:16 UTC
mje,

"However, it is unlikely I can get an N which provides just over 4K..."

So pass Dumper 20KB or 100KB or whatever you think! It's still better than 16MB, and you control all the variables yourself!

Thank you
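One way to approach the "just over 4K" ideal without guessing N is to start with a small sample and double N until the dump is long enough. A sketch, assuming the `[[a,b],[c,d,e,...]]` shape described above; `dump_sample` and the starting size of 16 are illustrative choices, not anything from the thread. Because N doubles, the discarded attempts together cost roughly as much as the final one, and the final dump is at most about twice the size needed unless element sizes vary a lot. The last few characters may be the closing brackets of the sample rather than the start of the next element.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper. Assumes $results looks like [[a,b],[c,d,e,...]]:
# keeps $results->[0] as is and samples the first $n entries of
# $results->[1], doubling $n until the dump reaches $max characters or
# the whole list has been included.
sub dump_sample {
    my ($results, $max) = @_;
    my $list = $results->[1] || [];
    my $n    = 16;                         # arbitrary starting sample size
    while (1) {
        my $last = $n > @$list ? $#$list : $n - 1;
        # the slice copies only $last + 1 element references, not whole subtrees
        my $dump = Dumper([ $results->[0], [ @{$list}[ 0 .. $last ] ] ]);
        return substr($dump, 0, $max)
            if length($dump) >= $max || $last == $#$list;
        $n *= 2;
    }
}
```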
Re: Replacement for substr(Data::Dumper($x), 0, 4000) by educated_foo (Vicar) on Jan 17, 2012 at 15:13 UTC
Does `$Data::Dumper::Maxdepth` fix things for you? If not (i.e. if you want to truncate a depth-first printout), then you may have to roll your own. You also might be able to wrap `Data::Dumper::_dump()` to keep track of how much it has produced, then use `goto LABEL` or `die()` to jump out when you have enough output. But Data::Dumper is partially XS, so that could be dangerous.
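For reference, a minimal illustration of `$Data::Dumper::Maxdepth` (`shallow_dump` is a hypothetical wrapper). References nested deeper than the limit are printed as their stringified address, e.g. `'ARRAY(0x...)'`. The limit applies to depth only: every element at the permitted levels is still dumped, so on its own it does not bound the output size for a long list.

```perl
use strict;
use warnings;
use Data::Dumper;

# Dump $data, replacing any reference nested deeper than $depth levels
# with its stringified form such as 'ARRAY(0x...)'.
sub shallow_dump {
    my ($data, $depth) = @_;
    local $Data::Dumper::Maxdepth = $depth;
    return Dumper($data);
}

# With a depth of 2, [1, [2]] and [3] below are printed as 'ARRAY(0x...)'.
print shallow_dump([ [ 'a', 'b' ], [ [ 1, [2] ], [3] ] ], 2);
```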
Re^2: Replacement for substr(Data::Dumper($x), 0, 4000) by Tux (Canon) on Jan 17, 2012 at 15:34 UTC
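stealing `_dump` was my first hunch too, but as that is called on (sub)values, it will be very hard to catch the total size. And you will have to force using `Dumpperl` (as opposed to using the much faster `Dumpxs`).

    {
        my $size     = 0;
        my $org_dump = \&Data::Dumper::_dump;   # capture the original at run time
        no warnings 'redefine';
        # Install the wrapper at run time too: a named "sub Data::Dumper::_dump"
        # would be installed at compile time, before $org_dump is captured,
        # so the wrapper would end up calling itself.
        *Data::Dumper::_dump = sub {
            $size >= 4000 and return "";
            my $s = $org_dump->(@_);
            $size += length $s;
            return $s;
        };
    }
    $Data::Dumper::Useperl = 1;   # force Dumpperl; Dumpxs does not call _dump

could be a crude start ...

Enjoy, Have FUN! H.Merijn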
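A variation on that crude start, as a sketch only: it relies on `Data::Dumper::_dump`, an internal, undocumented method whose `($self, $value, $name)` calling convention is assumed here and could change between releases; `dump_prefix` is a hypothetical name. Compared with the snippet above, it installs the wrapper with `local`, so the counter starts from zero on each call and the original `_dump` is restored afterwards; it counts only scalar (leaf) output, so nested text is not counted once per level; and it forces the pure-Perl path with `$Data::Dumper::Useperl`.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sketch: wraps Data::Dumper's internal pure-Perl _dump() method, which is
# undocumented and assumed to be called as $self->_dump($value, $name).
sub dump_prefix {
    my ($data, $max) = @_;
    my $orig   = \&Data::Dumper::_dump;    # the real implementation
    my $budget = 0;                        # characters of scalar output so far

    no warnings 'redefine';
    local $Data::Dumper::Useperl = 1;      # the XS Dumpxs never calls _dump
    local *Data::Dumper::_dump = sub {
        my (undef, $val) = @_;
        return '' if $budget >= $max;      # enough output: skip this value
        my $out = $orig->(@_);
        $budget += length $out unless ref $val;   # count leaves only
        return $out;
    };
    return substr(Dumper($data), 0, $max);
}
```

Every element of an array that is already being dumped is still visited, but each one skipped after the budget is spent contributes only indentation and a comma rather than its full dump.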
Re^2: Replacement for substr(Data::Dumper($x), 0, 4000) by mje (Curate) on Jan 17, 2012 at 15:38 UTC
Maxdepth does not really help as I want the initial 4K to be correct (with no omissions) as far as it goes. _dump sounds interesting and I will look at it.
Re: Replacement for substr(Data::Dumper($x), 0, 4000) by tobyink (Canon) on Jan 17, 2012 at 13:50 UTC
I quite like Data::Printer, though I've not benchmarked it. By default it does a shallow dump, so doesn't tend to spew as much data as Data::Dumper does.
Re^2: Replacement for substr(Data::Dumper($x), 0, 4000) by mje (Curate) on Jan 17, 2012 at 13:55 UTC
http://deps.cpantesters.org/?module=Data%3A%3APrinter;perl=latest massively puts me off Data::Printer; far too many dependencies for me. We are trying to keep the process small.

