mocnii has asked for the wisdom of the Perl Monks concerning the following question:

I found a bug in the following code from the ClickHouse driver DBD-ClickHouse (it breaks on large SELECTs). It reads the response body with Net::HTTP's read_entity_body():
    sub _read_body {
        my ($self) = @_;
        my @response;
        while (1) {
            my $buf;
            my $n = $self->_get_socket()->read_entity_body($buf, 1024);
            die "can't read response: $!" unless defined $n;
            last unless $n;
            push @response, split (/\n/, $buf);
        }
        return \@response;
    }
and replaced it with
     sub _read_body {
         my ($self) = @_;
         my @response;
    +    my $content = '';
         while (1) {
             my $buf;
             my $n = $self->_get_socket()->read_entity_body($buf, 1024);
             die "can't read response: $!" unless defined $n;
             last unless $n;
    -        push @response, split (/\n/, $buf);
    +        $content .= $buf;
         }
    +    push @response, split (/\n/, $content);
    +
         return \@response;
     }
My pull request is here: https://github.com/elcamlost/perl-DBD-ClickHouse/pull/4, but it has not been accepted (nor has the bug been fixed some other way). The fix works, but now the whole response is held in memory twice (once in $content and once in @response), and since this is used to select from a database, it can be huge. I'm looking for suggestions on how to make it less memory hungry, or simply cleaner. I tried another module (HTTP-ClickHouse), but it has the same bug, so this is obviously easy to get wrong.
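
To make the failure concrete, here is a minimal sketch that needs no socket. It assumes the problem is a row that straddles two reads and therefore gets split in two. The names split_per_chunk and split_after_join are hypothetical helpers for illustration only; they are not part of DBD-ClickHouse.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the original loop by splitting every chunk on its own.
sub split_per_chunk {
    my @chunks = @_;
    my @rows;
    push @rows, split(/\n/, $_) for @chunks;
    return \@rows;
}

# Hypothetical helper: mimics the patched version (concatenate first, split once).
sub split_after_join {
    my @chunks = @_;
    return [ split /\n/, join('', @chunks) ];
}

# Two rows; the read boundary falls in the middle of the second row.
my @chunks = ("1\tfoo\n2\tb", "ar\n");

my $broken = split_per_chunk(@chunks);
my $fixed  = split_after_join(@chunks);

print scalar(@$broken), " rows when splitting per chunk\n";
print scalar(@$fixed),  " rows when splitting after joining\n";
```

Splitting per chunk reports one row too many, because the second row is cut at the chunk boundary; small results that fit into a single read never hit this.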

Re: How to split rows coming from the database
by Corion (Patriarch) on Nov 08, 2016 at 13:55 UTC

    If you want to keep $content as small as possible, you need to do both: split each buffer as it arrives, and keep the incomplete remainder for the next read:

        ...
        my $remainder = '';
        while (1) {
            my $buf;
            my $n = $self->_get_socket()->read_entity_body($buf, 1024);
            $buf = $remainder . $buf;
            $remainder = '';
            die "can't read response: $!" unless defined $n;
            last unless $n;
            if( $buf =~ s!([^\n]+\z)!! ) {
                $remainder = $1;
            };
            push @response, split (/\n/, $buf);
        }

    You need to keep the incomplete line around until the next newline is read.

        First of all, thanks for correlating this bug and for reporting a fix!

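        I'm not really sure your fix is complete, because I think it will lose the contents of $remainder if read_entity_body signals the end of the body (it returns 0 and the loop exits) while $remainder still has content. That happens whenever the response does not end with a newline. The code should at least warn or signal an error in that case.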
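
        One way to address this is to flush whatever is left in the remainder once the loop ends. The following is only a sketch under assumptions: read_rows is a hypothetical function, and $read_chunk is a hypothetical stand-in for the read_entity_body call (any code ref that returns the next chunk, or an empty string at the end of the body), so that it runs without a socket.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect complete rows as chunks arrive and keep only
# the unfinished tail of the last chunk in memory.
sub read_rows {
    my ($read_chunk) = @_;   # assumption: code ref standing in for the socket read
    my @rows;
    my $remainder = '';
    while (1) {
        my $chunk = $read_chunk->();
        die "can't read response" unless defined $chunk;
        last unless length $chunk;
        $remainder .= $chunk;
        # Peel off every complete line; what stays behind is an incomplete row.
        while ($remainder =~ s/\A([^\n]*)\n//) {
            push @rows, $1;
        }
    }
    # Do not drop a final row that has no trailing newline.
    push @rows, $remainder if length $remainder;
    return \@rows;
}

# Usage: three rows, a boundary inside row 2, and no newline after row 3.
my @chunks = ("1\tfoo\n2\tb", "ar\n3\tbaz");
my $rows = read_rows(sub { @chunks ? shift @chunks : '' });
print join('|', @$rows), "\n";
```

        Unlike a plain split on the whole body, this variant keeps empty lines, including trailing ones. Whether that matters depends on the output format requested from the server.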