bitvector has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a script to migrate data between two Postgres hosts, and in the interests of performance doing this with COPY seems the logical choice for moving the data. Lucky for me, DBD::Pg has several functions for doing just that.

Unfortunately, I'm having trouble getting pg_getcopydata() to function as advertised, and there doesn't seem to be a lot (ok, ANY) discussion about it here, or anywhere Google seems to think it should point me. The only information I seem to be able to get Google to show me beyond the DBD::Pg documentation are patches that mention pg_getcopydata(), a thread about a bugfix causing a segfault, and a thread about updating the documentation to say that pg_getcopydata() returns one row at a time. While I'll admit the syntax used in the example (1 while ... - pasted below) isn't something I'm the most familiar with, it doesn't seem to be something that should stop after only one iteration when there are many rows in the table. Please correct me if I am amiss in that assumption.

The code I'm using is essentially copied right from the documentation for DBD::Pg, since it should suffice for my needs with only minor modification:

$dbh->do(qq{COPY "$schema"."$table" TO STDOUT}); my @data; my $record_count = 0; 1 while $dbh->pg_getcopydata($data[$record_count++]) >= 0;

I'm sure the use of quoting in the query itself is going to offend someone's sensibilities - indeed it makes my spider sense tingle as well, but in our environment (many relations with camel case and spaces in the relation names), that seemed the best way to handle our tables. If there's a way to get DBI to quote things differently when binding parameters (single quotes don't work when referencing relation names in Postgres) I wouldn't mind hearing about that as well, but I'm much more interested in a solution to my main problem.

When I run my script, I see the COPY running in the logs on the source for the data, and it dutifully seems to serve up the data (tested by copy/paste of the query as run into psql with the intended results returned). The problem is, pg_getcopydata() doesn't seem to be doing...well...anything. In the postgres logs on the receiving end of the COPY, I see the COPY statement, but no rows of data passed along with it. When I print @data via Data::Dumper after the pg_getcopydata() line, there isn't any output at all (ie- there isn't anything in @data).

Oddly, if I pre-populate @data with a dummy record, the behavior changes. In that case, pg_getcopydata() gets precisely ONE row into @data, and my dummy record in there is clobbered. When I print Dumper @data both before and after the pg_getcopydata() line, I see my dummy record before, and a row of actual data (but not my dummy record) afterwards. I know there are more rows of data in the tables I'm testing this with, so if it IS functioning properly, there should be more than just the one row added to @data.

Can anyone suggest where I (or the documentation for pg_getcopydata()) has failed at this? If the syntax provided in the documentation is SUPPOSED to just return one record, why isn't it returning a record unless @data already has a record in it? If it should be looping through every record in the table and adding them all to @data, why might it not be doing that, or not even attempting to do so unless @data already has a record in it?

Replies are listed 'Best First'.
Re: pg_getcopydata failing to get copy data using documented syntax
by NetWallah (Canon) on Dec 28, 2011 at 18:22 UTC
    The documentation says (emphasis mine):
    The first argument to pg_getcopydata is the variable into which the data will be stored (this variable should not be undefined, or it may throw a warning, although it may be a reference)
    It works fine if you change the line to a REFERENCE to @data, like:
    1 while $dbh->pg_getcopydata(\$data[$record_count++]) >= 0;
    You get a $record_count one higher than necessary.

    FYI - you do not need to quote "$schema" or "$table", since you are in a qq().

                "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
            -Friedrich Nietzsche: A Dynamic Translation

      Ah ha! Using a reference got it going, thanks! Do you know offhand why it gets a higher record count than necessary? I know I can just pop off the blank element, but if there's a more correct way to do it, I'd rather do that.

      The quoting around $schema and $table is because our relations are named things like 'Sales Rep', which will break if not double quoted, because Postgres will convert the relation name to lower case and/or throw a syntax error on the space. They're (the double quotes) meant to be passed in that context as literal double quotes, rather than to cause interpolation of the values in $schema and $table.

        $dbh->pg_getcopydata(\$data[$record_count++])
        This is a post-increment.

        When the last data record is retrieved, pg_getcopydata returns a Non-zero value, so, it is called one more time, and this time it returns NO data, but the return code is negative.

        So, that last call with no data increments the $record_count unnecessarily.

        Any attempt to repair this in-line loses the simplicity of the current code, so the best bet is to add a line "$record_count --", and add a comment as to why you are doing that.

        Re- Quoting the $schema, $table: Ouch! spaces in these names is just poor DB design. I agree with you - adding the quotes is good practice, just for these cases (which I had not thought of).

                    "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
                -Friedrich Nietzsche: A Dynamic Translation