perl and psql

gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Hallo, I have the following code that actually does as search in my psql database and clusters some results. Is there any possible way, instead of getting these results printed out, to make my script create a table in the database and store them there ? thank you very much for any ideas


use strict;
use Data::Dumper;
use DBI;

# Connect to the database
# See footnote 1
my $dbh = DBI->connect('DBI:Pg:dbname=smth', 'user', 'pass')
    or die "Couldn't open database: $DBI::errstr; stopped";


my $sql = " SELECT sequence_start,sequence_stop from intergenic_and_ex
+perimental_results WHERE reading_direction=? order by sequence_start"
+;
my $sth = $dbh->prepare($sql);

my @clusters = ();

$sth->execute("f");

my $regionStart = -1;
my $regionStop = -1;

while (my ($dnaStart, $dnaStop) = $sth->fetchrow_array() ) {

    if (($dnaStart >= $regionStart && $dnaStart <= $regionStop) ||
        ($dnaStop >= $regionStart && $dnaStop <= $regionStop)) {

        if ($dnaStart < $regionStart) {    
            $regionStart = $dnaStart;
        }
        if ($dnaStop > $regionStop) {
            $regionStop = $dnaStop;
        }

    } else {
        if ($regionStart != -1) {
            push(@clusters,[$regionStart,$regionStop,"f"]);
        }
        $regionStart = $dnaStart;
        $regionStop = $dnaStop;
                
    }

}

push(@clusters,[$regionStart,$regionStop,"f"]);




$sth->execute("r");

$regionStart = -1;
$regionStop = -1;

while (my ($dnaStop, $dnaStart) = $sth->fetchrow_array() ) {
    if (($dnaStart >= $regionStart && $dnaStart <= $regionStop) ||
        ($dnaStop >= $regionStart && $dnaStop <= $regionStop)) {
    
        if ($dnaStart < $regionStart) {    
            $regionStart = $dnaStart;
        }
        if ($dnaStop < $regionStop) {
            $regionStop = $dnaStop;
        }


    } else {
        if ($regionStart != -1) {
            push(@clusters,[$regionStart,$regionStop,"r"]);
        }
        $regionStart = $dnaStart;
        $regionStop = $dnaStop;
                
    }
}

push(@clusters,[$regionStart,$regionStop,"r"]);

sort {$a->[0] <=> $b->[0]} @clusters;



foreach my $cluster (@clusters) {
    print join("\t",@{$cluster})."\n";
}

print int(@clusters)." cluster gesamt\n";
# Disconnect from the database
$dbh->disconnect();
[download]

Comment on perl and psql Download Code

Replies are listed 'Best First'.
Re: perl and psql by Xilman (Hermit) on Dec 02, 2009 at 13:03 UTC
Yes you can and in a very simple manner. I'll sketch the outline and leave you to fill in the details as it's a very useful learning exercise. Instead of printing, call a SQL command to insert them into a table. That is, use something like something like: `$dbh->foreach my $cluster (@clusters) { do ("INSERT INTO $table (start, end) VALUES (cluster[0], cluster[1]);" +); } # Ensure that the insertion is committed (or set auto-commit when fir +st connecting to the DB). $dbh->commit();` [download] This presumes you have created a psql table, the name of which is held in $table, with columns called 'start' and 'end'. Creating that table is left as an easy exercise. Paul	[reply] [d/l]
Re^2: perl and psql by almut (Canon) on Dec 02, 2009 at 14:18 UTC
`$dbh->do ("INSERT INTO $table (start, end) VALUES (cluster[0], cluster[1]);");` In general, it's a good idea to use placeholders (like the OP already did in the SELECT statement), because that way you avoid potential quoting issues, and don't run the risk of SQL injections... `foreach my $cluster (@clusters) { my $sth = $dbh->prepare("INSERT INTO $table (start, end) VALUES (? +, ?)") or die $dbh->errstr; $sth->execute($cluster->[0], $cluster->[1]) or die $sth->errstr;; }` [download] (assuming the interpolated variable `$table` is "safe", i.e. initialised from a static name stored program-internally...)	[reply] [d/l] [select]
Re^2: perl and psql by gogoglou (Beadle) on Dec 02, 2009 at 14:14 UTC
thanks for your reply. I am really new in perl soany help is usefull. I get a small error message though. first I create at he top of the script my $table which has a start and an end position, but then when I replace the print with the example you gave me I get an error that says that my clusters requires explicit package name. I don't really understand why this happens in this case, since the clusters have been declared. Any ideas would be really usefull. Thanks	[reply]
Re^3: perl and psql by bv (Friar) on Dec 02, 2009 at 14:29 UTC
There was a bug in the code that was posted as a reply. I don't think Xilman intended that code to be blindly copy-and-pasted, but here's what's probably going on: # This should not have the '$dbh->' at the beginning, but the real iss +ue is # $cluster here is a scalar, but foreach my $cluster (@clusters) { # here, they are being treated as arrays (if they were interpolated as + written. # $dbh->do ("INSERT INTO $table (start, end) VALUES (cluster[0], clust +er[1]);"); # a better approach, if your DBI module supports placeholders: $dbh->do ("INSERT INTO $table (start, end) VALUES (?,?);", @{$cluster} +) # or even: $dbh->do ("INSERT INTO $table (start, end) VALUES (?,?);", +$cluster->[0], $cluster->[1]) } [download] Again, this is untested code, and may not work as advertised. If the `->` and `@{}` are confusing you, check out perlreftut for more info on references. `@_=qw; Just another Perl hacker,; ;$_=q=print "@_"= and eval;` [download]	[reply] [d/l] [select]
Re^4: perl and psql by almut (Canon) on Dec 02, 2009 at 15:59 UTC
Re^4: perl and psql by Xilman (Hermit) on Dec 02, 2009 at 17:11 UTC
Re: perl and psql by erix (Prior) on Dec 02, 2009 at 18:29 UTC
<bold>update</bold> I didn't read your code too well, so disregard the below. Perl is nice, but you don't always need it: there is also the direct SQL possibility (syntax: create table new_table as myquery): `-- drop table if exists new_table; create table new_table as ( select sequence_start,sequence_stop from intergenic_and_experimental_results where reading_direction = 'f' ) union all ( select sequence_start,sequence_stop from intergenic_and_experimental_results where reading_direction = 'r' ) order by reading_direction, sequence_start ;` [download] I always run a manual analyze on newly created table, just to make sure statistics (on which the planner relies) are up to date: 'analyze mytable;' postgres manual: CREATE TABLE AS (Or, in case you want to insert into an existing table: INSERT INTO (syntax: insert into mytable myquery) (Or, in case you want to create a tab-separated file: COPY TO (syntax: copy ( myquery ) to 'file.txt' csv delimiter E'\t') (and just btw: postgres is designated 'pgsql'; psql being the cli client)	[reply] [d/l]