comment on

Hey, I'm in the process of writing my first bioinformatics script (see BioGeek's scratchpad).
(For those interested in the biological background of what I'm trying to accomplish:

For only very few genetic disorders the exact molecular cause is known. There are however a big number of inheritable diseases where researchers have been able to pinpoint -via various techniques- some stretch on one or more chromosomes where suspect to find the gene(s) responsible for the disease. Each of these stretches (called a locus) still contains several hundred genes, and testing each and every gene on it is near impossible because it would be much to time-consuming and expensive. So, several prioritization techniques have been invented to rank the candidate disease genes on each locus. A good prioritization technique would give as a result very low values for most of the genes on a locus, and high values for the few genes it thinks are involved in the disease pathway. A very strong indication that a gene is responsible for a disease would be when several independent prioritization techniques each give the gene a high score.

).

It consists of 4 parts:
1) Asks for user input.

#usr/bin/perl -T

use strict;
use warnings;
use diagnostics;
use LWP;
use DBI;
use Data::Dumper;

########################
# Let user make a choice
########################
TOP:
print "Typ the number of the disease you want the\n";
print "genes to be calculated and compared of.\n";
print "1\tAlzheimer disease (5, 6)\n";
print "2\tBardet Biedl (syndrome 1, 2, 3, 5)\n";
print "3\tBreast cancer (type 4)\n";
print "4\tdiabetes mellitus, noninsulin-dependent (3)\n";
print "5\tHypertension (essential, susceptibility to, 2)\n";
print "6\tInflammatory bowel disease (7)\n";
print "7\tSystemic lupus erythematosus (susceptibility to, 3)\n";
print "8\tmuscular dystrophy, limb-girdle (autosomal recessive) (type 
+1D, 1H)\n";
print "9\tparkinsons disease (type 3, 4, autosomal dominant, Lewy body
+)\n";
print "10\tprostate cancer (susceptibility to)\n";
[download]

2) depending on this input, a number of subroutines are fired up.

########################
# Define for which keywords to go looking in the different databases
########################
my $choice = <STDIN>;
chomp($choice);

if ($choice == 1) {
        #get_g2d('Alzheimer disease-5');
        #get_g2d('Alzheimer disease 6');
        get_pocus('alz');
        #get_dgp('alz 10q');
        #get_dgp('alz 12p11.23-q13.12');
}
elsif ($choice == 2) {
        #get_g2d('Bardet-Biedl syndrome 1');
        #get_g2d('Bardet-Biedl syndrome 2');
        #get_g2d('Bardet-Biedl syndrome 3');
        #get_g2d('Bardet-Biedl syndrome 5');
        get_pocus('bb');
        #get_dgp('bb 11q13');
        #get_dgp('bb 16q21');
        #get_dgp('bb 2q31');
        #get_dgp('bb 3p13-p12');
}
elsif ($choice == 3) {
        #get_g2d('Breast cancer, type 4');
        get_pocus('bc');
        #get_dgp('bc 13q21');
}
elsif ($choice == 4) {
        #get_g2d('Diabetes mellitus, noninsulin-dependent, 3');
        get_pocus('niddm');
        #get_dgp('niddm 20q12-q13');
}
elsif ($choice == 5) {
        #get_g2d('Hypertension, essential, susceptibility to, 2');
        get_pocus('h');
        #get_dgp('h 15q');
}
elsif ($choice == 6) {
        #get_g2d('Inflammatory bowel disease 7');
        get_pocus('ibd');
        #get_dgp('ibd 1p36');
}
elsif ($choice == 7) {
        #get_g2d('Systemic lupus erythematosus, susceptibility to, 3')
+;
        get_pocus('sle');
        #get_dgp('sle 4p16-15.2');
}
elsif ($choice == 8) {
        #get_g2d('Muscular dystrophy, limb-girdle, type 1D');
        #get_g2d('Muscular dystrophy, limb-girdle, type 2H');
        get_pocus('md');
        #get_dgp('md 7q');
        #get_dgp('md 9q31-q33');
}
elsif ($choice == 9) {
        #get_g2d('Parkinson disease, type 3');
        #get_g2d('Parkinson disease 4, autosomal dominant, Lewy body')
+;

        #get_pocus('pd');
        #get_dgp('pd 4p15');
}
elsif ($choice == 10) {
        #get_g2d('Prostate cancer, susceptibility to');
        get_pocus('pc');
        #get_dgp('pc 143000001-160000000');
}
else { print "Type a number between 1 and 10.\n"; 
        goto TOP
}
[download]

3) One of the subroutines opens a locally stored file (available here), parses it, and returns a list with EnsembleID's (an international standard for gene name's) and their associated score's.

####################
# open file with results from POCUS database, parse it,
# and retrieve gene-name and score.
####################

my @results=();
sub get_pocus {
        my @disease_name_pocus = @_;
        print "Getting the genes and rankings from the POCUS analysis 
+for:\n";
        print "$disease_name_pocus[0]\n";
        open (POCUS, "/home/jeroen/POCUS/results_100.out") || die "cou
+ldn't open the file: $!\n";
                while (<POCUS>) {
                        push @results, [ split ];
                }

                open (MARKER, ">marker.list.txt") || die $!;
                        for (my $i = 0; $i < scalar @results; $i++) {
                                if ($results[$i]->[0] eq $disease_name
+_pocus[0]) {
                                       if ($results[$i]->[1] ne $resul
+ts[($i+1)]->[1]){
                                        print MARKER "$results[$i]->[1
+]\t$results[$i]->[4]\n";
                                       }
                                }
                        }
}
close(MARKER);
[download]

4) This information is then used to query the Ensembl database, and retrieve for each gene the start and stop positions on the chromosone, and their strand (can be + or -).

##################
# Make connection with Ensembl, and retrieve start, end  and rank.
##################


my $datasource = "DBI:mysql:homo_sapiens_core_23_34e:ensembldb.ensembl
+.org";
my $dbh=();
my $sth=();  # database and statement handles
my @rows;         # to get results or queries
my $row_count;    # to see how much we get back
my $username = "anonymous";
my $password = "";
my $marker = 'marker_list.txt';

# connect to database:
$dbh = DBI->connect($datasource, $username, $password, {RaiseError => 
+1});

open (LIST, "<$marker") or die ("Couldn't open file $marker: $!.\n");

while (<LIST>){
    my ($ensemblID, $score) = $_;


    # select all sequence entries:

    $row_count = 0;
    $sth = $dbh->prepare (qq{
        SELECT gene_stable_id.stable_id, gene.seq_region_start, gene.s
+eq_region_end, gene.seq_region_strand
        FROM gene, gene_stable_id
        WHERE gene_stable_id.gene_id = gene.gene_id
        AND gene_stable_id.stable_id ="$ensemblID"
    });
    $sth->execute ();

    # read results:
    while (@rows = $sth->fetchrow_array () ) {
        $row_count++;
        print join ("\t", @rows), "\n";
    }

    unless ( $row_count ) {
        print "This gene doesn't appear to be placed on the current as
+sembly.\n";
    }

# tidy up:
close(LIST);
$sth->finish ();
$dbh->disconnect ();
}
[download]

The first three parts work as advertised, but the last part gives me an Can't call method "finish" on an undefined value at get_pocus.pl line 168 error. Any clues as to what I might be doing wrong?

Also, I'd like my final output to be in the format:
$chromosome\t$start\t$end\t$ensemblID\t$score\t$strand\n
chr22 1000 5000 ENSG00000141485 960 +
chr22 2000 6000 ENSG00000145848 200 -

As noted, this is my very first script, so code will probably be hairy. Any remarks on how to streamline the code, or tips on how to improve it are appreciated very much.
Update: moved finish() within the while loop (thanks gellyfish). This resolves the DBI error. The question about the output, and the request for constructive feedback still stand tough.
Update 2:Fixed a typo (s/marker.list.txt/marker_list.txt/).

In reply to Can't call on undefined value (DBI) + constructive feedback asked. by BioGeek

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.