comment on

Hmmm - hardly an expert - more like someone ignorant enough to ask if what he thinks he's understood is in fact correct. The kind patience of those who've helped tells me they've understood it like that, as was intended. And my one way of giving back is to summarize lessons learned for others' future reference.

As for me, I don't believe your benchmarking for a simple reason: databases are definitely better than Perl at crunching tabular data, provided that you use them correctly.

Darn - and I had so enjoyed the unanimity of the "yes, that's what you should expect" crowd so far.

Anyway - no matter. And no problem with the challenge. I've been using vanilla test data anyway. Here it is:

DB 0

    $num_respondents = 20000;
    $offset = 0;
    for $respondent (1 .. $num_respondents) {
        $sql = "INSERT INTO theTable VALUES ($respondent";
        for $response (1 .. $offset) {
            $sql = $sql.", 0";
        }
        $response = $offset + 1;
        while ($response < $num_responses+1) {
            $sql = $sql.", 1";
            if (($num_responses - $response) > 4) {
                $sql = $sql.", 0, 0, 0, 0";
            }
            else {
                for $response ($response+1 .. $num_responses) {
                    $sql = $sql.", 0";
                }
            }
            $response = $response + 5;
        }
        if ($respondent % 2) {$sql = $sql.", 1, 0";}
        else {$sql = $sql.", 0, 1";}
        $sql = $sql.")";
        $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh-
+>errstr);
        print "<p>$sql\n";
        $sth->execute() or die("Could not execute!" . $dbh->errstr);
        if ($offset < 4) {$offset++;}
        else {$offset = 0;}
    }
[download]

This produces rows that look like:

respondent_1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0...
respondent_2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0...
respondent_3, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0...
[download]

so that each respondent gave every fifth response and each response was given by every fifth respondent, except for the last two which represent gender - each of those was given by 1/2 the respondents and every benchmark query uses one of the gender columns.

DB 1 & DB 2

    $sql = "INSERT INTO theTable (response, respondent) VALUES (?,?)";
    $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh->err
+str);
    $i = 1;
    for $respondent (1 .. 20000) {
        for $r (0 .. 600) {
            $response = $i + ($r * 5);
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        if ($respondent % 2) {
            $response = 9999;
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        else {
            $response = 6666;
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        if ($i < 5) {$i++;}
        else {$i = 1;}
    }
[download]

This produces:

1, 1
6,1
11, 1
...
2, 2
7, 2
12, 2
...
3, 3
8, 3
13, 3
...
[download]

so that, again, each respondent gave every fifth response and each response was given by every fifth respondent, except for the gender codes which are half-half.

DB 3

    $respondents_by_5 = 4000;
    $sql = "SHOW TABLES";
    $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh->err
+str);
    $sth->execute() or die("Could not execute!" . $dbh->errstr);
    $j = 1;
    while ($table = $sth->fetchrow_array()) {
        if ($table ne 't6000' && $table ne 't9000') {
            $sql = "INSERT INTO ".$table." VALUES (?)";
            $std = $dbh->prepare($sql) or die("Could not prepare! At t
+able = $table because " . $dbh->errstr);
            for $i (0 .. $respondents_by_5) {
                $respondent = $j + ($i * 5);
                $std->execute($respondent) or die("Could not execute!"
+ . $dbh->errstr);
            }
            if ($j < 5) {$j++;}
            else {$j = 1;}
        }
    }
    $sql = "INSERT INTO t9999 VALUES (?)";
    $stf = $dbh->prepare($sql) or die("Could not prepare! At table = $
+table because " . $dbh->errstr);
    $sql = "INSERT INTO t6666 VALUES (?)";
    $stm = $dbh->prepare($sql) or die("Could not prepare! At table = $
+table because " . $dbh->errstr);
    for $respondent (1 .. 20000) {
        if ($respondent % 2) {
            $stm->execute($respondent) or die("Could not execute!" . $
+dbh->errstr);
        }
        else {
            $stf->execute($respondent) or die("Could not execute!" . $
+dbh->errstr);
        }
    }
[download]

This produces tables that look like:

Table_1:1, 6, 11, 16, ..., 19996
Table_2:2, 7, 12, 17, ..., 19997
Table_3:3, 8, 13, 18, ..., 19998
Table_4:4, 9, 14, 19, ..., 19999
Table_5:5, 10, 15, 20, .., 20000
Table_6:1, 6, 11, 16, ..., 19996
etc.
[download]

one row per table entry.

DB 4

    $j = 1;
    for $r (1 .. $num_responses) {
        @lines = ();
        $table = ">".$db_dir."/r".$r;
        for $i (0 .. $respondents_by_5) {
            $respondent = $j + ($i * 5);
            push (@lines, $respondent);
        }
        &write_lines_to_file(1, $table, @lines);
        if ($j < 5) {$j++;}
        else {$j = 1;}
    }
    for $respondent (1 .. 20000) {
        if ($respondent % 2) {
            push(@lines_6666, $respondent);
        }
        else {
            push(@lines_9999, $respondent);
        }
    }
    $table = ">".$db_dir."/r6666";
    &write_lines_to_file(1, $table, @lines_6666);
    $table = ">".$db_dir."/r9999";
    &write_lines_to_file(1, $table, @lines_9999);
[download]

where "write_lines_to_file" is

sub write_lines_to_file {
    my $add_line_return = shift;
    my $file = shift;
    my @lines=@_;
    open(FILE, ">$file") or dienice("cannot open file : $!");
    $STDOUT_Handle = select(FILE); 
    for my $i (0 .. $#lines) {
        print "$lines[$i]";
        if ($add_line_return) {print "\n";}
    }
       close(FILE);
    select($STDOUT_Handle);
}
[download]

This produces the same tables as DB3, but each in its own text file.

If you can see something that alters my understanding, I'm all ears.

In reply to Re: Basic Perl trumps DBI? Or my poor DB design? by punch_card_don
in thread Basic Perl trumps DBI? Or my poor DB design? by punch_card_don

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.