Re: optimize this code?
by dws (Chancellor) on Feb 08, 2002 at 22:45 UTC
|
I'm wondering if anyone can optimize these loops for me: each cycle takes about a second, ...
You need to separate out the time it's taking to open the file and do file I/O from the time it's taking to interate over the keys and build the matrix.
Do this: Comment out everything between the tie and the untie (missing, but worth adding for general cleanliness). If this doesn't have a significant impact on runtime, then your answer is "buy a faster disk" (or defrag the one you've got).
Next step is to comment out the code that's building the matrix (after doing each %hash). This retains the disk IO for reading keys, while separating out the overhead for your data structures. If this doesn't have a significant effect of overall runtime, answers range from "buy a faster disk" to "precache the keys in the db".
I really doubt this is due to your matrix building code.
| [reply] [d/l] [select] |
|
|
Hi dws-- I commented out individual snippets of this code. The slowest thing by far is the matrix building line, $matrix$j->$i = $hash{$key}; Also somewhat slow is the looping through the hash. The loading of the data is completely quick. So: does anyone know a better way to load values more quickly into a large matrix?
| [reply] |
|
|
The slowest thing by far is the matrix building line
That surprises me. Since that's the case, you might consider an alternate matrix representation. Is it important that the column values be in order by key? Will there always be the same number of {key, value} pairs in each of the .db's?
If each .db has the same number of {key, value} pairs, you can build the array in a single large nFile x nKeys vector, which might let you can some creation time at the expense of later access time.
| [reply] |
|
|
|
|
my @array;
my $t = time();
for my $i ( 0 .. 499 ) {
for my $j ( 0 .. 6264 ) {
$array[$i]->[$j] = 47;
}
}
print "Elapsed: ", time() - $t, " seconds\n";
ran in 17 seconds on my 400Mhz laptop, so it has to be the disk I/O that's killing you.
If you're unable to use a different DBM representation (such as DB_TREE), then you might try pulling keys and values out of the DBM in whatever order it prefers to give them to you, and then sort them by key in memory. By pulling out the keys and sorting before going after the values, you might be trashing around a bit in the file.
| [reply] [d/l] [select] |
|
|
Re: optimize this code?
by japhy (Canon) on Feb 08, 2002 at 22:43 UTC
|
Here's a shot in the dark.
my @matrix;
while (my $file = glob "*.db") {
tie my %hash, "DB_File", $file, O_RDONLY, 0644
or warn("Could not read $file: $!") and next;
# presize the array the first time
@matrix = map [], 1 .. keys %hash
unless @matrix;
my $i = 0;
for (sort keys %hash) {
push @{ $matrix[$i++] }, $hash{$_};
}
}
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] |
Re: optimize this code?
by shotgunefx (Parson) on Feb 08, 2002 at 22:34 UTC
|
Off the top of my head, You can specify the sort for the DB_File (using DB_BTREE), that would elimate the sort (keys (%hash)).
-Lee
"To be civilized is to deny one's nature." | [reply] |
|
|
++ Right on! Not only would it eliminate the sort, the resulting data structure would be no glorified hash (tilly don't consider a balanced tree one, I'm agreeing after a few /tell's), but a fulll fledged sorted balanced binary tree (extra efficiency in lookups).
And now, my favorite activity besides programming, quoting from the pod:
DB_BTREE
The btree format allows arbitrary key/value pairs to be stored in a sorted, balanced binary tree.
As with the DB_HASH format, it is possible to provide a user defined Perl routine to perform the comparison of keys. By default, though, the keys are stored in lexical order.
update
as for building the matrix, see what japhy said (it's got to do with how perl grows the array).
The BTree will definetly help, but I'm not sure how noticably ... you might wanna look at the DB_Fille pod for an alternate method of accessing the key/value pairs (like the seq method), or use the BerkleyDB module.
Question: Evanovich, how are you benchmarking this?
Suggestion: If you, with the help of the monks, don't figure out how to optimize this sufficiently, you might wanna consider a C/C++ solution. BerkleyDB has a C/C++ interface, and your code isn't too complex/long ... please don't hurt me ;)(/duck)
| [reply] [d/l] |
|
|
Well, what I don't understand Crazyinsomniac, is why this should take so long, to build the matrix I mean. Accessing the data is lightning fast, it's just that looping through the hash that is rough. Question: if I made my data into single entry vector tables, and then looped through the vector and loaded it into the matrix, maybe that would help? I don't want to go to C, because this code is just the beginning of much more complicated stuff, and I need perl's structures I think.
| [reply] |
|
|
I commented out the sort, and it doesn't really affect the loop time. Will this help the time of building the matrix, which seems to be taking up the bulk of each loop?
| [reply] |
Re: optimize this code?
by hossman (Prior) on Feb 08, 2002 at 22:46 UTC
|
Optimizing ends to require that the optimizer knows
what the goal is, otherwise you runthe risk of
optimizing away desired side effects.
| [reply] |