I'm binning data into segments and then want to retrieve a subset of the data from a run of sequential bins. Data is going into the system fine, but not every bin has multiple values, so when I iterate through the results after setting the cursor using get_dup, only keys with multiple values are returned by the cursor.

Below is code that illustrates the problem. Is this a DB_File/BDB 1.x limitation/feature that is better addressed with the BerkeleyDB interface and its more robust cursors?

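use DB_File;   # provides $DB_BTREE, R_DUP, R_NEXT
use Fcntl;     # O_RDWR, O_CREAT

# _compare was not shown in the post; a plain numeric comparator is assumed here
sub _compare { my ($key1, $key2) = @_; return $key1 <=> $key2 }

$DB_BTREE->{'flags'} = R_DUP;
$DB_BTREE->{'compare'} = \&_compare;
my %btree;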
my $bhandle = tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_HASH;

my $len = 26;
my @array =  ( 'a'..'z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}
# add a second value to each so that each key has duplicate values
@array =  ( 'A'..'Z' );
foreach ( 1..$len ) {
    $btree{$_} = shift @array;
}

# test to see that each value is printed from 20 - end
my @v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now associate a single value with a key

$btree{22.5} = 'HHI';

# test to see that each value is printed from 20 - end
@v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}
# 22.5 does not show up

# add a second value for 22.5
$btree{22.5} = 'JKL';

# test to see that each value is printed from 20 - end
@v = $bhandle->get_dup(20);
print "v is @v 20\n";

while( $bhandle->seq($k,$v, R_NEXT) == 0 ) {
    my @v = $bhandle->get_dup($k);
    print "$k @v\n";
}

# now 22.5 is in the list

The best workaround I have thought of is to dump all the keys, find the bin closest to where I want to start in O(log n) time (since the list will be sorted), and walk through the list until reaching the end boundary condition, calling get_dup on each key in the subset (which still works if only one value is stored for the key).
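
A rough sketch of that workaround, in plain Perl so it runs without a database. The helper names first_index_at_or_after and keys_in_range, the sample key list, and the 20..24 range are all made up for illustration; in the real program the sorted key list would come from the tied hash (and may need de-duplicating, since keys with several values can appear more than once).

```perl
use strict;
use warnings;

# Binary search (O(log n)): index of the first element in a numerically
# sorted array that is >= $start, or scalar(@$sorted) if there is none.
# Hypothetical helper, not part of DB_File.
sub first_index_at_or_after {
    my ($sorted, $start) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $start) { $lo = $mid + 1 }
        else                          { $hi = $mid }
    }
    return $lo;
}

# Walk forward from the binary-search position and collect every key
# up to and including $end. Hypothetical helper.
sub keys_in_range {
    my ($sorted, $start, $end) = @_;
    my @subset;
    for (my $i = first_index_at_or_after($sorted, $start); $i < @$sorted; $i++) {
        last if $sorted->[$i] > $end;    # end boundary reached
        push @subset, $sorted->[$i];
    }
    return @subset;
}

# Stand-in for the dumped key list (assumed already unique).
my @sorted_keys = sort { $a <=> $b } (1 .. 26, 22.5);
my @subset = keys_in_range(\@sorted_keys, 20, 24);
print "@subset\n";    # prints "20 21 22 22.5 23 24"
```

Each key in @subset would then be handed to $bhandle->get_dup($key) outside of any seq loop, so keys with a single value such as 22.5 are looked up the same way as keys with several.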

In reply to In order traversal of BTREE keys where not all keys have duplicate values (using DB_File) by stajich
