Thanks for responding so fast!
This is used in a rather *large* Perl program. Basically, I have thousands of arrays (currently ~3500), each of which could potentially contain millions of bit vectors. All bit vectors will be the same size, likely 50 bits (the minimum) or 200 bits (the maximum). I'll use the smallest in this example:
# Create the bit vectors
my $bv1 = Bit::Vector->new(50);
my $bv2 = Bit::Vector->new(50);
...
# Keep associated bv's together
my @source1 :shared = ($bv1a,$bv2a,...,$bv85000a,...,$bv25000000a);
my @source2 :shared = ($bv1b,$bv2b,...,$bv85000b);
...
# Add these arrays to a single, global array
my @sources :shared = (\@source1, \@source2, ...);
These may not all fit into memory, in which case I would have to use Storable's store() and retrieve() to dump them to disk and pull them back in when needed. If they do fit into memory, I would like one shared array or hash that holds references to all of these shared arrays of bit vectors, because I will be performing pairwise set intersections on all of them (yep, many-to-many). So I want a single copy of this hash/array in memory, shared amongst all threads, so that each thread can pick out what it needs, when it needs it, to do its subset of the intersections.
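To make the disk round trip concrete, here is a minimal sketch. It assumes each bit vector is held as a packed string built with Perl's core vec() rather than as a Bit::Vector object, so that it runs without any CPAN modules. The helper make_bits and the file name source1.sto are hypothetical; store() and retrieve() are the real Storable calls.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: build an $nbits-wide vector as a packed string,
# turning on the listed bit positions with core vec().
sub make_bits {
    my ($nbits, @on) = @_;
    my $bits = "\0" x int(($nbits + 7) / 8);   # 50 bits -> 7 bytes
    vec($bits, $_, 1) = 1 for @on;
    return $bits;
}

# Stand-in for one source array (three vectors instead of millions).
my @source1 = (make_bits(50, 0, 3), make_bits(50, 49), make_bits(50));

# Dump the array to disk, then pull it back in when needed.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/source1.sto";                 # hypothetical file name
store(\@source1, $file);
my $restored = retrieve($file);

printf "restored %d vectors of %d bytes each\n",
    scalar @$restored, length $restored->[0];
```

Packed strings are plain scalars, so they cost 7 bytes of payload per 50-bit vector plus Perl's per-scalar overhead, which is worth measuring before deciding whether millions of them fit in memory.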
The RESULTS of these intersections would be bit vectors as well. After a thread has finished computing the intersections on its bunch of bit vectors, I would like it to add the resulting bit vectors to a different shared "results" hash, organized by source and target of each set comparison, and move on from there.
# One pairwise intersection:
# $intrx1_2 is a bit vector whose bits correspond to the positions
# in @source1 whose bit vectors intersected with those in @source2;
my ($intrx1_2, $intrx2_1) = set_intersection(\@source1,\@source2);
# Storing the results in a globally shared hash
$globally_shared_hash{$source1}->{$target2} = $intrx1_2;
$globally_shared_hash{$source2}->{$target1} = $intrx2_1;
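For illustration only, here is one way set_intersection() could behave. This is a sketch under assumptions, not my actual sub: it assumes two vectors "intersect" when their bitwise AND has at least one bit set, and it uses packed strings and core vec() in place of Bit::Vector objects. The helper make_bits is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: packed bit string with the given bits set.
sub make_bits {
    my ($nbits, @on) = @_;
    my $bits = "\0" x int(($nbits + 7) / 8);
    vec($bits, $_, 1) = 1 for @on;
    return $bits;
}

# Assumed semantics: two vectors intersect when their bitwise AND is
# non-zero. Returns two packed bit strings: bit i of the first is set
# if $src->[i] intersects anything in $tgt, and bit j of the second is
# set if $tgt->[j] intersects anything in $src.
sub set_intersection {
    my ($src, $tgt) = @_;
    my $hits_src = "\0" x int((@$src + 7) / 8);
    my $hits_tgt = "\0" x int((@$tgt + 7) / 8);
    for my $i (0 .. $#$src) {
        for my $j (0 .. $#$tgt) {
            # String bitwise AND; any non-NUL byte means a shared bit.
            next unless ($src->[$i] & $tgt->[$j]) =~ /[^\0]/;
            vec($hits_src, $i, 1) = 1;
            vec($hits_tgt, $j, 1) = 1;
        }
    }
    return ($hits_src, $hits_tgt);
}

my @source1 = (make_bits(50, 0, 3), make_bits(50, 49), make_bits(50, 7));
my @source2 = (make_bits(50, 3),    make_bits(50, 20));

my ($intrx1_2, $intrx2_1) = set_intersection(\@source1, \@source2);

# Only $source1[0] and $source2[0] share a bit (bit 3).
print unpack("b3", $intrx1_2), " ", unpack("b2", $intrx2_1), "\n";   # prints "100 10"
```

The nested loop is quadratic in the array sizes, so with millions of vectors per array it only shows the shape of the inputs and outputs, not a workable algorithm.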
Downstream subs will process these bit vectors further, so rather than re-loading each thread's subset of bit vectors from disk, I'd prefer that they just read from a shared memory space and avoid the hit of Storable's retrieve().
The resulting globally shared hash would look something like:
%globally_shared_hash = (
    $source1 => { $target2 => $bv1_2,
                  $target3 => $bv1_3,
                  $target4 => $bv1_4,
                  ...
                },
    $source2 => { $target1 => $bv2_1,
                  $target3 => $bv2_3,
                  $target4 => $bv2_4,
                  ...
                },
    ...
);
Currently, I use the following methods from the Bit::Vector package:
Bit::Vector->new() # bv constructor
$vec->to_Hex() # bv -> HEX string
$vec->to_Bin() # bv -> BINARY string
$vec->Clone() # new vector, exact duplicate
$vec->Size() # gets length of bv
$vec->Reverse() # reverses bv
$vec->bit_test($index) # 0 or 1
$vec->bit_flip($index) # flips bv's bit at $index
$vec->Bit_On($index) # turn bit on
$vec->Bit_Off($index) # turn bit off
$vec->Interval_Scan_dec($start) # grabs (min,max) of next chunk of set bits, scanning downward
$vec->Lexicompare($vec2) # +1,0, or -1
Think your XS could alleviate my problem or do you think I just need to be more creative with how I manage thread-local data?