
Thanks for responding so fast!

This is used in a rather *large* Perl program. Basically, I have thousands of arrays (currently ~3500) that could each contain millions of bit vectors. All bit vectors will be the same size, likely between 50 bits (at minimum) and 200 bits (at most). I'll use the smallest in this example:

    # Create the bit vectors
    my $bv1 = Bit::Vector->new(50);
    my $bv2 = Bit::Vector->new(50);
    ...

    # Keep associated bv's together
    my @source1 :shared = ($bv1a,$bv2a,...,$bv85000a,...$bv25000000a);
    my @source2 :shared = ($bv1b,$bv2b,...,$bv85000b);
    ...

    # Add these arrays to a single, global array
    my @sources :shared = (\@source1, \@source2, ...);

These may not all fit into memory, so I would have to use Storable's store() and retrieve() to dump them to disk and pull them back when needed. If they do fit into memory, I would like one shared array or hash that holds references to all these shared arrays of bit vectors, because I will be performing pairwise set intersections on all of them (yep, many-to-many). So I would want one copy of this hash/array in memory, shared amongst all threads, so each could pick what it needs, and when, to do its subset of intersections.
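A minimal sketch of the store()/retrieve() round trip described above. It swaps in a different representation from the one in the post: each bit vector is a plain packed string built with Perl's core vec() rather than a Bit::Vector object, so the example needs no non-core modules. The data and the temporary file are illustrative only.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Illustrative data (an assumption): three 50-bit vectors held as
# 7-byte packed strings, each with a single bit switched on.
my @source1 = map { my $s = "\0" x 7; vec($s, $_, 1) = 1; $s } (0, 7, 49);

# Temporary file that is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

store(\@source1, $path);          # dump the whole array to disk
my $reloaded = retrieve($path);   # pull it back as an array reference

printf "reloaded %d vectors\n", scalar @$reloaded;
```

Every retrieve() rebuilds the complete array in memory, which is the cost the post hopes to avoid by keeping one shared copy.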

The RESULTS of these intersections would be bit vectors as well. After a thread has finished computing the intersections on its batch of bit vectors, I would like to add the resulting bit vectors to a separate shared "results" hash that organizes the results of these source|target set comparisons, and move on from there.

    # One pairwise intersection:
    # $intrx1_2 is a bit vector whose bits correspond to the positions
    # in @source1 whose bit vectors intersected with those in @source2;
    my ($intrx1_2, $intrx2_1) = set_intersection(\@source1,\@source2);

    # Storing the results in a globally shared hash
    $globally_shared_hash{$source1}->{$target2} = $intrx1_2;
    $globally_shared_hash{$source2}->{$target1} = $intrx2_1;
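The post does not show set_intersection() itself. The following is a hedged sketch of one way it could behave, assuming that two bit vectors "intersect" when they have at least one set bit in common. It uses packed strings and core Perl's string bitwise AND instead of Bit::Vector objects, and make_bits() is a hypothetical helper introduced only for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: a packed string of $nbits bits with the listed bits on.
sub make_bits {
    my ($nbits, @on) = @_;
    my $s = "\0" x int(($nbits + 7) / 8);
    vec($s, $_, 1) = 1 for @on;
    return $s;
}

# Assumed semantics: bit $i of the first result is set when $src->[$i]
# shares a set bit with any vector in @$tgt, and likewise for the second.
sub set_intersection {
    my ($src, $tgt) = @_;
    my $hits_src = "\0" x int((@$src + 7) / 8);
    my $hits_tgt = "\0" x int((@$tgt + 7) / 8);
    for my $i (0 .. $#$src) {
        for my $j (0 .. $#$tgt) {
            # '&' on two strings is a bytewise AND (without the 'bitwise'
            # feature); any non-NUL byte means a common set bit.
            next unless ($src->[$i] & $tgt->[$j]) =~ /[^\0]/;
            vec($hits_src, $i, 1) = 1;
            vec($hits_tgt, $j, 1) = 1;
        }
    }
    return ($hits_src, $hits_tgt);
}

my @source1 = (make_bits(50, 0, 7), make_bits(50, 49), make_bits(50, 3));
my @source2 = (make_bits(50, 7, 12), make_bits(50, 20));
my ($intrx1_2, $intrx2_1) = set_intersection(\@source1, \@source2);

print "source1 hits: ", join(",", grep { vec($intrx1_2, $_, 1) } 0 .. $#source1), "\n";
print "source2 hits: ", join(",", grep { vec($intrx2_1, $_, 1) } 0 .. $#source2), "\n";
```

The nested loop is quadratic in the array sizes, so at millions of vectors per array it only illustrates the semantics, not a workable strategy.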

Downstream subs() will additionally process these bit vectors, so to avoid re-loading a thread's subset of bit vectors from disk, I'd rather they just read from a shared memory space and skip the hit of Storable's retrieve().

The resulting globally shared hash would look something like:

    %globally_shared_hash = (
        $source1 => {
            $target2 => $bv1_2,
            $target3 => $bv1_3,
            $target4 => $bv1_4,
            ...
        },
        $source2 => {
            $target1 => $bv2_1,
            $target3 => $bv2_3,
            $target4 => $bv2_4,
            ...
        },
        ...
    );

Currently, I use the following methods from the Bit::Vector package:

    Bit::Vector->new()        # bv constructor
    $vec->to_Hex()            # bv -> HEX string
    $vec->to_Bin()            # bv -> BINARY string
    $vec->Clone()             # new vector, exact duplicate
    $vec->Size()              # gets length of bv
    $vec->Reverse()           # reverses bv
    $vec->bit_test($index)    # 0 or 1
    $vec->bit_flip($index)    # flips bv's bit at $index
    $vec->Bit_On($index)      # turn bit on
    $vec->Bit_Off($index)     # turn bit off
    $vec->Interval_Scan_dec   # grabs (min,max) of next chunk of 0's
    $vec->Lexicompare($vec2)  # +1,0, or -1
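For comparison, a rough sketch of how a few of these operations look on a plain packed string with core Perl's vec() and unpack(). This is an alternative representation, not Bit::Vector's API, and it covers only the single-bit operations, copying, and a binary dump. The bit length is tracked in a separate variable, which is an assumption of the sketch.

```perl
use strict;
use warnings;

my $nbits = 50;                               # size is kept alongside the string
my $vec   = "\0" x int(($nbits + 7) / 8);     # all bits off

vec($vec, 3, 1)  = 1;                         # comparable to Bit_On(3)
vec($vec, 10, 1) = 1;                         # comparable to Bit_On(10)
vec($vec, 10, 1) = 0;                         # comparable to Bit_Off(10)
vec($vec, 5, 1)  = 1 - vec($vec, 5, 1);       # comparable to bit_flip(5)
my $is_set = vec($vec, 3, 1);                 # comparable to bit_test(3)

# Binary dump; here index 0 comes first in the string.
my $bits = substr(unpack("b*", $vec), 0, $nbits);

# Copying the scalar yields an independent duplicate.
my $clone = $vec;

print "$bits\n";
```

Because the vector is an ordinary string scalar, it can be copied, stored, and compared like any other Perl string.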

Think your XS could alleviate my problem or do you think I just need to be more creative with how I manage thread-local data?

Re^3: Thread sharing a bit vector??
by BrowserUk (Patriarch) on Apr 11, 2012 at 23:34 UTC
    Think your XS could alleviate my problem

    As it currently exists, it does not support the full range of operations you require.

    It could be extended to do so, but making it portable would take considerable effort, as it currently relies on Windows-specific memory management and MS compiler intrinsics.

    or do you think I just need to be more creative with how I manage thread-local data?

    I'm sorry to say that I don't believe that threads::shared is currently capable of doing what you need it to do. Because of that module's internal implementation, sharing large volumes of data across threads is not currently a viable option.

    The only viable solution to your description that I am aware of at this time would be to use a PostgreSQL DB and its BitString types & operators.

