..since with the vec function I can decode 3,000,000 doc ids in 2 seconds and 10 million in 6 secs!!!) as the code below shows..
First off, using vec to pack 32-bit numbers (i.e. fields that stay byte, word and dword aligned) gives you a false impression of its performance. It's when your fields start crossing those boundaries that the performance falls off sharply. If you just want to pack 32-bit values on dword boundaries, pack 'V' (or 'N' if you want big-endian order, which is the byte order vec itself uses for 32-bit fields) is faster:
```
C:\test>p1
cmpthese -3, {
    pack_32bit => q[ my $packed = pack 'V*', 1 .. 1e6 ],
    vec_32bit  => q[ my $packed = ''; vec( $packed, $_, 32 ) = $_ for +1 .. 1e6 ]
};;
             s/iter  vec_32bit pack_32bit
vec_32bit      5.58         --       -12%
pack_32bit     4.94        13%         --
```
But neither gives you the compression you seek.
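To make the boundary point concrete, here is a minimal sketch with made-up data. For aligned 32-bit fields a single vec call per value reads the same numbers that unpack 'N*' returns in one call. For a 20-bit field, vec cannot help directly, because it only accepts power-of-two widths, so each value has to be assembled one bit at a time. The put_bits/get_bits helpers are hypothetical, written only for this illustration.

```perl
use strict;
use warnings;

# Aligned case: 32-bit fields on 4-byte boundaries. vec() with BITS of 32
# reads each field big-endian, so unpack 'N*' yields the same numbers.
my $aligned    = pack 'N*', 10, 20, 30;
my @via_vec    = map { vec( $aligned, $_, 32 ) } 0 .. 2;
my @via_unpack = unpack 'N*', $aligned;

# Unaligned case: vec() only accepts widths that are powers of two
# (1 to 32, or 64 on perls with 64-bit integers), so a 20-bit field is
# written and read one bit per vec() call. Hypothetical helpers.
sub put_bits {
    my ( $strref, $pos, $width, $value ) = @_;
    for my $i ( 0 .. $width - 1 ) {
        # write the most significant bit first; vec() zero-extends the string
        vec( $$strref, $pos + $i, 1 ) = ( $value >> ( $width - 1 - $i ) ) & 1;
    }
}

sub get_bits {
    my ( $strref, $pos, $width ) = @_;
    my $value = 0;
    $value = ( $value << 1 ) | vec( $$strref, $pos + $_, 1 ) for 0 .. $width - 1;
    return $value;
}

my $width  = 20;
my @values = ( 1, 500_000, 1_048_575, 42 );    # all fit in 20 bits
my $bits   = '';
put_bits( \$bits, $_ * $width, $width, $values[$_] ) for 0 .. $#values;
my @decoded = map { get_bits( \$bits, $_ * $width, $width ) } 0 .. $#values;

print "aligned:   @via_vec\n";
print "unaligned: @decoded (", length $bits, " bytes)\n";
```

The unaligned values pack tightly (80 bits in 10 bytes), but each one costs 20 vec calls to write and 20 to read instead of one, which is where the speed goes.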
About your code for the Elias technique, I have to say that it is 3 times faster than mine.
But it is still slower than a plain `my $packed = pack 'w*', @numbers;` and achieves far less compression. pack 'w' (BER) compression is built in and gives the best compression and speed.
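A minimal sketch of the built-in BER route, using made-up doc ids. The gap (delta) step is an assumption added for illustration and is not part of the one-liner above: because the ids are sorted, storing the differences between neighbours keeps most numbers small, and small numbers take fewer BER bytes.

```perl
use strict;
use warnings;

# Illustrative sorted doc ids (made-up data, not from the thread).
my @doc_ids = ( 1_000_000, 1_000_003, 1_000_010, 1_000_200, 1_000_201 );

# pack 'w' writes each unsigned integer as a BER compressed integer:
# 7 data bits per byte, high bit set on every byte except the last.
my $packed = pack 'w*', @doc_ids;
my @back   = unpack 'w*', $packed;

# Assumption (not in the original post): store the gaps between sorted ids.
my @gaps = map { $_ ? $doc_ids[$_] - $doc_ids[ $_ - 1 ] : $doc_ids[0] } 0 .. $#doc_ids;
my $packed_gaps = pack 'w*', @gaps;

# Rebuild the ids from the decoded gaps with a running total.
my ( $sum, @rebuilt ) = (0);
for my $gap ( unpack 'w*', $packed_gaps ) {
    $sum += $gap;
    push @rebuilt, $sum;
}

printf "BER: %d bytes, BER gaps: %d bytes, 'V*': %d bytes\n",
    length $packed, length $packed_gaps, length( pack 'V*', @doc_ids );
# prints: BER: 15 bytes, BER gaps: 8 bytes, 'V*': 20 bytes
```

Decoding is a single unpack call either way; the gap variant only adds a running sum.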
For the SQL command that you propose, I want to ask which server is appropriate, because MySQL has no command for intersection (I tried some inner joins, but the performance was very, very, very slow for a 1GB dataset: 250,000 pages, average document length 602 words, 907,806 unique words)...
I gave up on MySQL a long time ago because of its limitations. It has improved markedly in recent versions, allowing subselects in places where it never used to and adding stored procedures and the like, but I still prefer Postgres, which supports the INTERSECT operator directly. In particular, the pgAdmin III tool is excellent for tuning your queries.
I'll try building a DB to match those numbers and let you know how the performance pans out. But even if it were 50 times slower than with the smaller dataset (5000/554/15000), which it won't be, it would still be 100 times faster than having to decompress 25 times as much data as you need and then select the 4% you actually want in Perl.
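For comparison, the Perl-side step being argued against looks roughly like this. It is a minimal sketch with a hypothetical intersect_sorted() helper and made-up posting lists: both lists are decoded in full before a linear merge keeps only the ids they share, so the decode cost is paid on every id even when few of them survive.

```perl
use strict;
use warnings;

# Hypothetical helper: intersect two ascending lists of doc ids in one
# linear merge pass.
sub intersect_sorted {
    my ( $a_ref, $b_ref ) = @_;
    my ( $i, $j, @common ) = ( 0, 0 );
    while ( $i < @$a_ref && $j < @$b_ref ) {
        if    ( $a_ref->[$i] < $b_ref->[$j] ) { $i++ }
        elsif ( $a_ref->[$i] > $b_ref->[$j] ) { $j++ }
        else                                  { push @common, $a_ref->[$i]; $i++; $j++ }
    }
    return @common;
}

# Both BER-packed lists have to be fully decoded before the merge can start.
my @word1 = unpack 'w*', pack( 'w*', 2, 5, 9, 14, 30, 31 );
my @word2 = unpack 'w*', pack( 'w*', 1, 5, 14, 15, 31, 40, 41 );
my @both  = intersect_sorted( \@word1, \@word2 );
print "@both\n";    # 5 14 31
```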