need a faster median

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: need a faster median by zentara (Cardinal) on Oct 18, 2011 at 17:39 UTC
PDL is about as fast as C for number crunching, but then again, it is a big module to load. If you want the fastest speed possible, you should probably take the quickselect.c program discussed on fastest median possible and put it into an Inline::C routine. I'm not really a human, but I play one on earth. Old Perl Programmer Haiku ................... flash japh	[reply]
Re: need a faster median by salva (Canon) on Oct 18, 2011 at 20:23 UTC
use Sort::Key::Top: `use Sort::Key::Top 'natpos'; use Benchmark qw(cmpthese); my @f = map rand, 0..1e7; my ($r1, $r2); cmpthese -1, { sort => sub { $r1 = (sort { $a <=> $b } @f)[@f/2] }, skt => sub { $r2 = natpos @f/2, @f } }; print "median $r1, $r2\n";` [download] on my computer: `$ perl /tmp/median.pl s/iter sort skt sort 20.5 -- -94% skt 1.25 1542% -- mediam 0.499785519518444, 0.499785519518444` [download]	[reply] [d/l] [select]
Re: need a faster median by Corion (Patriarch) on Oct 18, 2011 at 16:32 UTC
I would assume that the "best" way to go really depends on what you really need, and by how much the calculated median is allowed to deviate from the real median. You do realize that to find the real median, you always have to look at at least 50% of all elements in your set unless you do some preparation. If you can pre-sort your elements or use a heap, or a good estimate where your median will likely land, you might get away with throwing out more elements. But all of these approaches have drawbacks and you haven't stated what drawbacks are acceptable and which ones are not.	[reply]
Re^2: need a faster median by Anonymous Monk on Oct 18, 2011 at 16:46 UTC
I have a non-sorted array of "floats" of size 10M+. I have no more assumptions.	[reply]
Re^3: need a faster median by roboticus (Chancellor) on Oct 18, 2011 at 17:31 UTC
I found that on my laptop, I can find the mode of 10 million floats in 13 seconds. Just keep a count of each value encountered and select the largest value. There's no need to sort it. Update: I forgot to mention that a hash is a good place to keep your counts... ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply]