baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi, So I am playing around with my music collection and the thing is I don't want to re-invent the wheel. Also i know that I could do this using a straight forward approach to iterate over and over again since my music collection is not that big. but for the sake of the argument let assume that I have more money then B. Gates, C. Slim Heul, W. Buffet, A. Ortega ect.. joined together and i have spent all my money on songs. So i have this huge list of songs and i need to produce Top 20 hits every second. Also let assume that this list changes each second, that is, top 20 from the previous iteration does not imply top 20 in the next iteration.

So what would be the best approach to create those top 20 lists. I was thinking something along these lines:

Since i am looking for top 20 there has to be a score associated to each song. So i iterate through this list and using binary search try to see if and where each song should go . If i conclude that the song should be within top 20 i start copying my top20 array until i reach the position where my song should go, insert my song and copy the rest.

So my question is : Is there maybe a better way to do this (I am not looking for code just a discussion)

thank you

baxy

PS

this is not a homework though i described it as such :), therefore i don't need any code - basically this should be a variant of a google search engine, right?

Replies are listed 'Best First'.
Re: top 20 song hits
by CountZero (Bishop) on Dec 01, 2013 at 16:49 UTC
    Keep the scores in a database and use standard SQL to get the info out. Database engines do that all the time and are highly optimized to do it well, fast and correct.

    Actually I have a Logitech Slimserver / Squeezeserver (or whatever is its present name). Everytime I play a song and don't fast-forward it to the next one, the server increases that song's "popularity". If I want to listen to the "most popular songs" or even "most popular songs not recently played" (or any other combination of parameters attached to each song), the server goes out to the database and finds them by using some standard SQL.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: top 20 song hits
by oiskuu (Hermit) on Dec 02, 2013 at 00:58 UTC
    What you need is a priority queue, such as the Fibonacci heap.

    For perl, there is Array::Heap, which ought to perform adequately.
    This following does 1M counts + 1000 toplists in a second or two.
    (Update: performance comparisons with a database are welcome.)

    #! /usr/bin/perl -l use Array::Heap; print "starting"; my @h = map [-rand, "item $_"], 'aa000' .. 'zz999'; print int(@h), " elements"; make_heap @h; print "heap made"; sub hits { --$h[$_][0], adjust_heap @h,$_ for @_; } sub batch { hits map {int @h*rand()**4} 1..$_[0]; } sub rotate { $_->[0] *= rand, $_->[1] .= '.', push_heap @h,$_ for map {pop_heap @h} 1..$_[0]; } sub top { map sprintf("%8d %s", -$_->[0], $_->[1]), map {push_heap @h,$_;$_} map {pop_heap @h} 1..$_[0]; } for (;;) { batch(1000), top(10) for 1..1000; print for "", top(10); rotate(3); }

      Is a heap the right structure? The "adjust_heap" operation must have a time complexity of some significance. I too thought of a heap at first, but can't get around the thought that adjusting the priority of potentially every single song, once per second, seems like it would mean re-heaping potentially the entire structure once per second.

      I hope I'm wrong. ;)


      Dave