Now, performance obviously depends on CHI's search/get/set and I don't know what that is (can't see that from a diagonal look at the documentation).

But something obvious is that Data::Throttler uses the notion of buckets, which are counters for a time range. And from what I can gather it uses an array to store those buckets. But you (CHI that is) use a hashtable (I guess?). So there is definetely a penalty there in terms of amortized performance but definetely it is not accounting for the order of magnitude you observe. But, still, the advantage of Data::Throttler is that it uses data structures and algorithms for that specific use-case (of summing buckets). It pays to be specific.

So, it costs time to create the list of indices of items, fetch them and then sum them, as you do. Data::Throttler will only sum a continuous range (i.e. a loop) in its internal buckets array. That's a 3rd of your cost already. If CHI had a method to apply an operation (sum) over a list of indices or better still a continuous range of items, you could cut your costs a bit. But then that's not something one would expect from cache storage!

The "really bad" news is that Data::Throttler can optimise its internal logic even more. And also add more features. Can you keep up with that by using a generic tool?

Having said all that, I can think of one way to improve performance (but then Data::Throttler can also do that and be ahead, again). Instead of having each bucket storing a count of hits over its time-interval, make it keep an incremental count of hits. I.e. keep the sum of counts in each bucket of all its previous buckets (not the dead ones). In this way you do not need to do a sum each time try_push() is called. You will have that sum ready waiting for you in the last bucket. However, that sum will be invalid once a bucket is too old and had to be removed. Whenever a bucket gets too old, you have to subtract its count from the total count. So, create a function total_count() which either re-calculates the count if buckets were invalidated since last time it was called, or returns a cached count. Re-calculating is really easy: subtract the difference of the old bucket's count to its newer neighbour from all buckets. Then, if a hit is added, you only need to add it to the newest bucket. The total count will be held by the newest bucket and will need to be re-calculated if you ask for it after time-interval seconds since last time you called it. So, here is finally some use of cache :)

bw, bliako


In reply to Re^2: Optimizing a CHI-based data throttler by bliako
in thread Optimizing a CHI-based data throttler by perlancar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.