Re: How to test caching?
by kyle (Abbot) on Aug 20, 2008 at 19:21 UTC
|
Answer one: Use Memoize. It's already been tested, so you know it works, and it's a core module, so you know it's there.
Answer two: Use Memoize's method. If you look in the Memoize distribution, there's a test called "speed.t". It works by taking a naive fib implementation, calling it with larger and larger numbers until it takes some given time to execute. Once you've seen it take that amount of time, memoize it, and see if it gets faster. In your own foo() example, break out the $result computation into a separate sub that you can time separately.
Answer three: Pass in a tied scalar as your $arg. Specifically, make a tied scalar that counts the number of times it's been accessed. If you get fewer accesses to it the second time, you know that it was used only to look up the cached result and not to compute the result itself (depending on the algorithm involved, of course). See A heap of medians: efficiency vs. speed for an example. This may not work if your algorithm doesn't actually access the $arg any more than "return $cache{$arg} if exists $cache{$arg}" does. It also might not work if the algorithm copies the scalar once and never uses it again. You might have to overload something so that even the copies exhibit the behavior you want. Implementing and testing this test method might be more trouble than using it to test your foo().
Answer four: Use Memoize.
As an aside, you might find it useful to read When to use Prototypes?
| [reply] [d/l] [select] |
|
|
Quick question on passing in a tied scalar:
my $arg = shift;
Does the magic of the passed-in tied scalar pass on to $arg? Or does $arg merely (a plain scalar) contain the copy of the tied variable? | [reply] [d/l] [select] |
|
|
What happens when you try it?
The magic goes away. I didn't know that for sure when I posted, but I suspected that would be the case. That's why I also mentioned using overload.
That works fine except for having to overload '=' too.
Either way, I think it's altogether too much hassle to test a cache so simple.
| [reply] [d/l] [select] |
|
|
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by moritz (Cardinal) on Aug 20, 2008 at 17:30 UTC
|
There are two things I can think about:
As the first option just benchmark it. If something costly is actually costly, you can run it a few times and check if the run time is that of a few hash look ups or of doing something costly a few times. Particularly compare the time for the first call to subsequent calls.
The second idea is to use PadWalker to peek into the sub's lexical pad.
That being said caching is a mechanism that can easily be factored out (Memoize) and tested separately, so usually you don't need that sort of test at all. | [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by Fletch (Bishop) on Aug 20, 2008 at 17:32 UTC
|
Caching being more of a performance optimization I'd be more concerned that the cached version was returning the correct answer than if it's using the cache. I mean you could do something wonky like using Time::HiRes to measure how long a cold call takes and compare that with how long a subsequent call for the same arguments takes, but I don't know that you really gain anything by doing so.
(And have you looked at what tests Memoize does? Even if you're not going to use it you might gain inspiration from it? (not that I've looked at its tests either, just mentioning the idea))
The cake is a lie.
The cake is a lie.
The cake is a lie.
| [reply] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by BrowserUk (Patriarch) on Aug 20, 2008 at 17:33 UTC
|
Time it. Call the sub twice with the same parameter and time how long it takes both times. If the second is quicker than the first, it used the cache.
If it's slower, abandon caching.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
Re: How to test caching? (simple)
by tye (Sage) on Aug 21, 2008 at 03:42 UTC
|
sub foo {
my $arg = shift @_;
state %cache;
return $cache{$arg} if exists $cache{$arg};
return $cache{$arg} = something_costly( $arg );
}
Test code:
use Immutable::UntestableCode qw( foo );
my $real_thing = \&Immutable::UntestableCode::something_costly;
my $calls = 0;
*Immutable::UntestableCode::something_costly = sub {
$calls++;
return $real_thing->( @_ );
};
foo( 1 );
is( 1, $calls );
foo( 1 );
is( 1, $calls );
| [reply] [d/l] [select] |
Re: How to test caching?
by Limbic~Region (Chancellor) on Aug 20, 2008 at 17:44 UTC
|
JavaFan,
I am not sure this is a good way, but it is "a way".
use constant TEST_MODE => 1;
sub foo ($) {
my $arg = shift;
state %cache;
return $cache{$arg} if exists $cache{$arg};
if (TEST_MODE) {
# ... something that would only run if cache failed
}
# ...
}
Of course TEST_MODE needn't be a constant - it could be anything that detected this was running under a test harness. There was a node a while back about marking code as deprecated that would issue a warning during testing but would function normally in production. See RFC: Devel::Deprecate for possible ideas.
| [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by gone2015 (Deacon) on Aug 21, 2008 at 00:33 UTC
|
How would you write a test that checks whether the cache is actually used? Clearly you cannot access %cache to test as its lexical in the sub.
Scanning the other postings, it appears that part of the requirement is that the code cannot be changed to instrument or otherwise report on its behaviour.
Also, you've got other tests to check that the returned values are correct.
So the purpose of this testing is simply to check that:
return $cache{$arg} if exists $cache{$arg};
and:
$cache{$arg} = $result;
do what they're expected to do.
So why wouldn't they ? Is it possible for $arg to be something that doesn't map well to a hash key... an object, for example ? Can take values that are equivalent, but don't look the same when stringified ? Is it possible for $result to be something that cannot be stored in a hash, or whose validity or integrity is not guaranteed over time ?
The code as given doesn't betray any doubt that all arguments and results are suitable.
So, I guess the testing you have in mind is essentially:
- Try foo(bar), where we know this is the first time bar has been evaluated. Check (a) that result did not come from the cache, and (b) that the result ends up in the cache.
- Try foo(rab), as above.
- Try foo(rab) again. Check (a) that the result came from the cache, and (b) that it's the expected value.
- Try foo(bar) again, also as above.
Unless we're concerned that the cache mechanism is not deterministic, that seems enough to me. This quick sequence of tests checks that: rubbish from the cache is not returned the first time a given argument is presented; that it isn't just returning either the first or the last result it put in the cache; and that the cache holds more than one result. For the amusement value, however, this could be expanded to a larger number of randomly selected arguments, presented in a random(ish) order.
In fact, it doesn't seem necessary to check that the result is recorded the first time foo() is called for a given argument; if it wasn't, a later test would fail because the result would be recalculated. (This is just as well, as the cache is hidden inside the subroutine, and you've rejected the Padwalker as a way to poke at it.)
So, the one piece of information that each test needs is: was the result established anew, or did it come from the cache ? If the operation something costly is reliably many, many cycles then user CPU time appears to be the one thing that a completely black-box test can use.
(I am assuming here that the testing can be run from a known state of the cache -- empty being the obvious one.)
Timing doesn't seem wholely satisfactory. So, what I would do is introduce a counter (that can be read by the test) to tick each time a result has to be established. Making this a permanent part of the code means that the test is entirely external. Incrementing a counter each time the something costly executes seems unlikely to be a big overhead.
Actually, I'd be tempted to put in a second counter, either to count total number of calls of foo() or to count the cache hits -- which could be interesting for ongoing monitoring of the performance of the program.
Mind you, having stepped through the code with the debugger, and established that both cache-hit and cache-miss cases do what they are supposed to do, I'd wonder what it was needed testing... It would be different if the code were more complicated... for example: if there was code to age out cache entries, which I'd be tempted to stress test. | [reply] [d/l] [select] |
Re: How to test caching?
by pileofrogs (Priest) on Aug 20, 2008 at 17:32 UTC
|
sub foo ($) {
my $arg = shift;
state %cache;
if exists $cache{$arg} {
print "Got that from the cache!\n";
return $cache{$arg}
}
print "Didn't find that in cache...\n";
my $result = ... something costly ...;
$cache{$arg} = $result;
return $result;
}
| [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by Porculus (Hermit) on Aug 22, 2008 at 08:07 UTC
|
But "use Memoize" isn't an answer to the question. I think you misunderstand the nature of PerlMonks. We are not slaves who Must Do As You Command, or some kind or mechanical oracle that takes precisely-worded questions and magically returns answers. When you post a question, we don't carefully parse it to determine the precise semantics thereof: what happens is that people look at what you said, identify the actual problem, and point out ways in which that can be solved. "Use Memoize" might not be a reply to the precise words of your question, but it most certainly would be a solution to the underlying problem -- as well as a better practice in general. (You say Memoize has features you don't need? Don't use those features, then. It's kind of silly to reinvent the wheel just because the standard wheel comes with brakes and you don't think you'll ever want to slow down.)
| [reply] |
Re: How to test caching?
by repellent (Priest) on Aug 21, 2008 at 08:50 UTC
|
How far do we stretch the meaning of modify?
use Smart::Comments;
sub foo ($) {
my $arg = shift;
state %cache;
return $cache{$arg} if exists $cache{$arg};
my $result = ... something costly ...;
### Look MA! No hands! Really...
$cache{$arg} = $result;
}
| [reply] [d/l] |
Re: How to test caching?
by FunkyMonk (Bishop) on Aug 20, 2008 at 19:03 UTC
|
- Use the perl debugger.
- Have some sentinel value (eg "I CAN HAS CASH PLZ.", or a SHA1) that dumps the cache instead of executing your expensive subroutine.
- Move state out of the subroutine while testing. Or something like
{
my %cache;
sub foo { ... }
sub dump_foo {
require Data::Dumper;
print Data::Dumper::Dumper(\%cache);
}
}
| [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
Re: How to test caching?
by ysth (Canon) on Aug 20, 2008 at 22:59 UTC
|
Yuck, prototypes.
Memoize is an interesting proof of concept, but it is so easy to do simple caching like this that I find no use for it. If I had a candidate for memoization that had many different patterns of parameters or possible return contexts, I might consider it. But even then, you can do a whole lot of stuff in the time taken by Memoize's overhead so I'd want to see benchmarks to prove it was actually of use.
Use PadWalker to access %cache outside the sub. (I'm guessing it should work on 5.10's state variables, but don't know it for a fact.)
| [reply] |
Re: How to test caching?
by ikegami (Patriarch) on Aug 22, 2008 at 01:11 UTC
|
use constant TEST_NOCACHE => $ENV{TEST_NOCACHE};
sub foo ($) {
my $arg = shift;
state %cache;
return $cache{$arg} if !TEST_NOCACHE && exists $cache{$arg};
my $result = ... something costly ...;
$cache{$arg} = $result;
}
Thanks to constant folding, the above compiles identically as the original code under normal circumstances, and all effects of the cache are removed when the flag is true.
The effect of the cache can be observed by finding the difference between runs of
TEST_NOCACHE=0 cache.t
and
TEST_NOCACHE=1 cache.t
| [reply] [d/l] [select] |
Re: How to test caching?
by Perlbotics (Archbishop) on Aug 20, 2008 at 17:46 UTC
|
If modifying foo is not an option, then I would also suggest to do some timing tricks.
But maybe you can modify foo such, that when called in array context, it returns an indicator if the data was retrieved from the cache or not?
# normal use
$res = foo($arg);
# test use
($res, $cached) = foo($arg);
| [reply] [d/l] [select] |
Re: How to test caching?
by Anonymous Monk on Aug 20, 2008 at 21:54 UTC
|
JabaFan. you are reely somthing. People ask questions adn you tell how stupid they are. Yo do homework for ones who put there homework here. Now you ask help for sumthing and you tell people helping they are stupid. You say you been programing for 30 years, but I not sure i agree. How can somebody who act so young be that old. | [reply] |
Re: How to test caching?
by talexb (Chancellor) on Aug 21, 2008 at 14:20 UTC
|
BTW, I know about Memoize. But "use Memoize" isn't an answer to the question.
Really? So what's your caching question then, if the answer isn't Memoize?
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
| [reply] |
|
|
So what's your caching question then, if the answer isn't Memoize?
While I don't like myself the attitude that JavaFan is exhibiting throughout this thread (and elsewhere) I'd like to (try to) address your very question, which basically quite about everybody has also been asking here. For one thing, Memoize is a very generic tool: as such it uses a general technique, one which ought to work reasonably well in all cases. Namely, even if it has quite a lot of options, it basically stores cached parameters in a hash, whereas one may have a specialized (numeric) function that would benefit from memoizing into an array, and since caching is all about performance, this is one of those situations in which a relatively small efficiency gain (wrt the one of caching at all) would matter! The same can be said of generating and calling a wrapper sub, althought ISTR that the original one is then called through magic goto - it's still one more level of indirection.
| [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |