This reminds me of two modules: Test::Deep and Test::Regression. The former lets a module author test complex data structures. The latter, which I have some interest in, lets a module author compare output against previously generated output and regenerate the reference files when necessary. I think that by building on the two you would have exactly what you are trying to build.
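A minimal sketch of how the two might be combined in a single .t file, assuming both modules are installed from CPAN. Test::Deep's cmp_deeply checks the shape of a structure while ignoring volatile or unordered parts, and Test::Regression's ok_regression compares a code ref's output against a stored reference file. build_report, report_snapshot and the path t/report.out are hypothetical names chosen for illustration, not part of either module.

```perl
use strict;
use warnings;
use Test::More;
use Test::Deep;        # cmp_deeply, bag, ignore
use Test::Regression;  # ok_regression
use Data::Dumper;

# Hypothetical function under test, standing in for your module's API.
sub build_report {
    return {
        name      => 'widget',
        tags      => [qw(b a)],
        generated => time,    # volatile: differs on every run
    };
}

# Hypothetical helper: serialize only the stable parts of the report,
# deterministically, so the text can be compared against a reference file.
sub report_snapshot {
    my %stable = %{ build_report() };
    delete $stable{generated};
    local $Data::Dumper::Sortkeys = 1;    # fixed hash-key order
    local $Data::Dumper::Indent   = 1;
    return Dumper( \%stable );
}

# Test::Deep: assert the structure's shape, ignoring the timestamp
# and the order of the tags.
cmp_deeply(
    build_report(),
    {
        name      => 'widget',
        tags      => bag(qw(a b)),
        generated => ignore(),
    },
    'report has the expected shape',
);

# Test::Regression: compare against the stored reference. Run once with
# TEST_REGRESSION_GEN=1 set in the environment to write (or regenerate)
# t/report.out; later runs without it compare against that file.
ok_regression( \&report_snapshot, 't/report.out', 'report matches reference' );

done_testing();
```

Stripping volatile fields and sorting hash keys before serializing is what keeps the regression file stable across runs; Test::Deep covers the parts that cannot be pinned down byte for byte.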