oakbox has asked for the wisdom of the Perl Monks concerning the following question:

I came up with a nifty speed trick and I'm wondering if this is a Good Thing (TM) or just a kludge. I have a lot of data spread across about 15 tables in MySQL. One hit to a page in some parts of my application can generate as many as 15 hits to my MySQL database in the background (using DBI).

Some of my data is pretty static, changing only once every few weeks. Some of the searches performed against the data, while complex, will yield the same results over several iterations. Let me cut to the chase: I have been using XML to store the results of my database calls. The next time I run that search, I check for the existence of the appropriate XML representation. If it exists, I load THAT into a nice reference; if it doesn't, I run the MySQL code, return the reference, and save a copy of it to XML for the next iteration.

I also have code embedded in my maintenance scripts to wipe the appropriate XML files whenever the underlying data changes, and scripts to run through my data and generate the most commonly needed XML representations. Generating the XML in the background means that when someone hits the web site, there will most likely already be an XML version of the data they are looking for in the system.
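
The wipe step is essentially this (a simplified sketch, not the actual maintenance script; @affected_tids stands in for whatever list of record IDs an update touches):

    # Simplified sketch of the maintenance-script wipe step.
    # @affected_tids is a hypothetical list of record IDs an update touched.
    foreach my $tid (@affected_tids) {
        my $cache_file = "/xmldumps/$tid.xml";
        unlink($cache_file) if -e $cache_file;
    }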

I am not using this for all cases, just where it makes sense to do so. The speed increase in using this is incredible. I have gone from some complex web pages taking anywhere from 1 to 4 seconds to load, to under half a second (this is an estimate, I haven't benchmarked the XML queries). As a bonus, I now have a nice data-export tool that dumps queries into an XML format that I can share across system boundaries.

use XML::Simple;    # XMLin() and XMLout() come from here

#### Cut, subroutine from module
foreach my $tid (@completed){
    # does an XML interpretation already exist?
    my $cache_file = "/xmldumps/$tid.xml";
    my $fhh;
    if(-e $cache_file){
        open($fhh, "<$cache_file") || die "Boom $!";
        $results{$tid} = XMLin($fhh);
        close($fhh);
    }else{
        # no cached copy: run the expensive queries, then cache the result
        $results{$tid} = $self->Complex_Crap($tid);
        $self->Save_XML($tid, $results{$tid});
    }
}
$self->Results(\%results);
return($self);
}   # end of the (cut) calling subroutine

sub Save_XML {
    # Save an XML representation of a data structure to the file system
    my ($self, $tid, $object) = @_;
    my $cache_file = "/xmldumps/$tid.xml";
    my $file_handle;
    open($file_handle, "> $cache_file") || die "Cannot write $cache_file: $!";
    print $file_handle XMLout($object);
    close($file_handle);
}

This is the 'magic' part. Read the XML if it exists; otherwise, perform the complex database search and save an XML version of the results for future reference.

Obviously, this wouldn't work for binary data or for data where I need to preserve order (making XML::Simple save order is a pain in the ass), but in this case I don't need binary or order :) My questions are: "have you tried anything similar to this?" and "is this a good solution?"
-oakbox

Re: Saving MySQL Queries in an XML format
by gmax (Abbot) on Jul 05, 2002 at 08:13 UTC
    If you want, you can also save binary data by encoding it into a hex string.
    my $hexstr = unpack("H*", $barestr);
    # However, notice that $hexstr will be twice as large as $barestr.
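    A quick round-trip sketch of that encoding (variable names are illustrative; pack() reverses the unpack()):

        my $barestr = "\x00\x01\xFF and any other bytes";
        my $hexstr  = unpack("H*", $barestr);   # encode: bytes -> hex string
        my $back    = pack("H*", $hexstr);      # decode: back to the original bytes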
    As for your method, it is a cache engine, which can work fine if your data does not change frequently.
    If your data is updated more often, though, your cache engine would be more expensive than a simple query.
    Moreover, I am not sure that your solution would hold for large collections of data (more than 10,000 records). After all, handling large data sets is the main reason to use a database engine.

    The golden rule is: if your data is either small or non-complex, use a file (e.g. Berkeley DB); else use a database.
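
    A minimal sketch of the file option, using the standard DB_File module to tie a hash to a Berkeley DB file (the file name and key are illustrative):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl;    # for the O_* flags

        # Reads and writes on %cache go straight to the on-disk DB file.
        my %cache;
        tie %cache, 'DB_File', '/xmldumps/cache.db', O_CREAT|O_RDWR, 0644, $DB_HASH
            or die "Cannot tie cache.db: $!";

        $cache{some_key} = "some serialized value";
        untie %cache;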

    Personally, I have found that the normal MySQL table cache, when given enough memory, is more than reasonably fast, even for complex queries.
Re: Saving MySQL Queries in an XML format
by IlyaM (Parson) on Jul 05, 2002 at 11:18 UTC
    Why XML for data serialization? Do you need human-readable cache storage? If not, you could use Storable instead of XML::Simple (a small sketch follows the list below).
    • It is probably faster (I have not run any benchmarks, though).
    • It can handle binary data.
    • It has no problems preserving order.
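
    A minimal sketch of the Storable route, mirroring the Save_XML logic above ($results_ref and the path are illustrative):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        my $cache_file = "/xmldumps/some_tid.stor";

        # Write: serialize the reference to disk in Storable's binary format.
        store($results_ref, $cache_file) or die "Cannot store to $cache_file";

        # Read: get an equivalent data structure back.
        my $restored = retrieve($cache_file);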

    Moreover, you could use Cache::Cache to handle caching. This module provides quite a high-level API for data caching (and AFAIK it uses Storable to serialize/deserialize data).

    I've rewritten your example using Cache::Cache:

    use Cache::FileCache;

    # init cache object at the beginning of the program
    my $CACHE = Cache::FileCache->new();
    .....
    .....
    #### Cut, subroutine from module
    foreach my $tid (@completed){
        $results{$tid} = $CACHE->get($tid);
        unless(defined $results{$tid}) {
            $results{$tid} = $self->Complex_Crap($tid);
            # data will expire from the cache in 10 minutes
            $CACHE->set($tid => $results{$tid}, "10 minutes");
        }
    }
    $self->Results(\%results);
    return($self);
    }

    Update: Added example of Cache::Cache usage.

    --
    Ilya Martynov (http://martynov.org/)

      Beware. Cache::Cache is meant as a cache, not as a storage mechanism. It could reply "not found" to every single request to recover a previous value, and still be fully within spec (although a poor implementation).

      The point of a cache is to possibly avoid re-computing a computable item, not to hold a non-computable item.

      To store a non-computable item, use a proper database.

      -- Randal L. Schwartz, Perl hacker

        Well, it's not just a simple query against the database. For some of the complex queries (the ones I'm storing in XML), there are 10 to 15 separate database hits and then some calculations on the values I get back. I then stick all of the values into a reference (to move the data from the module back to the caller), so it is a big help to store that reference using XML::Simple. The whole interface of XML::Simple is built around storing and reading references, so I didn't need to do any 'extra' work to start using it.
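
        That usage is just a round trip like this (the nested structure is illustrative, not my real result set):

            use XML::Simple;

            # A made-up result set: nested hash/array references, which is
            # exactly what XMLout()/XMLin() pass back and forth.
            my $results = {
                customer => { who => 'oakbox', orders => [ 101, 102 ] },
                total    => 2,
            };

            my $xml  = XMLout($results);   # reference -> XML string
            my $back = XMLin($xml);        # XML string -> equivalent reference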

        The ability to export my ready-made XML objects to other systems is just icing on the cake. In future improvements, I can see the possibility of a customer not wanting to deal with data over my web portal, but being able to import my data directly into their back-end.

        I guess what I'm trying to figure out is whether this was a crazy way to implement this solution, and it looks like it's not TOO crazy :)
        -oakbox

Re: Saving MySQL Queries in an XML format
by yodabjorn (Monk) on Jul 05, 2002 at 09:37 UTC
    This reminds me of Vignette (5.6). Vignette uses a page-caching mechanism, though. Templates define dynamic content, and that content is generated into pages. Those pages are written to disk on the web server. Then, when a request comes in, the web server contacts the template manager to see if that document has changed. If it has, the template manager rebuilds the page and writes it out to the filesystem. If there are no changes, the web server reads the page from disk.

    This is a decent way to deal with slowly changing content: have some switch to mark a URL or a template as "changed". Anyhow, the XML query caching is really cool ++
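
    A rough sketch of that kind of file-based page cache in Perl (this is not Vignette's actual API; the paths and the build_page_from_template() helper are hypothetical):

        use strict;
        use warnings;

        # Serve a cached page unless its source template is newer than
        # the cached copy, in which case rebuild the page first.
        sub serve_page {
            my ($url) = @_;
            my $cache_file    = "/pagecache/$url.html";
            my $template_file = "/templates/$url.tmpl";

            # -M gives file age in days; a smaller age means more recently modified.
            if (!-e $cache_file || -M $template_file < -M $cache_file) {
                my $html = build_page_from_template($template_file);  # hypothetical
                open(my $out, "> $cache_file") or die "Cannot write cache: $!";
                print $out $html;
                close($out);
            }

            open(my $in, "< $cache_file") or die "Cannot read cache: $!";
            local $/;           # slurp the whole file
            return <$in>;
        }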

Re: Saving MySQL Queries in an XML format
by broquaint (Abbot) on Jul 05, 2002 at 16:41 UTC
    If you're just doing data storage, then you might want to check out the up-and-coming YAML. It was designed with serializing data in mind, and the spec frequently mentions Perl (which might be because some of its creators are Perl hackers). Currently I'm looking for an excuse to use it, as it does look very nifty indeed.
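
    A minimal sketch of what that could look like with the YAML module (assuming its DumpFile/LoadFile interface; the file name and data are illustrative):

        use strict;
        use warnings;
        use YAML qw(DumpFile LoadFile);

        my $results = { tid => 42, values => [ 'a', 'b', 'c' ] };

        # Write the reference out as human-readable YAML ...
        DumpFile('/xmldumps/42.yml', $results);

        # ... and read an equivalent reference back.
        my $restored = LoadFile('/xmldumps/42.yml');
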
    HTH

    _________
    broquaint