Monolith-0 has asked for the wisdom of the Perl Monks concerning the following question:

I have just gotten into using a MySQL database. In my Perl scripts I have been accessing data from the tables directly each time the script is run. Most of this data doesn't change very often, and neither does the way the output is formatted. And when these things do change, they are only changed by me, not by the clients.

I was wondering, since I'm using the data this way, whether I should instead fetch and pre-format the data into a txt file only when the data changes. That way only the txt file has to be read, rather than interacting with the DB.

Would that be a good way of doing it?

Re (tilly) 1: Technique Efficiency
by tilly (Archbishop) on Apr 21, 2001 at 19:59 UTC
    The answer to that is a well-qualified "maybe".

    First of all, is performance an issue? If the answer is no, then (as with all optimization questions) don't do it.

    Secondly if performance is an issue, have you tried using mod_perl and relatives to speed things up? Until you have tried the obvious performance boosters, don't go for any changes that will complicate your code.

    But if you have done the above two, and performance is an issue, then it is valid to cache data in files to move load off of the database. However doing that introduces many new complications that can go wrong.

    All that said, if this issue concerns you, you can do something right now to make this easier. Isolate the call to get data into a single subroutine. If you have multiple scripts, move that routine into a module and call it from the scripts. This takes little work to do, is good for your code quality, and means that if this ever does become a problem, you will be able to retrofit any design you want onto the existing system quite easily.
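    A minimal sketch of that isolation might look like this (the module, table, and column names here are only examples, not anything from the original post):

        # MyApp/Data.pm
        package MyApp::Data;

        use strict;
        use warnings;
        use DBI;
        use Exporter;

        our @ISA       = ('Exporter');
        our @EXPORT_OK = ('get_tool_docs');

        # The one place every script goes through to fetch this data.
        # If caching ever becomes necessary, only this routine has to change.
        sub get_tool_docs {
            my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                                   { RaiseError => 1 });
            my $rows = $dbh->selectall_arrayref(
                'SELECT id, title, body FROM tool_docs',
                { Slice => {} });
            $dbh->disconnect;
            return $rows;
        }

        1;

    Each script then just does:

        use MyApp::Data 'get_tool_docs';
        my $docs = get_tool_docs();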

Re: Technique Efficiency
by arhuman (Vicar) on Apr 21, 2001 at 20:22 UTC

    If efficiency in your question isn't only about speed, the database option is the right choice (IMHO) for several reasons:
    • Your data are centralized in one place! (no sync problems, easy to administer and back up)
    • Your database handles the concurrent access problems.
    • It's simple (remember: Keep It Simple, Stupid)

    Now, if by efficiency you mean speed, we'll need more information to answer: your script code, some benchmarks, a description of the data...
    (if only to check that this new solution could provide a significant improvement over just optimizing your current script)


"Only Bad Coders Badly Code In Perl" (OBC2IP)
Re: Technique Efficiency
by little (Curate) on Apr 21, 2001 at 20:33 UTC
    Interacting with a DB is very helpful if you have information that changes very often or is displayed differently depending on the client.
    OK, say you have technical documentation for 50 tools stored in your DB, and all clients get the same information; then it would be a waste of server power to generate the documents on each request.
    Even if your server caches the results of these queries, the documents would be generated more often than necessary.
    Therefore I suggest the following:
    Create something like an admin script that compares the modification date of your generated files with a field in your DB holding the date or, better, a timestamp of when the info in that data field last changed. If the document you have is older than the info, the document should be deleted and regenerated with the new info (that's easier than patching the document).
    If you don't want to invoke that script manually, you could have the scripts that modify the data in your DB check whether the changed data affects your documents and, if so, invoke the update script (or function, or method if you prefer to work with objects here).
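    A rough sketch of what such an update script might look like (the table, column, file path, and the regenerate_document routine are all invented for illustration; it assumes the DB keeps a last-changed timestamp per document):

        #!/usr/bin/perl
        # Hypothetical regeneration check: rebuild a cached document only when
        # the corresponding DB row is newer than the file on disk.
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                               { RaiseError => 1 });

        my $sth = $dbh->prepare(
            'SELECT id, UNIX_TIMESTAMP(last_changed) FROM tool_docs');
        $sth->execute;

        while ( my ($id, $changed) = $sth->fetchrow_array ) {
            my $file  = "/www/dynamic/toolviews/tooldoc_$id.html";
            my $mtime = (stat $file)[9];   # undef if the file does not exist yet

            if ( !defined $mtime or $changed > $mtime ) {
                regenerate_document($id);  # placeholder: your existing formatting code
            }
        }

        $dbh->disconnect;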
    You could also perform another check (a cron job might be better here) on the templates you generate your documents from, because if those changed, all the documents have to be regenerated.
    For example:
    ,---------------------.
    |       header        |
    `---------------------´
    |m |                  |
    |e |   tooldoc_31     |
    |n |                  |
    |u |                  |
    `---------------------´
    |       footer        |
    `---------------------´
    This is quite easy if your server supports Server Side Includes (SSI).
    e.g.: <!--#include virtual="/includes/general/header.inc"-->
    But be aware that the result must be a valid HTML or XML document (or whatever MIME type you want to send to the client's browser).
    This is only an example of a directory structure:
    httpd/
     |
     .
     .
     `-static/
     |   `-help/
     |   `-about/
     `-dynamic/
     |   `-toolviews/   (so here could go the generated content)
     |   `-menus/       (they might change with your content)
     `-includes/
     |   `-general/     (e.g. header, footer, and stuff that occurs very often)
     |   `-menus/
     `-templates/
     |   `-section_a/
     |   `-section_b/
     .
     .

    Have a nice day
    All decision is left to your taste
      Before you do this, ask yourself why.

      This scheme is substantially more complex than just hitting the database. There are more files. If you ever need to reproduce the installation, it is not always obvious where to get them from. With things scattered around it will take longer for people to learn the system.

      Plus you have all sorts of possible synchronization errors. For instance, you change where the files are dumped, but somewhere you still have a reference to an old location that appears to work because the file is still there... then a month later you delete it and have to figure out how it ever worked. Or you lose a cron job and only find out a couple of months later what it was for. Or someone decides to refresh a file with a long-running daemon which doesn't survive a reboot six months later.

      In short there are a lot of possible failure modes. Most of them won't happen to you, and they won't happen often. But why let them happen at all?

      That said, this can work and there are good reasons to go with a design like that. Certainly if I needed to run a high-performance website, I would look to something like that to offload work to static content that can be readily distributed across a cluster. But I wouldn't choose to make my life more complex up front when I had a choice.

        good point,
        I already did it this way once, because we were producing both HTML and PDF files. And to be honest, I did it this way because I wanted to; I wanted to try whether it works, and I made a nice frontend for administering it. But I "spread" the files because I don't want anyone to edit a document whose changes would all be overwritten with the next update of the content.
        OK, the lost cron job is a very strong concern, but is there anything like "triggers" on a Unix machine to invoke an action or script, like you have in some highly advanced RDBMSs?
        So that is why I would prefer to keep those files in separate places. OK, another reason is that automatically updated files might irritate the readers of CVS logs.

        Have a nice day
        All decision is left to your taste
      Good post; one comment/observation: if you have to go to the database to get a timestamp, and make a system call to get the file timestamp, that overhead may cancel out any gain from avoiding the sql select(s) required to build the page dynamically.

      Other replies here made the case articulately for having the content be in the database and the page built dynamically.

      There are lots of subtleties, exceptions, etc., but as a first approximation, one SQL statement will usually be at least as much of a performance hit as everything else you do to build a web page. A select to get one field (i.e., the timestamp) from a row plus fetching the HTML from the file system is going to be as much of a hit as selecting the whole row (assuming your row isn't many KB) and building your page on the fly.

      The SQL performance hit for a keyed, single-row select is not about bandwidth, but about the overhead of communicating with the SQL engine. And it's probably an order of magnitude worse if you aren't maintaining DB connections across page requests.
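      That claim is easy enough to measure if in doubt; a quick sketch with Benchmark.pm (the connection details, table, and file path are placeholders, not anything from the original post):

          use strict;
          use warnings;
          use Benchmark qw(cmpthese);
          use DBI;

          my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                                 { RaiseError => 1 });

          cmpthese( 1000, {
              # one keyed, single-column select
              db_timestamp => sub {
                  my ($ts) = $dbh->selectrow_array(
                      'SELECT last_changed FROM tool_docs WHERE id = 31');
              },
              # read a pre-built file straight from disk
              flat_file => sub {
                  open my $fh, '<', '/www/dynamic/toolviews/tooldoc_31.html'
                      or die "open: $!";
                  local $/;            # slurp mode
                  my $html = <$fh>;
              },
          });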

        But you don't need to do that comparison on each page request, only when you update your data. Sorry if I didn't state that clearly :-)
        So I believe your observation doesn't apply here!
        Continuing with the tool documentation example, that would mean you might update the data maybe once a week or once a month. So wouldn't all those queries to the DB be wasted capacity? If no data changed, we don't need to change the document, and so we don't need to process the output for each requesting client.
        The only concern I have is the use of SSI. That is the point where I'm afraid of losing time, and I wonder whether it wouldn't be better to compose the complete documents at content-creation time.

        Have a nice day
        All decision is left to your taste
      That's just the sort of thing I was thinking of doing.
Re: Technique Efficiency
by voyager (Friar) on Apr 21, 2001 at 19:58 UTC
    Unless you are already experiencing performance problems, or anticipate many more users or many more rows in the db, I'd leave things alone.

    There are times when you need to do things like what you have suggested, but it makes for more complicated and fragile code ... certainly not warranted for a one-off case with no performance problems.

Re: Technique Efficiency
by Monolith-0 (Beadle) on Apr 21, 2001 at 20:29 UTC

    Going by what you are all saying, I will probably stick with getting it straight from the database. The other way I mentioned seems like it might actually be simpler, but I don't really know.

    Performance isn't really much of a concern for me. I was also wondering which would be a better way to write my Perl scripts. Here's some basic pseudo-code of what I have, and what I was thinking.

    Current Process:
    The view function of the script is called whenever the user visits the page.
    The add/change data function of the script is only used by me.
    view function {
        connect_to_database;
        get_data;
        format_data;
        print;
        disconnect_from_database;
    }

    add (or change) data function {
        get_submitted_data_from_form;
        connect_to_database;
        insert_data; (change_data;)
        disconnect_from_database;
    }
    Possible replacement:
    The entire script is only used by me, and the formatted txt file is grabbed by the user.
    add/change data function {
        get_submitted_data;
        connect_to_database;
        change_data;
        get_all_data;
        format_data;
        save_in_txt_file;
        disconnect_from_database;
    }
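    For what it's worth, here is a bare-bones sketch of that replacement in DBI and CGI (the table, columns, and file path are invented for illustration; error handling is omitted):

        use strict;
        use warnings;
        use CGI;
        use DBI;

        my $q   = CGI->new;
        my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                               { RaiseError => 1 });

        # change_data: apply the submitted form values
        $dbh->do('UPDATE tool_docs SET body = ? WHERE id = ?',
                 undef, $q->param('body'), $q->param('id'));

        # get_all_data, format_data, save_in_txt_file
        my $rows = $dbh->selectall_arrayref(
            'SELECT id, title, body FROM tool_docs ORDER BY id',
            { Slice => {} });

        open my $out, '>', '/www/static/tooldocs.txt' or die "open: $!";
        for my $row (@$rows) {
            printf {$out} "%s\n%s\n\n", $row->{title}, $row->{body};
        }
        close $out;

        $dbh->disconnect;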

    Final suggestions?

Re: Technique Efficiency
by moen (Hermit) on Apr 21, 2001 at 20:28 UTC
    Well, I would like to see more people use a DB as their primary data storage facility.
    Why?
    Well, it makes general administration a lot easier, backup/restore of a broken site is easier, the rules of data integrity are upheld for you (by the db), and you also get a very flexible solution which is easy to move/replicate to new sites.

    Besides this, when talking about static data, use caching proxy servers in front of your db server (Oracle got this right). Such a proxy caches data until it changes in the db; when it does change, the db is queried and the cache updated. Needless to say, this increases your application speed.
    I'm not sure how to use such a cache together with a MySQL db, but there is probably a solution for that too.
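    Lacking a caching proxy, a poor man's application-side version is simple enough; this sketch uses Storable with a time-based expiry rather than the change-driven invalidation a real proxy would do (the cache file, table, and expiry time are made up):

        use strict;
        use warnings;
        use DBI;
        use Storable qw(store retrieve);

        my $cache_file = '/tmp/tool_docs.cache';
        my $max_age    = 600;    # seconds before the database is asked again

        sub get_tool_docs_cached {
            # serve the cached copy if it is recent enough
            if ( -e $cache_file and time() - (stat $cache_file)[9] < $max_age ) {
                return retrieve($cache_file);
            }
            my $dbh  = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                                    { RaiseError => 1 });
            my $rows = $dbh->selectall_arrayref(
                'SELECT id, title, body FROM tool_docs', { Slice => {} });
            $dbh->disconnect;
            store($rows, $cache_file);    # refresh the cache
            return $rows;
        }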

    moen
