Re (tilly) 1: Technique Efficiency
by tilly (Archbishop) on Apr 21, 2001 at 19:59 UTC
The answer to that is a well-qualified "maybe".
First of all, is performance an issue? If the answer is
no, then (as with all optimization questions) don't do
it.
Secondly if performance is an issue, have you tried using
mod_perl and relatives to speed things up? Until you have
tried the obvious performance boosters, don't go for any
changes that will complicate your code.
But if you have done the above two, and performance is an
issue, then it is valid to cache data in files to move load
off the database. However, doing that introduces many
new complications and new things that can go wrong.
All that said, if this issue concerns you, you can do
something right now to make this easier. Isolate the
call to get data into a single subroutine. If you have
multiple scripts, move that routine into a module and
call it from the scripts. This takes little work to do,
is good for your code quality, and means that if this
ever does become a problem, you will be able to retrofit
any design you want onto the existing system quite
easily.
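A minimal sketch of what that isolation could look like (the module name, DSN, table, and columns are made-up examples, not anything from your setup):

# MyApp/Data.pm -- one place that knows how the data is fetched
package MyApp::Data;

use strict;
use DBI;

# Every script calls this instead of talking to the database directly,
# so a file cache (or anything else) can later be slipped in right here
# without touching any of the scripts.
sub get_tool_docs {
    my ($tool_id) = @_;
    my $dbh  = DBI->connect('dbi:mysql:mydb', 'user', 'pass',
                            { RaiseError => 1 });
    my $rows = $dbh->selectall_arrayref(
        'SELECT title, body FROM tool_docs WHERE tool_id = ?',
        {}, $tool_id);
    $dbh->disconnect;
    return $rows;
}

1;

Each CGI script then just does use MyApp::Data; and calls MyApp::Data::get_tool_docs($id); if you ever decide to cache to files, only that one routine has to change.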
Re: Technique Efficiency
by arhuman (Vicar) on Apr 21, 2001 at 20:22 UTC
If in your question efficiency isn't only related to speed, the database option is the right choice (IMHO) for several reasons:
- Your data are centralized in one place! (no sync problems, easy to administer and back up)
- Your database handles the concurrent-access problems.
- It's simple (remember: Keep It Simple, Stupid).
Now if by efficiency you mean speed, we'll need more data to answer:
your script code, benchmarks, a description of the data...
(if only to check that this new solution could provide a significant improvement over just optimizing your current script)
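For the benchmark part, even something as rough as this would tell us a lot (the two routines and the file/table names are placeholders for your real code):

use strict;
use Benchmark qw(timethese);
use DBI;

my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 });

# Stand-ins for the two approaches being compared.
sub fetch_from_db {
    my ($body) = $dbh->selectrow_array(
        'SELECT body FROM tool_docs WHERE tool_id = ?', {}, 31);
    return $body;
}

sub fetch_from_file {
    open my $fh, '<', '/web/static/tooldoc_31.html' or die $!;
    local $/;                      # slurp mode
    return scalar <$fh>;
}

# Run each version 1000 times and print the timings.
timethese(1000, {
    'database'  => \&fetch_from_db,
    'flat file' => \&fetch_from_file,
});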
"Only Bad Coders Badly Code In Perl" (OBC2IP)
Re: Technique Efficiency
by little (Curate) on Apr 21, 2001 at 20:33 UTC
Interacting with a DB is very helpful if you have information that changes very often or is displayed differently depending on the client.
OK, but if you have, let's say, technical documentation for 50 tools, you store those in your DB, and all clients shall get the same information, then it would be a waste of server power to generate the documents on each request.
Even if your server caches the results of these queries, the documents would still be generated more often than necessary. Therefore I suggest the following:
Create something like an admin script that compares the modification date of your generated files with a field in your DB holding the date, or better a timestamp, of when the info in that field last changed. If the document you have is older than the info, the document should be deleted and regenerated with the new info (that's easier than changing the document in place). A rough sketch of such a script follows after the directory example below.
If you don't want to invoke that script manually, you could have the scripts that modify the data in your DB check whether the changed data affects your documents, and if so invoke the update script (or function, or procedure if you like to work with objects here). You could also perform another check (here it might be better to use a cronjob) to look at the templates you generate your documents from, because if these changed, you'll have to regenerate all the documents.
For example:
,---------------------.
| header |
`---------------------´
|m | |
|e | tooldoc_31 |
|n | |
|u | |
`---------------------´
| footer |
`---------------------´
This is quite easy if your server supports Server Side Includes (SSI).
e.g.: <!--#include virtual="/includes/general/header.inc"-->
But be aware that the result of this must be a valid HTML or XML document (or whatever MIME type you want to send to the client's browser).
This is only an example of a dir structure:
httpd/
|
.
.
`-static/
| `-help/
| `-about/
`-dynamic/
| `-toolviews/ (so here could go the generated content)
| `-menus/ (they might change with your content)
`-includes/
| `-general/ (e.g. header, footer, and stuff that occurs very often)
| `-menus/
`-templates/
| `-section_a/
| `-section_b/
.
.
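And here is a rough sketch of the admin/update script mentioned above (the table, column, and path names are only examples; adjust them to your schema):

#!/usr/bin/perl
# update_docs.pl -- regenerate a static document only when the DB row is newer
use strict;
use DBI;

my $doc_dir = '/httpd/dynamic/toolviews';
my $dbh = DBI->connect('dbi:mysql:tooldocs', 'user', 'pass',
                       { RaiseError => 1 });

# id and last-changed time (as Unix epoch) for every tool document
my $sth = $dbh->prepare(
    'SELECT tool_id, UNIX_TIMESTAMP(changed) FROM tool_docs');
$sth->execute;

while (my ($id, $changed) = $sth->fetchrow_array) {
    my $file  = "$doc_dir/tooldoc_$id.html";
    my $mtime = (stat $file)[9];   # undef if the file does not exist yet

    # regenerate if the file is missing or older than the DB record
    if (!defined $mtime or $mtime < $changed) {
        regenerate_doc($dbh, $id, $file);
    }
}

sub regenerate_doc {
    my ($dbh, $id, $file) = @_;
    my ($body) = $dbh->selectrow_array(
        'SELECT body FROM tool_docs WHERE tool_id = ?', {}, $id);
    open my $fh, '>', "$file.tmp" or die "open $file.tmp: $!";
    print $fh $body;               # template wrapping omitted here
    close $fh;
    rename "$file.tmp", $file or die "rename: $!";   # swap in atomically
}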
Have a nice day
All decision is left to your taste
Before you do this, ask yourself why.
This scheme is substantially more complex than just hitting
the database. There are more files. If you ever need to
reproduce the installation, it is not always obvious where
to get them from. With things scattered around it will
take longer for people to learn the system.
Plus you have all sorts of possible synchronization
errors. For instance you change where the files are
dumped, but somewhere you still have a reference to an
old location that appears to work because the file is
there...then a month later you delete that and have to
figure out how it ever worked. Or you lose a cron and
then find out a couple of months later what it was for.
Or someone decides to refresh a file with a long-running
daemon which doesn't survive a reboot 6 months later.
In short there are a lot of possible failure modes. Most
of them won't happen to you, and they won't happen often.
But why let them happen at all?
That said, this can work and there are good reasons to
go with a design like that. Certainly if I needed to run
a high-performance website, I would look to something like
that to offload work to static content that can be readily
distributed across a cluster. But I wouldn't choose to
make my life more complex up front when I had a choice.
Good point.
I already did it this way once, because we were producing HTML and PDF files. And to be honest, I did it this way because I wanted to: I wanted to try whether it works, and I made a nice frontend for the administration. But I "spread" the files because I don't want anyone to edit a document when all his changes would be overwritten with the next update of the content.
OK, the lost cronjob is a very strong concern, but is there anything like "triggers" on a Unix machine to invoke an action or script, like you have in some highly advanced RDBMSs?
So that is why I would prefer to keep those files in separate places. OK, another reason is that automatically updated files will/might irritate the readers of CVS logs.
Have a nice day
All decision is left to your taste
Good post; one comment/observation: if you have to go to the database to get a timestamp, and make a system call to get the file timestamp, that overhead may cancel out any gain from avoiding the SQL select(s) required to build the page dynamically.
Other replies here made the case articulately for keeping the content in the database and building the page dynamically.
There are lots of subtleties, exceptions, etc., but as a first approximation, one SQL statement will usually be at least as much of a performance hit as everything else you do to build a web page. A select to get one field (i.e., the timestamp) from a row, plus fetching HTML from the file system, is going to be as much of a hit as selecting the whole row (assuming your row isn't many KB) and building your page on the fly.
The SQL performance hit for a keyed, single-row select is not based on bandwidth, but on the overhead of communicating with the SQL engine. And it's probably an order of magnitude worse if you aren't maintaining db connections across page requests.
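To make that concrete, here are the two operations being weighed against each other (table, column, and file names are invented for the example); the round trip to the SQL engine is roughly the same for either select:

use strict;
use DBI;

my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 });

# The "cheap" check: one keyed select for a single column...
my ($changed) = $dbh->selectrow_array(
    'SELECT UNIX_TIMESTAMP(changed) FROM tool_docs WHERE tool_id = ?',
    {}, 31);
# ...plus a stat() and a read of the generated file.
my $mtime = (stat '/web/static/tooldoc_31.html')[9];

# The "expensive" alternative: one keyed select for the whole row,
# which costs about the same round trip to the SQL engine.
my ($title, $body) = $dbh->selectrow_array(
    'SELECT title, body FROM tool_docs WHERE tool_id = ?', {}, 31);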
But you don't need to compare on each page request, only when you update your data. Sorry if I didn't state that clearly :-)
So I would argue your observation doesn't hold here!
Continuing the tool-documentation example, that would mean you might update the data maybe once a week or once a month. So all those queries to the DB would just waste capacity, wouldn't they? If no data changed, we don't need to change the document, and so we don't need to process the output for each requesting client.
The only concern I have is the use of SSI. That is the point where I am afraid of losing time, and I wonder whether it wouldn't be better to compose the documents fully at content-creation time.
Have a nice day
All decision is left to your taste
That's just the sort of thing I was thinking of doing.
Re: Technique Efficiency
by voyager (Friar) on Apr 21, 2001 at 19:58 UTC
Unless you are already experiencing performance problems, or anticipate many more users or many more rows in the db, I'd leave things alone.
There are times when you need to do things like what you have suggested, but it makes for more complicated and fragile code ... certainly not warranted for a one-off case with no performance problems.
Re: Technique Efficiency
by Monolith-0 (Beadle) on Apr 21, 2001 at 20:29 UTC
According to what you are all saying, I probably will stick with getting it straight from the database. The other way I mentioned seems like it might actually be simpler, but I don't really know.
Performance isn't really much of a concern for me. I was also wondering which would be a better way to write my Perl scripts. Here's some basic pseudo-code of what I have, and what I was thinking.
Current Process:
The view function of the script is called whenever the user visits the page.
The add/change data function of the script is only used by me.
view function {
    connect_to_database;
    get_data;
    format_data;
    print;
    disconnect_from_database;
} end view function

add (or change) data function {
    get_submitted_data_from_form;
    connect_to_database;
    insert_data; (change_data;)
    disconnect_from_database;
}
Possible replacement:
The entire script is only used by me, and the formatted txt file is grabbed by the user.
add/change data function {
    get_submitted_data;
    connect_to_database;
    change_data;
    get_all_data;
    format_data;
    save_in_txt_file;
    disconnect_from_database;
}
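Fleshed out, the replacement might look something like this (details simplified; the table, columns, and file path here are just stand-ins):

use strict;
use CGI;
use DBI;

my $q   = CGI->new;
my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 });

# get_submitted_data / change_data
my $id   = $q->param('id');
my $body = $q->param('body');
$dbh->do('UPDATE items SET body = ? WHERE id = ?', undef, $body, $id);

# get_all_data / format_data / save_in_txt_file
my $sth = $dbh->prepare('SELECT id, body FROM items ORDER BY id');
$sth->execute;

open my $fh, '>', '/web/static/items.txt' or die "open: $!";
while (my ($item_id, $item_body) = $sth->fetchrow_array) {
    print $fh "$item_id: $item_body\n";    # whatever formatting is needed
}
close $fh;

$dbh->disconnect;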
Final suggestions?
Re: Technique Efficiency
by moen (Hermit) on Apr 21, 2001 at 20:28 UTC
Well, I would like to see more people use a db as their primary data storage facility.
Why?
Well, it makes general administration a lot easier, backup/restore of a broken site is easier, you are bound to uphold the rules of data integrity (enforced by the db), and you also get a very flexible solution which is easy to move/replicate to new sites.
Besides this, when talking about static data, use proxy servers in front of your db server (Oracle got this right). Such a proxy caches data until it changes in the db; when it changes, the db is queried and the cache updated. Needless to say, this increases your application speed.
I'm not sure how to use a cache like that together with a MySQL db, but there's probably a solution for that too.
moen