Recently I've been toying with Apache::Session a bit more as part of a new web application
that I'm building (MVCC framework implemented in PageKit). One feature of this application is
to allow its users to upload multiple images. However, I have to store those images in the database (MySQL) rather than the conventional
way on the hard drive (heck, either way things end up on the HD, but you know what I mean, eh? :).
This is one of the key requirements of this web app. Storing images in the database
provides for greater control as images belonging to one user may not be viewed by anyone else
but the same user.
On one particular form I'm allowing a logged in user to upload up to 5 jpeg images + additional
information for each. There's also a bunch of extra fields on the form. One decision that I have
to make is where do I store the image data until all of the form is completed? How do I manage
temporary form data (large images) and also make sure that the system is not overloaded
with too much of that temporary data? Apache::Session is good in that sessions may expire,
and when they do, all data associated with a session is eliminated, thereby freeing up resources.
There are a few alternative routes which I've looked into:
- Keep a record of the image in session. This record will only hold the image description and
path to the temporary file on the server. The image and its content won't be committed
to the database until after the entire form is completed (not just the image portion).
Displaying the (small version of) image on the (yet-to-be-completed) form will then be a
trivial matter of pointing to the temporary image url.
As soon as the form is submitted the temporary image files will be removed from the server.
However, if the user simply shuts his/her browser, there is no way to immediately remove the images.
One option is I could write a Perl script that would run from cron and clean the temporary image
directory of files that are too old.
Yet, considering my application's security requirements, with this option there's a real
chance of making these images available to the internet public. So, this being highly undesirable,
I've opted not to go ahead with this approach.
- Keep image description and its content in session. Write a script to display the
image from session on the form. This script will basically have to spit out 'image/jpeg' content
from the session. The clear advantage is that only the currently signed-in user is able
to access his images. The drawback is that since I've configured my session
data to be stored in a database 'sessions' table (standard), adding new images will further
load my server (and is also slower than the disk file option in the first approach).
Once the form is submitted, image data is saved into the database and removed from session space.
On browser close, however, data will remain in session until it expires (?). As is the case with
the first approach, would I have to provide a way of cleaning expired session keys out of the
sessions table?
- Keep image description and content in a global hash (remember this is mod_perl) under user
session key. One advantage here is that adding new images and retrieving image content
is pretty quick and doesn't require database resources. Also, as with the previous approach,
only the currently signed-in user is able to access his/her images.
Concerning performance, if the global data grows large (users keep shutting their
browsers after having uploaded 5+ images :) I may have to restart the server. Restarting the
server too often doesn't smell like 'the right way' either ;).
But even with this option, there's a way to clean the global hash data. For example, I could
add an additional 'handler' (a method in the model class of the MVCC framework) that would allow
a web administrator to view all session data and clean it up as needed (say, clean all expired
sessions only).
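For the first option, the cron-driven cleanup of the temporary image directory could look roughly like the sketch below. It is only an illustration: the function name `remove_stale_files`, the directory passed on the command line, and the one-hour default are assumptions, not part of the application described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delete regular files directly inside $dir whose modification time is more
# than $max_age seconds in the past. Returns the number of files removed.
sub remove_stale_files {
    my ($dir, $max_age) = @_;
    my $cutoff  = time - $max_age;
    my $removed = 0;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;                  # skips ".", ".." and subdirectories
        my $mtime = (stat $path)[9];
        next unless defined $mtime && $mtime < $cutoff;
        $removed++ if unlink $path;
    }
    closedir $dh;
    return $removed;
}

# When run with arguments (e.g. from cron), clean the given directory.
# Usage (hypothetical path): cleanup.pl /var/spool/myapp/uploads 3600
if (@ARGV) {
    my ($dir, $max_age) = @ARGV;
    $max_age = 3600 unless defined $max_age;   # assumed default: one hour
    printf "removed %d file(s)\n", remove_stale_files($dir, $max_age);
}
```

The age limit should comfortably exceed the session lifetime, so that files belonging to a form still being filled in are not deleted.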
So, which of the above options do you feel will work best for me? Or, from your own experience,
what approach did you take and how did it all play out? At this stage, I'm torn between
the second and third options. Each looks appealing to me, making it all the harder to decide which
is the right one to pursue.
Please pardon me for keeping this post so long. I've tried to be as clear as possible (which I doubt
I've achieved :). I've also decided to make it a 'meditational' post due to the fact that the
questions I'm asking here require moderate discussion and pondering. Frankly, the more people
I could get input from the better ;-)
_____________________
# Under Construction
Re: mod_perl web app design considerations
by perrin (Chancellor) on Sep 03, 2002 at 21:29 UTC
First, I don't see why you have to put the images in the database. Keep the metadata in the database (path, name, who it belongs to) and keep the data in a normal file. Putting large binary files in a database almost always leads to trouble later on, and makes it impossible to do simple backups, moves, etc.
About your option 3: how are you imagining you would implement this global hash? I think you're forgetting that each apache child process has separate globals with no sharing between them. You would have to share using disk or shared memory (with a module like IPC::MM).
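The point about separate globals can be seen without Apache at all. The sketch below uses a plain `fork` to stand in for Apache child processes; the hash name `%IMAGE_CACHE` and the key `session123` are made up for the illustration.

```perl
use strict;
use warnings;

our %IMAGE_CACHE;   # stands in for a "global hash" inside one server process

# Fork a child that writes to the global hash, then report whether the
# parent can see that write (it cannot: each process has its own copy).
sub child_write_is_visible_to_parent {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child process: this modifies only the child's copy of the hash.
        $IMAGE_CACHE{session123} = 'jpeg bytes';
        exit 0;
    }

    waitpid($pid, 0);   # parent waits for the child to finish
    return exists $IMAGE_CACHE{session123} ? 1 : 0;
}

print child_write_is_visible_to_parent() ? "shared\n" : "not shared\n";
```

Under mod_perl the same thing happens between Apache children: an upload stored in a global by one child is invisible to whichever child serves the user's next request.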
Option 2 will use up a lot of memory quickly. When you load a large image into memory in an apache process (which is what you would be doing here), that process will never shrink back down. The memory can be reused by that process, but it won't be given back to the general pool of free memory. That means that one user sending multiple requests over the course of a session with a 500k image can use up MBs of memory on your server.
Option 1 sounds best. What's the security concern? You wouldn't be using your htdocs directory as temp space, would you? I don't see how anyone would see these images without your intention.
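A minimal sketch of option 1 with the temp space kept outside htdocs, using the core File::Temp module. The function name `spool_upload`, the spool directory, and the shape of the returned record are assumptions made for the example; only the record (path and description) would go into the session.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write uploaded image bytes to a uniquely named file inside $spool_dir
# (a directory outside the web server's document root) and return a small
# record suitable for storing in the session.
sub spool_upload {
    my ($spool_dir, $image_bytes, $description) = @_;

    # tempfile() picks an unused name and creates the file with
    # owner-only permissions; UNLINK => 0 keeps it after this process exits.
    my ($fh, $path) = tempfile(
        'uploadXXXXXXXX',
        DIR    => $spool_dir,
        SUFFIX => '.jpg',
        UNLINK => 0,
    );
    binmode $fh;
    print {$fh} $image_bytes or die "Cannot write $path: $!";
    close $fh or die "Cannot close $path: $!";

    return { path => $path, description => $description };
}

# Hypothetical use inside a request handler, where %session is the tied
# Apache::Session hash:
#   push @{ $session{pending_images} },
#        spool_upload('/var/spool/myapp/uploads', $bytes, $desc);
```

Because the files are not reachable through any URL, the form preview would have to be served by a handler that reads the path from the current user's session.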
Re: mod_perl web app design considerations
by valdez (Monsignor) on Sep 03, 2002 at 20:11 UTC
Interesting meditation!
I have another solution. Keeping images, maybe large ones,
in memory or in a database forces you to use mod_perl to
deliver them: you have a long print that could instead be
handled by Apache itself. You are replicating content
generation and embedding security inside this phase of
apache's life cycle.
What you need instead is authentication, authorization
and access control. Following this route you need 'only' to
create directories with access rights embedded in their
names. A dedicated access-control handler can then authorize
the request and leave delivery of the content to Apache
itself, gaining in speed and modularity.
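As a rough illustration of access rights embedded in directory names, the check itself can be a small pure function. The `/private/<username>/` layout and the name `user_may_access` are assumptions for this sketch, not something PageKit or Apache prescribes.

```perl
use strict;
use warnings;

# Hypothetical layout: each user's images live under /private/<username>/...
# Returns 1 if $user owns the directory named in $uri, 0 otherwise.
sub user_may_access {
    my ($uri, $user) = @_;
    return 0 unless defined $user && length $user;

    # Refuse any ".." path segment so a request cannot climb out of
    # the user's own directory.
    return 0 if $uri =~ m{(?:^|/)\.\.(?:/|$)};

    my ($owner) = $uri =~ m{^/private/([^/]+)/};
    return (defined $owner && $owner eq $user) ? 1 : 0;
}

# In a mod_perl 1 authorization handler this might be called as:
#   use Apache::Constants qw(:common);
#   sub handler {
#       my $r = shift;
#       return user_may_access($r->uri, $r->connection->user)
#            ? OK : FORBIDDEN;
#   }
```

With a check like this in the authorization phase, Apache's own file handler can deliver the image and no Perl code has to print the image bytes.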
If you need to share images between many servers I think
an NFS file system is a better option. Some discussions about
this option can be found on the mod_perl mailing list.
Hope this helps. Ciao, Valerio
thanks for your reply, valdez! :)
You go on to say...
What you need instead is authentication, authorization and access control.
But for this to work, wouldn't I have to implement my own Apache module to intercept requests and do authentication and authorization based on the value of the requested URI? At this stage, I've already written a moderate amount of code (due to tight deadlines rather than hard reasoning :) for the www.pagekit.org MVCC framework. The actual framework is very sound and I've come to appreciate both its simplicity and power. It is also easy to write handlers to serve pretty much any content. I also have past experience serving images from the database.
However, what you are suggesting sounds very enticing. I would appreciate it if you could send me links to some resources on the web where I can delve further into this subject. ;-)
package Apache::AuthAny;
# file: Apache/AuthAny.pm
use strict;
use Apache::Constants qw(:common);

sub handler {
    my $r = shift;

    # Fetch the password sent via HTTP Basic Auth; pass any non-OK status straight back
    my ($res, $sent_pw) = $r->get_basic_auth_pw;
    return $res if $res != OK;

    # Accept the request only if both a username and a password were supplied
    my $user = $r->connection->user;
    unless ($user and $sent_pw) {
        $r->note_basic_auth_failure;
        $r->log_reason("Both a username and password must be provided", $r->filename);
        return AUTH_REQUIRED;
    }
    return OK;
}
1;
(that'll authenticate on the *presence* of both a username and password, via HTTP Basic Auth - obviously you'd want to substitute a real-world authentication scheme).
The Eagle book gives full details, and some of it seems to be online here:
http://modperl.com:9000/book/chapters/ch6.html
(found through random Googling).
hth, andye.
Here I am :)
Chapter 6 from Eagle Book describes
what you need:
In this chapter, we step back to an earlier phase of the
HTTP transaction, one in which Apache attempts to determine
the identity of the person at the other end of the connection,
and whether he or she is authorized to access the resource.
Apache's APIs for authentication and authorization are
straightforward yet powerful. You can implement simple
password-based checking in just a few lines of code.
With somewhat more effort, you can implement more
sophisticated authentication systems, such as ones based on
hardware tokens.
You can find a copy of this chapter
here.
mod_perl Developer's Cookbook
provides some other examples on the same subject.
I understand your point about deadlines, I was talking
about theory, real life is another story ;-)
Good luck for your project. Ciao, Valerio
Re: mod_perl web app design considerations
by abell (Chaplain) on Sep 04, 2002 at 09:37 UTC
I would opt for solution 4 :-)
4. Put images into the database together with "confirmed" images, with a flag set to 'unconfirmed'. If the images belong to a more complex structure which is being built through a sequence of forms, put the 'unconfirmed' flag (a boolean field) on this structure, and set it to 'confirmed' when the input process ends properly.
This way, images remain private and you can use the same routines you apply to regularly stored ("confirmed") images. You only need this extra boolean field in one database table and you need to check it when you perform queries on confirmed images (so add ' AND confirmed=TRUE' to all WHERE clauses in your queries on images).
Every now and then you can delete all unconfirmed images, based on their upload date (if you have it stored somewhere), on their ID, or simply when no user is in session.
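A sketch of what the two kinds of query might look like, written as helpers that return the SQL and its bind values. The table and column names (`images`, `user_id`, `data`, `confirmed`, `uploaded_at`) are hypothetical, and the cutoff is computed in Perl on the assumption that the database stores upload times in the server's local time.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical schema: images(id, user_id, data, confirmed, uploaded_at)

# Query for a user's regular images: every read adds "AND confirmed = 1".
sub confirmed_images_query {
    my ($user_id) = @_;
    return (
        'SELECT id, data FROM images WHERE user_id = ? AND confirmed = 1',
        $user_id,
    );
}

# Periodic cleanup: delete unconfirmed rows uploaded more than
# $max_age_seconds ago. $now is optional and defaults to the current time.
sub purge_unconfirmed_query {
    my ($max_age_seconds, $now) = @_;
    $now = time unless defined $now;
    my $cutoff = strftime('%Y-%m-%d %H:%M:%S',
                          localtime($now - $max_age_seconds));
    return (
        'DELETE FROM images WHERE confirmed = 0 AND uploaded_at < ?',
        $cutoff,
    );
}
```

With a connected DBI handle, these could be run as `my ($sql, @bind) = purge_unconfirmed_query(24 * 3600); $dbh->do($sql, undef, @bind);`.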
The drawback is you have to pay a little overhead for retrieving images from the DB at the following stages of the input process, but maybe this is what you already do when showing images to users.
Best regards
Antonio Bellezza
Update: minor language corrections