leocharre has asked for the wisdom of the Perl Monks concerning the following question:

i'm working on an online document access and document management system. the system needs to be able to handle a lot of queries to a database, since i am not querying the filesystem for this info (sometimes files may not be there, may not be mounted, etc.).

the logged-in user sees what looks to them like a filesystem hierarchy (really a graphical representation of a slice of the filetree they have been granted some level of access to).

when a user is in a part of their allowed hierarchy, i query a db for info on the files we are about to render: the file name, size, the filesystem it resides on, md5sum, and so on.

rendering the output for that user may mean requesting the same data on those files again, and other users may request the same data on the same files too.

so, i was thinking..
how about this..
what if i have a 'daemon' sort of script that stores the most recent db request results in memory? so if a user requests info on file id 234, first we ask the 'daemon' whether, say,
%{$main::RECENT_FILES{234}} is available, and if not, the 'daemon' fetches it from the db.
i could store the 1000 most recent results, and results on other stuff. if data is older than x, delete it.. etc.
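
something like this rough sketch is what i have in mind (table and column names are made up, and the daemon part is left out; it's just to show the shape of the cache):

    # rough sketch of the cache idea -- schema and names are made up
    use strict;
    use warnings;

    my %RECENT_FILES;    # file id => { data => {...}, fetched_at => epoch seconds }
    my $MAX_AGE = 300;   # entries older than this are considered stale

    sub file_info {
        my ($dbh, $file_id) = @_;
        my $entry = $RECENT_FILES{$file_id};
        if ($entry and time() - $entry->{fetched_at} < $MAX_AGE) {
            return $entry->{data};    # cache hit, no db round trip
        }
        my $row = $dbh->selectrow_hashref(
            'SELECT name, size, filesystem, md5sum FROM files WHERE id = ?',
            undef, $file_id,
        );
        $RECENT_FILES{$file_id} = { data => $row, fetched_at => time() };
        return $row;
    }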

would this really give me some edge when doing queries for thousands of files, or should i instead simply keep one db connection open for all users to query, something like that? as it is, i am opening a connection for each logged-in user.

it seems that keeping this data in memory would be quicker. how would i go about doing this, and where should i look? that is one of my questions: making a sort of daemon thingie, a perl script that stays alive, and how on earth would i access its symbol table / namespace?


Replies are listed 'Best First'.
Re: can a perl script act as a daemon to serve data in its symbol table?
by eric256 (Parson) on Jan 31, 2006 at 22:22 UTC

    Perl mantra: do it first; if it's too slow THEN look for a better solution.

    That said, it will be easier and probably more efficient to look into something like mod_perl plus the built-in connection caching of DBI. The two together can bring great speed improvements with virtually no code changes. ;) Any solution that means running your perl script once and then having the webserver access it (like mod_perl does) will greatly help your speed, since each hit doesn't have to wait for perl to start up. Just my 2 cents.
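
    A rough sketch of the connection-caching half, as it might appear in a mod_perl startup file (the DSN, username, and password are placeholders):

        # startup.pl -- loaded once per Apache child, e.g. via
        # "PerlRequire /path/to/startup.pl" in httpd.conf
        use strict;
        use warnings;
        use Apache::DBI;   # must be loaded before DBI so connect() gets cached
        use DBI;

        # Scripts keep calling DBI->connect exactly as they do now;
        # Apache::DBI intercepts it and returns the already-open handle.
        # my $dbh = DBI->connect('dbi:mysql:docs', 'webuser', 'secret',
        #                        { RaiseError => 1 });

        1;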


    ___________
    Eric Hodges
Re: can a perl script act as a daemon to serve data in its symbol table?
by samtregar (Abbot) on Jan 31, 2006 at 22:20 UTC
    You seem to be reinventing the concept of a cache daemon. Perhaps you could use memcached? It has a nice Perl interface and is very fast.
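
    For reference, the Perl side is roughly this simple (the server address, the key name, and the load_file_info_from_db() helper are made up for the sketch):

        use strict;
        use warnings;
        use Cache::Memcached;

        my $memd = Cache::Memcached->new({
            servers => ['127.0.0.1:11211'],   # wherever memcached is listening
        });

        # Try the cache first, fall back to the database on a miss.
        my $info = $memd->get('file:234');
        unless ($info) {
            $info = load_file_info_from_db(234);    # hypothetical db lookup
            $memd->set('file:234', $info, 300);     # cache for five minutes
        }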

    -sam

      this is good.. real good, you saved me, sam-

      I still want to know though, as a sidenote, just to know: if i have foo.pl running and waiting on the box, how could bar.pl access foo.pl's present namespace? can it?

      my $name = &foo.pl::main::Freaky($h1+3); # muhahhahaha
        Well, sure, that's more-or-less exactly what memcached is doing. You could write memcached in Perl and call it foo.pl if you wanted. I think you might be focusing too much on Perl's namespace. Perl namespaces are just hashes, just the same as the hash that memcached supports, so all the same techniques apply.

        If you want to learn more about networking with Perl, I suggest you pick up a copy of Network Programming with Perl. I'm sure it has examples of the kind of thing you're talking about.
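
        Just to make it concrete, here is the bare-bones shape of a foo.pl that serves values out of its own hash over a socket, and a bar.pl that asks it (the port, the hash contents, and the line-based protocol are all invented for the sketch):

            # foo.pl -- toy daemon serving values from its own hash
            use strict;
            use warnings;
            use IO::Socket::INET;

            my %RECENT_FILES = ( 234 => 'name=report.pdf size=18332' );

            my $server = IO::Socket::INET->new(
                LocalPort => 9000,
                Proto     => 'tcp',
                Listen    => 5,
                ReuseAddr => 1,
            ) or die "can't listen: $!";

            while (my $client = $server->accept) {
                my $id = <$client>;
                next unless defined $id;
                chomp $id;
                my $answer = exists $RECENT_FILES{$id} ? $RECENT_FILES{$id} : 'NOT FOUND';
                print $client "$answer\n";
                close $client;
            }

            # bar.pl -- another script asking foo.pl for data
            use strict;
            use warnings;
            use IO::Socket::INET;

            my $sock = IO::Socket::INET->new(PeerAddr => 'localhost:9000')
                or die "can't connect: $!";
            print $sock "234\n";
            chomp(my $reply = <$sock>);
            print "foo.pl says: $reply\n";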

        -sam

Re: can a perl script act as a daemon to serve data in its symbol table?
by tirwhan (Abbot) on Jan 31, 2006 at 22:30 UTC

    I'm not entirely sure I understand what you're trying to do here, but it seems like you're trying very hard to recreate a filesystem :-).

    If you've got a decent database system and OS as well as enough RAM, data for the most recent queries will likely still be in the OS's disk cache, which is in RAM. So access to that through the database will not be horrendously slower than access to a cached version in memory. On the other hand, the overhead of managing your cached data, making sure it's still current, searching through it and expiring it sensibly will likely be quite considerable and could very well use up more system time than you gain (as well as being a good place for subtle bugs to nest). I'm not saying such a solution is generally bad, it could be very sensible if done well and make your application perform better. But it sounds as if you're optimising prematurely. You should probably finish your application, properly abstracting your data access so that you can add caching later, then profile and optimise at that stage.

    All that being said, take a look at Memoize, it may be a quick and simple way of doing what you want. And for reusing opened DBI connections, look at Apache::DBI.
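
    Memoize really is about as little work as it gets; a sketch, assuming one function does the expensive lookup (the function name and body here are placeholders):

        use strict;
        use warnings;
        use Memoize;

        # After this, repeated calls with the same file id return the cached
        # result for the life of the process instead of hitting the database.
        memoize('file_info');

        sub file_info {
            my ($file_id) = @_;
            # ... expensive database lookup would go here ...
            return { id => $file_id };    # placeholder result
        }

    Note that plain Memoize never expires anything on its own, so it's best suited to data that doesn't change underneath you.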


    There are ten types of people: those that understand binary and those that don't.

      the more i think about it.. yes.. you're right, and i feel like a dummy about it. i *am* trying to recreate a filesystem! This whole thing led me to get into filesystem design.. shucks, you'd imagine that would be enough to say 'hey, what the f**ck do you think you're doing!'

      Initially I was considering simply using the filesystem: gids, actual accounts on the box (linux), etc. They could log in via a web interface, ftp, whatever. Thing is, this idea freaked out my sysadmin. There will be anything from hundreds to thousands of users per implementation (yes, this will be opensourced, thus ground down, remade, and cleansed by the hands of.. uh.. )

      This system will serve sensitive data. Having this layer of abstraction pretty much posing as a filesystem helps ensure that the tainted data coming in from users is of the least sensitive nature...

      To tell the server what file they want to see info on, the tainted data is just an id for a file. The number itself means nothing to the os. It's not a path, not an inode, etc.

      The idea to push the app further before optimizing is very helpful, thank you so much. it makes sense to me. it really kept me from going insane. further, that is.

        Don't worry, we've all been there :-). But if you're recreating a filesystem, a better solution may be to use the existing one. As I understand your post, you will now be limiting access to the files to the webserver only, right? So you don't have to worry about access at the filesystem level, you just need to make it simple enough for your web server to only ever serve the correct files to the client (this does break down if someone manages to compromise your webserver, but that's a hard problem to solve).

        So when a user logs in, get the access data for that user from the database and create a new directory in the webtree containing sym- or hard links to the files that he has access rights to. Have the server only serve files for that user from this directory and delete it after the user logs out. This solution may not be so good if you've got umpteen-thousand files you're serving to each user, but otherwise it seems like the simplest and most efficient solution to your requirements.
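
        In code, that might look something like this (the directory locations and subroutine names are invented, and error handling is kept minimal):

            use strict;
            use warnings;
            use File::Path qw(mkpath rmtree);
            use File::Basename qw(basename);

            # On login: build a per-user directory of links to permitted files.
            sub build_user_tree {
                my ($user, @allowed_paths) = @_;    # paths come from the database
                my $dir = "/var/www/docroot/sessions/$user";
                mkpath($dir);
                for my $path (@allowed_paths) {
                    symlink($path, "$dir/" . basename($path))
                        or warn "could not link $path: $!";
                }
                return $dir;
            }

            # On logout: throw the whole directory away.
            sub remove_user_tree {
                my ($user) = @_;
                rmtree("/var/www/docroot/sessions/$user");
            }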


        There are ten types of people: those that understand binary and those that don't.