jmagiera has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I have written a rather large (~7000 LOC) CGI-based web script that I am currently restructuring. It is designed to run on as many platforms as possible, including Win32. Here is what I am changing:

1) Modularization, by grouping subroutines into their own files and importing them with "require".
2) Writing my own extension to HTTP::Server::Simple to run the script within its own webserver. The server package runs the script using do <scriptfile>;.
3) Getting rid of global variables by putting them into a package where they are accessed via the package name. That package is use()ed in every file (roughly sketched below).
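
In rough outline, the globals package looks something like this (the package and variable names here are placeholders for illustration, not the real ones):

    # MyApp/Globals.pm -- illustrative sketch only, real names differ
    package MyApp::Globals;
    use strict;
    use warnings;

    # package-global data, accessed fully qualified from every other file
    our %data;                           # the Tie::Persistent-backed hash
    our $config_file = 'myapp.conf';     # example of a simple config global

    1;

    # in any other file:
    use MyApp::Globals;
    $MyApp::Globals::data{counter}++;    # access via the package name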

Now to my concerns:
a) The script stores its data using Tie::Persistent, which can take up quite a lot of memory during script execution. This wasn't really a problem before, because the perl process died at the end of each CGI request. But now that the script is running "forever" in the same process as the server, not only am I discovering how much memory it takes up, it also seems to have a memory leak. I am guessing that the leak is related to the fact that the data hash is package-global, but I could use your input here.
I actually expected the memory to be freed again when the script is done (the do <scriptfile>; part), but apparently not. How do I explicitly free memory again (apart from untie and %myhash = ();)?

b) Ever since I introduced the package-global variables, my script has been running slower, whether it's run as a CGI script or within its own webserver. I keep telling myself I didn't change anything else besides that, so I hope it's correct to blame the slowdown on making the variables package-global.

So, I hope I was clear enough on all the points and thank you for your attention.

Jakob

Replies are listed 'Best First'.
Re: modularization, memory usage and performance
by Tanktalus (Canon) on Feb 09, 2005 at 00:38 UTC

    Not being an expert, I'd like to start with one observation. Some people may disagree with my take on require. That's fine - we can just agree to disagree about that.

    However, I would like you to be aware of both sides of the coin, as it's not clear to me that you should be using require in these circumstances. Specifically, it sounds like you're switching to something similar to mod_perl. If that is the case, require is actually the wrong thing to use. It is better to use "use My::Module qw()" than "require My::Module". This will cause My::Module to be loaded and compiled when the webserver starts, rather than in each child process that the server kicks off via fork().

    However, if you want your scripts to work both with mod_perl and regular CGI, you may have some trade-offs to make. Since not every CGI call may need all of your modules, using require makes sense - you only load/compile them if you need them. At hundreds or thousands of requests per minute (or second!), this can be significant.

    Make your choice wisely - each choice hurts the performance of the other setup. A possible third choice is to filter your code during the build, and have your Makefile.PL or Build.PL take a parameter that says whether to build for mod_perl or mod_cgi. With mod_perl, you would just filter all of your code as it is being copied to blib with something like s/require/use/ - of course, you'll want a bit more smarts than that, since a blind substitution will also rewrite the word inside quoted strings (comments don't matter much). Maybe you have a keyword __REQUIRE which gets filtered to require for mod_cgi, and to use for mod_perl.
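
    For instance, a crude build-time filter along those lines might look like this (the __REQUIRE marker and the command-line handling are purely illustrative):

        # filter_source.pl -- toy build-time source filter (illustrative only)
        # usage: perl filter_source.pl mod_perl < lib/My/Module.pm > blib/lib/My/Module.pm
        use strict;
        use warnings;

        my $target  = shift @ARGV or die "usage: $0 mod_perl|mod_cgi\n";
        my $keyword = $target eq 'mod_perl' ? 'use' : 'require';

        while (my $line = <STDIN>) {
            $line =~ s/\b__REQUIRE\b/$keyword/g;   # only touch the marker, never real code
            print $line;
        }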

      Thanks for your reply, Tanktalus. I think I need to clarify a bit more on the background of this script:

      It's designed as a "standalone-application", not a script running on a public webserver with 100s of requests. Therefore, the server also doesn't fork().

      I'd really appreciate input on the memory leak. If I just reload the same request from the browser, the memory usage keeps growing, slowly but steadily with each request.

      Oh, I just had a moment of enlightenment! The monks' powers are at work! The script creates a new CGI object every time it's run. Could this create memory leakage? When does perl collect garbage (if it does at all)?

        Why create a CGI object at all? Doesn't HTTP::Server::Simple give that to you when it calls handle_request?

        But to answer your question: garbage collection happens, as I understand it, when the last reference to an object disappears. As long as something refers to an object, it sticks around. This can be annoying, for example, when you've got a closure, so be careful. ;-)
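
        To make that concrete with a contrived snippet (not from your code): because Perl reclaims memory by reference counting, a reference cycle is never freed unless you break it yourself, e.g. with Scalar::Util::weaken:

            use strict;
            use warnings;
            use Scalar::Util qw(weaken);

            {
                my $parent = { name => 'parent' };
                my $child  = { name => 'child'  };
                $parent->{child} = $child;
                $child->{parent} = $parent;    # cycle: neither refcount can reach zero
                weaken($child->{parent});      # break the cycle so both hashes are freed
            }   # without weaken(), both structures would leak when the block ends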

        Also note that as you use the CGI module, it loads more of itself into memory and compiles more, and that code stays loaded in case it is called a second time. So if each request causes you to call more parts of CGI.pm, that isn't a leak; it's just that you used more code, so more code had to be loaded and compiled. Avoiding CGI.pm any more than you have to (e.g., using templates for HTML rather than CGI.pm's HTML-generating functions) may be a good idea here for many reasons, including this one.

Re: modularization, memory usage and performance
by mkirank (Chaplain) on Feb 09, 2005 at 07:41 UTC
    "which can take up quite a lot of memory during script execution"
    Why not use a database, e.g. DBD::SQLite?
      I am not using a database and I don't want to, because the application is supposed to be easy to install and I want to limit the amount of 3rd party products involved (aiming towards all perl :D )

      Since Tie::Persistent seems to be buggy and was last maintained in 2002, I'll probably switch to SDBM_File or so.

      Perl help me.

        SQLite is a self-contained RDBMS packaged in a DBI driver.
        Installing DBD::SQLite is just like installing any other perl module.
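
        Once it's installed, the whole database is a single file and the code is plain DBI (the file and table names below are made up for illustration):

            use strict;
            use warnings;
            use DBI;

            # the whole database lives in one file; nothing else to install or administer
            my $dbh = DBI->connect("dbi:SQLite:dbname=auctions.db", "", "",
                                   { RaiseError => 1, AutoCommit => 1 });

            # create the table the first time through
            $dbh->do("CREATE TABLE auctions (id INTEGER PRIMARY KEY, title TEXT)");
            $dbh->do("INSERT INTO auctions (title) VALUES (?)", undef, "test auction");

            my $rows = $dbh->selectall_arrayref("SELECT id, title FROM auctions");
            print "$_->[0]: $_->[1]\n" for @$rows;
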
Re: modularization, memory usage and performance
by dragonchild (Archbishop) on Feb 09, 2005 at 13:47 UTC
    You want to use CGI::Application. It would be very easy to have a subclass of C::A also inherit from or delegate to HTTP::Server::Simple (see the sketch after the list below). Some benefits:
    • All your globals go into the base class as constants. So, now you can just do $self->GLOBAL_VAR and it will "Do The Right Thing"™.
    • Everything will be modularized because it will be put into an OO framework. If two areas need the same code, just hoist it into the class they both inherit from.
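
    A rough sketch of the shape I mean (the class name, constant, and run modes are invented for illustration, not taken from your code):

        package MyApp;
        use strict;
        use warnings;
        use base 'CGI::Application';

        use constant DATA_FILE => 'myapp.dat';   # former global, now a class constant

        sub setup {
            my $self = shift;
            $self->start_mode('list');
            $self->run_modes(
                list   => 'show_items',
                create => 'create_item',
            );
        }

        sub show_items {
            my $self = shift;
            my $file = $self->DATA_FILE;         # "Do The Right Thing"
            return "<html><body>Reading from $file ...</body></html>";
        }

        sub create_item { return "not implemented yet" }

        1;

    The CGI script itself then shrinks to little more than use MyApp; MyApp->new->run;.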

    As for your Tie::Persistent thing ... go with a database. There's no reason not to use DBD::SQLite, seeing as PodMaster already has a version compiled for both ActivePerl 5.6 and ActivePerl 5.8.

    Now, to deal with your general architecture. I have no idea what you're doing, what the script is meant to do, or why. I do know that you're intending to distribute an application that creates a web server to people who don't know enough about programming to figure out how to install something.

    *blinks*

    This is going to create security holes ... I would strongly suggest that, if you intend to do this, you provide the source code to some community, possibly PerlMonks, so that it can be peer-reviewed. Otherwise, you will be doing your users a disservice by providing them unreviewed software that can open their computers to malicious people.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      CGI::Application sounds very promising, thank you for that.
      I think it's time for some more revelation about what I am doing so as to avoid misunderstanding. The application actually has its own website, but it's in German, and as this site's primary language is English I thought I wouldn't bother linking. But anyway: http://www.auctober.de
      Just to encourage you: I keep all code commented in English. :D

      Auctober is a freeware tool for sellers on eBay Germany. I know there are many such tools, but I think I have a few unique goals:
      1) platform independence
      2) no presentation layer of its own (the web browser is the GUI)
      3) very small (once perl is installed)

      At the beginning I made the tool for myself, but then I just got a kick out of publishing it, at first for "advanced users" who knew how to install a webserver or already had one. Then I included Xitami for Windows users and shipped it with the application. Now I want to get rid of Xitami and ship my own 20K webserver :D

      I hear your security concerns and I am aware of them. I am thinking of including at least an HTTP basic authentication in the webserver (should be easy with perl, right?). Of course I am also making sure that the requests can't access other documents besides the ones needed for the application (like /etc/passwd or so).
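
      Something like this is what I have in mind for the basic auth part (a rough sketch; I'm assuming here that HTTP::Server::Simple::CGI exposes the Authorization header as $ENV{HTTP_AUTHORIZATION}, and the username/password are of course placeholders):

          package MyServer;
          use strict;
          use warnings;
          use base 'HTTP::Server::Simple::CGI';
          use MIME::Base64 qw(decode_base64);

          sub handle_request {
              my ($self, $cgi) = @_;

              # assumption: request headers show up as HTTP_* environment variables
              my $auth      = $ENV{HTTP_AUTHORIZATION} || '';
              my ($encoded) = $auth =~ /^Basic\s+(\S+)/;
              my ($user, $pass) = $encoded ? split(/:/, decode_base64($encoded), 2) : ();

              unless (defined $pass && $user eq 'someuser' && $pass eq 'somepass') {
                  print "HTTP/1.0 401 Unauthorized\r\n",
                        "WWW-Authenticate: Basic realm=\"Auctober\"\r\n\r\n";
                  return;
              }

              print "HTTP/1.0 200 OK\r\n",
                    $cgi->header('text/html'),
                    "<html><body>Hello $user</body></html>";
          }

          1;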

      I am considering using DBD::SQLite2, but the compiled version (Windows) is 1.2 MB, and that's almost as big as my whole distribution at the moment. If possible I would like to avoid that. Tie::Persistent worked really well for me, and maybe I can modify it so it doesn't keep references.

      -----------
      Perl help me.
Re: modularization, memory usage and performance
by perrin (Chancellor) on Feb 09, 2005 at 17:58 UTC
    Using globals will not slow down your program. In fact, you were using globals before, but they were in the "main::" package because you didn't put them anywhere else. Something else is causing your slowdown.

    You might be better off using one of the dbm modules, like DB_File or GDBM_File, than Tie::Persistent. The trouble with SDBM_File is that it has very small size limits on each key/value pair.
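
    For example, a hash tied to DB_File is nearly a drop-in replacement for a plain hash (the file name is just an example):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl qw(O_RDWR O_CREAT);

        tie my %data, 'DB_File', 'data.db', O_RDWR | O_CREAT, 0644, $DB_HASH
            or die "Cannot open data.db: $!";

        $data{last_run} = time();            # written through to the file on disk
        print "stored ", scalar(keys %data), " keys\n";

        untie %data;                         # flush and release the file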

    It also sounds like you are creating libraries that have no package declaration and just pulling all of their functions into your main program. If that's what you're doing, I would advise you to put them in a separate package instead, to avoid confusion and namespace collisions.
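
    That is, give each library a package declaration and export only what callers ask for, along these lines (names invented):

        # MyApp/Util.pm -- illustrative only
        package MyApp::Util;
        use strict;
        use warnings;
        use Exporter;
        our @ISA       = qw(Exporter);
        our @EXPORT_OK = qw(format_price);   # exported only on request

        sub format_price {
            my ($amount) = @_;
            return sprintf "EUR %.2f", $amount;
        }

        1;

    Callers then say use MyApp::Util qw(format_price); and everything else stays out of their namespace.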

      Thanks for the input, perrin. DB_File and GDBM_File sound very promising. I am always a bit hesitant about packages that come with C sources because I don't want my users to have to compile anything, but I hope to find a way to precompile those packages for Windows (because most people don't have a compiler on Windows).

      As for creating libraries with no package: you are right, each "library" is just a collection of subs without any namespace (well, main::). I'll work on that.

      -----------
      Perl help me.
        There are some dbm modules that come with ActiveState Perl on Windows, so they don't need to be compiled or installed separately. Check which ones they are and use one of them.