Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm going to be setting up a site with several million pages. Of course, I am going to be using Perl to power it all. I wondered if there are any ready-made solutions out there that allow:
- Content management by inexperienced users
- Web and shell-based content management
- Template-based design

After seeing how cool Perlmonks.org is, and seeing that it is powered by Everything, I thought I'd give it a look. Unfortunately though, it doesn't really seem to be suitable for me. It's highly database-dependent, and my site is going to be largely composed of static pages 'built' from templates. Any dynamic elements will be added by SSI. Furthermore, Everything seems to generate all pages dynamically, and I just don't have the hardware to support that kind of load. It also uses question marks in the URLs (read: no spidering).

Is there any ready-made product out there, or am I just going to have to do it all the hard way?

Re: Perl solutions for large web sites?
by jjhorner (Hermit) on May 23, 2000 at 16:31 UTC

    I'm a bit confused. You need static pages, because you can't handle the load, but you want perl to power it all?

    Your low load option is to just use static pages. Very little processor load, because serving static pages is the easiest method of web serving.

    Your medium load option is to use a mod_perl backend to write your own SSI scheme (à la Douglas M. and Lincoln S. in "Writing Apache Modules with Perl and C"). This will give you added functionality (custom SSI) and tight integration with a real web server (Apache). The custom SSI can do as much as you can code.
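    To make that concrete, here is a minimal sketch of what such a custom SSI handler might look like under mod_perl 1 (the API the book above covers). The My::SSI package name and the <!--#menu--> directive are invented for the example; a real handler would expand whatever directives you define.

        # httpd.conf (mod_perl 1) -- run every .html file through the handler:
        #   <Files "*.html">
        #       SetHandler perl-script
        #       PerlHandler My::SSI
        #   </Files>

        package My::SSI;                       # hypothetical module name
        use strict;
        use Apache::Constants qw(OK DECLINED NOT_FOUND);

        sub handler {
            my $r = shift;
            return DECLINED unless $r->content_type eq 'text/html';

            open my $fh, '<', $r->filename or return NOT_FOUND;
            my $html = do { local $/; <$fh> };   # slurp the static page
            close $fh;

            # Expand our invented directive into generated navigation HTML
            $html =~ s/<!--#menu-->/build_menu($r)/ge;

            $r->send_http_header('text/html');
            $r->print($html);
            return OK;
        }

        sub build_menu {
            my $r = shift;
            # ...build the navigation for the requested page here...
            return '<a href="/">Home</a>';
        }

        1;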

    Your high load option is to just use mod_perl to add headers, footers, and menus to your pages on the fly, and let maintainers worry only about content. (The most common layout is to make each page one big table: the first row is the header, the second row holds a menu cell and a main-body cell, and the last row is the footer.)
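    As a rough illustration of that table layout, a page-wrapping routine could look something like this; the wrap_page() name and the placeholder markup are invented for the example:

        # Wrap a page body in the header / menu / body / footer table
        # described above. Purely illustrative markup.
        sub wrap_page {
            my ($title, $menu_html, $body_html) = @_;
            return join "\n",
                qq{<html><head><title>$title</title></head><body>},
                qq{<table width="100%">},
                qq{<tr><td colspan="2"><!-- site header --></td></tr>},
                qq{<tr><td valign="top">$menu_html</td><td valign="top">$body_html</td></tr>},
                qq{<tr><td colspan="2"><!-- site footer --></td></tr>},
                qq{</table></body></html>};
        }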

    Spiders can index the output of CGIs, but they can't make up the parameters you might want passed to those CGIs.

    If you have more questions, /msg me.

    JJ

    J. J. Horner

    Linux, Perl, Apache, Stronghold, Unix

    jhorner@knoxlug.org http://www.knoxlug.org

      I'm a bit confused. You need static pages, because you can't handle the load, but you want perl to power it all?
      Yes, as in Perl will generate those static pages once in a while, from templates that I create. Therefore, Perl will be powering it.

      Your low load option is to just use static pages. Very little processor load, because serving static pages is the easiest method of web serving.
      Exactly, except that I'm doing it with a twist: the static pages will be generated by Perl scripts.

      Spiders can index the output of CGIs, but they can't make up the parameters you might want passed to those CGIs.
      Either I'm misunderstanding you, or you're simply incorrect. Search engines won't spider:
      http://www.foo.com/foo.cgi?whatever=300 but they would spider:
      http://www.foo.com/foo.cgi

RE: Perl solutions for large web sites?
by swiftone (Curate) on May 23, 2000 at 18:46 UTC
    I'm working on a set of tools to do something very similar for the site where I work. It's a very good solution for a site with a lot of static content.

    (You can find my earlier comment here)

    Basically I use Text::Template to separate my content from my navigation. A database holds the parent-child relationships between pages, and a few scripts let me easily add pages. A build script writes flat HTML files from the templates, and other scripts let me edit the content in the templates or preview what a template will look like without writing the HTML file. My interface person can tweak the (mostly) HTML templates to change the navigation for the entire site without learning any Perl.
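    For anyone who hasn't used Text::Template, the build step is roughly this; the file names and the variables passed in are invented for the example (the template itself would refer to them as {$title}, {$nav} and {$body}):

        #!/usr/bin/perl -w
        use strict;
        use Text::Template;

        # Fill a page template with its content and navigation, then write
        # the result out as a flat HTML file the web server can serve directly.
        my $template = Text::Template->new(TYPE => 'FILE', SOURCE => 'templates/page.tmpl')
            or die "Couldn't load template: $Text::Template::ERROR\n";

        my $html = $template->fill_in(HASH => {
            title => 'Example Page',
            nav   => '<a href="/">Home</a>',
            body  => '<p>Page content goes here.</p>',
        });
        defined $html or die "Couldn't fill template: $Text::Template::ERROR\n";

        open my $out, '>', 'htdocs/example.html' or die "Can't write example.html: $!\n";
        print $out $html;
        close $out;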

Using a question mark is not necessary for CGI
by Corion (Patriarch) on May 23, 2000 at 18:59 UTC
    It's not necessary (at least with Apache and Xitami) to have "?" as the CGI parameter delimiter. For example, some Wiki clones (Wiki, TWiki - search Freshmeat for them) use a feature of Apache that lets you put the script in the middle of the URL and have the rest of the URL path passed to it as path info (like http://www.nowhere.com/view.pl/01/0101/ ). This would trick spiders into crawling your site, but maybe you should consider submitting updates to the search engines directly instead of having them crawl your site. I don't know if that is actually possible, but ebay.de shows up quite up-to-date on google.com.
      The "Trick" in apache is this: Put a perl script in the main directory with a name like "graph" Then in the apache conf do: <Location /graph> SetHandler cgi-script </Location> (You may also have to set "Options ExecCGI" on either the location or the directory it is in) In your perl script, using "use CGI;" of course you can get the path from your $cgi variable like this: use CGI qw(:standard); my $cgi = new CGI; my $path = $cgi->path_info(); If you went to the site "http://x.org/graph/blue/green/fun.gif" then $path would be set to "/blue/green/fun.gif" Now you can split on / and get your args. If you are getting you data from a DB and want cache's out in the world to cache data that rarely changes, send the proper "Expires: ..." header or the "Last-Modified: ..." header. Building your code inline like this and adding the extra headers all you need to do for the caches. I built a cache in front of a site like this and it saved me about 60% of the load on my processor.
RE: Perl solutions for large web sites?
by turnstep (Parson) on May 23, 2000 at 19:24 UTC
    I think you misspelt "thousands" as "millions." If your hardware cannot handle dynamic pages, how is it going to handle millions of static pages? 3 million pages times, let's say, 18,000 bytes each (the current size of this page) is about 50,000 Megs of data.

      You've got a point.

      A lot of this page is dynamic stuff which relies on CPU power. Trying to do a page like this with static pages would take a long, long time. I would bet that the code for this site is actually really small, but with a large database.

      He could set up some scripts to regenerate pages from templates every 10 minutes. This would give him the reduced overhead of not generating on the fly while keeping the illusion of dynamic generation, since the machine would only really be working for a few minutes out of every ten. That would be a good mid-grade solution, I think.
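      A periodic rebuild like that could be driven by a cron job running a small script; the paths and the rebuild_page() helper here are hypothetical:

          #!/usr/bin/perl -w
          # Run from cron, e.g.:  */10 * * * *  /usr/local/bin/rebuild_site.pl
          # Regenerate a page only when its template is newer than the HTML it produced.
          use strict;
          use File::Basename;

          my $tmpl_dir = '/var/www/templates';   # hypothetical locations
          my $out_dir  = '/var/www/htdocs';

          for my $tmpl (glob "$tmpl_dir/*.tmpl") {
              my $out = "$out_dir/" . basename($tmpl, '.tmpl') . '.html';
              next if -e $out && (stat $out)[9] >= (stat $tmpl)[9];   # output still fresh
              rebuild_page($tmpl, $out);
          }

          sub rebuild_page {
              my ($tmpl, $out) = @_;
              # ...fill the template (e.g. with Text::Template) and write $out...
          }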

      JJ

        You can set up a caching proxy server as the front end, and have it cache your dynamic as well as your static content. This does require setting correct headers. There are more details in the Apache mod_perl guide; check out http://perl.apache.org/guide/
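        For example, with CGI.pm the "correct headers" part can be as small as this (the ten-minute expiry here is an arbitrary choice):

            use CGI qw(:standard);

            # Tell browsers and any caching proxy in front of us that this
            # response may be reused for the next ten minutes.
            print header(-type    => 'text/html',
                         -expires => '+10m');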

        Also, you can stack Apache handlers so that all your HTML documents get sent through an SSI-like process that wraps the headers and menus around them. That is what I am doing for makeyourbanner.com, because I want to escape the mess that I made by implementing the whole thing with CGI.pm. CGI.pm is a WONDERFUL tool, but I'm not sure about the wisdom of writing large amounts of HTML with it; it's a mess, because you have to visualize both the HTML and the Perl at the same time. With a wrapper, you get the same effect, only writing the menu once, but you don't have to think about more than one language at a time. Very handy if different people are handling different parts of the site...

Re: Perl solutions for large web sites?
by Giskard (Initiate) on May 23, 2000 at 20:31 UTC
    This doesn't really fall in the "low power" category, but does anyone else here use the (Perl-powered, in part) Mediasurface content management system?