Is it (possible|wise|advisable) to write a "serious" http server in Perl?

Why would I ask? Well, if you have ever managed IIS before then you know that it has the ability to stop/start individual websites without stopping/starting all other websites in the process.

I was disappointed when Apache's httpd v2 was released without that feature. And after seeing the architecture of IIS7, I think Apache httpd v2 is already showing its age.

Some other things I would like to see in a "modern" http server include:

  • Built-in distributed application- and session-state.
  • The ability to use different processes for different websites.
  • The ability to specify resource limits on a per-website basis.
  • Cluster-awareness. Load-balancing and application clustering should not be "bolted-on" as an afterthought. These features are important enough to be included as part of the overall system architecture.
  • A "mini" version capable of running a single website - especially useful for local development and testing.
  • Like FCGI, the ability to pass "requests" to another process and return "responses." This would presumably be simpler than requiring a compiled binary (*.so) as a module.

Writing something with these features should be fairly trivial given Perl's awesome flexibility. However, the apparent lack of existing Perl http servers (barring *::Simple or *::Lite servers on CPAN) is enough to make me wonder - "Do others know something about this that I don't?"

So - is it time to write a "serious" http server in Perl?

Re: Time to write a "serious" http server in Perl?
by perrin (Chancellor) on Aug 10, 2008 at 19:16 UTC
    First, writing a serious web server is not trivial in any language. It's a complex network server that requires intense attention to security (witness the number of IIS exploits). Not something to be taken lightly.

    Second, I think you're underestimating the power and flexibility of Apache2. Let's look at some of your shopping list:

    Built-in distributed application- and session-state.
    This is application and language specific. Web servers have no business getting into this.
    The ability to use different processes for different websites.
    Apache2's model allows any configuration of processes and threads that you like with MPM modules.
    The ability to specify resource limits on a per-website basis.
    Check out some of the bandwidth and throttling modules available for Apache2. They can do this.
    Cluster-awareness. Load-balancing and application clustering should not be "bolted-on" as an afterthought. These features are important enough to be included as part of the overall system architecture.
    Again, I would say this has nothing to do with the web server itself and should not be tied to it, but there are load-balancing modules for Apache2 if you insist. The mod_proxy stuff is one option.
    A "mini" version capable of running a single website - especially useful for local development and testing.
    Apache2 works fine for this. You can simply start your own instance with your own config file on a specific IP and port. In fact, unlike IIS, you can install dozens of dev Apache2 servers with different compile options on a single machine and start, stop, and configure them all separately. It's total freedom.
    Like FCGI, the ability to pass "requests" to another process and return "responses." This would presumably be simpler than requiring a compiled binary (*.so) as a module.
    What's wrong with using FCGI and mod_proxy for this, as we have for years?

    You also mentioned starting/stopping a single virtual host. This is pretty simple with FastCGI, by restarting the FastCGI processes for that site. If you use mod_perl and want to do this, you would run a separate backend instance for each site. Running a proxy in front of mod_perl has long been the recommended configuration.
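
    For reference, the FastCGI side of such a setup can be tiny. A minimal sketch using the CPAN FCGI module (the accept loop is the standard idiom; the response is just a placeholder):

        use strict;
        use warnings;
        use FCGI;

        my $request = FCGI::Request();

        # Each pass through this loop handles one request handed to us by
        # the front-end web server. Restarting this process restarts only
        # this site; the front end and the other sites stay up.
        while ($request->Accept() >= 0) {
            print "Content-Type: text/plain\r\n\r\n";
            print "Hello from a FastCGI backend\n";
        }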

      First, writing a serious web server is not trivial in any language. It's a complex network server that requires intense attention to security (witness the number of IIS exploits). Not something to be taken lightly.
      Absolutely - however, it would appear that a large number of those exploits are the result of IIS being too tightly coupled with the Windows OS and written in C/C++.

      Were IIS written in managed C#, at least some of those problems may not have been introduced. Of course, you could argue that "fixing" such problems with .Net just gives you another problem :)
      Second, I think you're underestimating the power and flexibility of Apache2. Let's look at some of your shopping list:

      ...lots of ++excellent++ points...
      Having used Apache httpd since 1999, I have learned the ropes (enough to get my job done). However, I have always felt that Apache httpd configuration and deployment is something of an esoteric pile of "this is just the way it's done" dogma.

      Why does one need to understand reverse proxies, ports, networking, system resource balancing, etc., etc. - just to set up 2 VirtualHosts running completely independently of one another, so that you can stop/start one and not the other? If I don't have to manage memory and garbage collection within Perl, why do I have to be a network- and system-administration wizard to set up something that trivial in Apache httpd? It's not as if running more than one VirtualHost on a single machine is an uncommon thing that only expert-level users would do.

      Running a proxy in front of mod_perl has long been the recommended configuration.
      And yet I have never seen an instance of Apache httpd installed (or set up by default) with mod_perl behind a proxy. Not since Redhat 5, not on Ubuntu 8.04, not on Fedora 9, CentOS or RHEL 5. Not with mod_perl 1.3x or with mod_perl 2.x. Not ever, not even once.

      Apache httpd has been around for a long time, and will probably be around for another long time or two. If it works for you, then keep doing it. It doesn't work for me anymore, especially given Microsoft's recent donations and the implied (or my inferred) influence over what happens with all of the Apache Software Foundation's projects. For all I know the money could be Microsoft's way of saying "Sit. Stay. Good boy."

      Back to my original question - are Perl's networking libraries capable of serving millions of requests per day, reliably and without leaking memory or causing other problems? Is it possible? Is it advisable?

      Thanks for sharing your well-seasoned opinion!
        Were IIS written in managed C#, at least some of those problems may not have been introduced.

        True, but that close integration with the OS is the only way they were able to achieve the performance numbers they have and add features like session storage. If you try to write an IIS application in a language that Microsoft doesn't support, I doubt you'll be able to take advantage of features like that. They made a trade-off.

        Why does one need to understand reverse proxies, ports, networking, system resource balancing, etc, etc, etc - just to set up 2 VirtualHosts running completely independent of one another, so that you could stop/start one and not the other?

        It's another trade-off. Rather than limit you to doing what the original developers thought of, Apache provides a very flexible solution which can be used in a variety of ways. That requires users to know something about the problem domain. Hiding those details would limit what you can do with the server.

        And yet I have never seen an instance of Apache httpd installed (or set up by default) with mod_perl behind a proxy. Not since Redhat 5, not on Ubuntu 8.04, not on Fedora 9, CentOS or RHEL 5. Not with mod_perl 1.3x or with mod_perl 2.x. Not ever, not even once.

        You're talking about default packages shipped by OS vendors? No one running a serious site on mod_perl would use those. I've never seen a mod_perl site getting real traffic that runs without a proxy or uses vendor binaries.

        It doesn't work for me anymore, especially with Microsoft's recent donations (and the implied (or my inferred)) influence over what happens with all of the Apache Software Foundation's projects. For all I know the money could be Microsoft's way of saying "Sit. Stay. Good boy."

        Ok, this is just FUD and it's offensive. Although I don't work on the httpd project, I am a member of the ASF (through work on mod_perl) and I can assure you that I don't have any plans to change my behavior due to Microsoft's contribution.

        If your problem with Apache2 is that you don't like the way you configure it, try another server. There are literally hundreds of open source web servers out there, including some serious ones in Perl. There's no reason to start from scratch. Maybe you'll like lighttpd or AxKit2 or perlbal or one of the many others.

        If Apache doesn't "work" for you, what does? If Microsoft influence is a negative, why would you prefer IIS? As for influence by donation, I think you're confusing volunteer programmers with professional politicians. The former aren't generally so easily corrupted.

        sas
        All you need to understand to have two completely different Apache instances serving two completely separate web sites is any one of: a reverse proxy, binding to a single IP address, virtual servers, mod_rewrite, port forwarding, or having two servers on your network. If your server administrator doesn't understand at least one of those technologies, you need to fire your server administrator. Not all of those are ideal, but any one would get the job done.

        The reason you don't see Apache set up with a proxy out front by a distro, by default, is very simple: most people don't need it, it would be confusing to them if they ran across it, and those of us who do need it know it's trivially simple to set up.

        Apache isn't set up "by default" to do lots of the things I use it for regularly, but its flexibility allows me to set up pretty much whatever I want, however I want, very easily.

        Frank Wiles <frank@revsys.com>
        www.revsys.com

      Running a proxy in front of mod_perl has long been the recommended configuration.

      Firstly, I hate mod_perl, and I don't use Apache - though for a long time I used both. With that said, mod_perl is not recommended by anyone who uses it and knows of its "alternatives." Again: if you use mod_perl, it is not a recommended solution; it is the *only* solution. There isn't a single non-contrived use case of mod_perl in which FastCGI could be used but shouldn't be. FastCGI is a platform-agnostic way to write web pages with your favorite scripting language. It requires no linking, and offers a clear separation between the web-server logic and the web-site logic. Conversely, mod_perl is a way to write Apache modules in Perl. These are two totally different tasks. Instructing people to write web sites using mod_perl is akin to instructing them to write GUIs in kernel space using frame buffers.



      Evan Carroll
      I hack for the ladies.
      www.EvanCarroll.com
        No matter your dynamic page toolkit, whether it's mod_perl, FastCGI, plain CGI, or some fully custom web server written in whatever language, a reverse proxy that caches dynamic pages to static ones will accomplish the same thing. The load of generating the same dynamic page over and over on a highly trafficked server will be reduced to one dynamic request and a bunch of static ones, with a new dynamic page generated only every so often.
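
        A hedged sketch of the cache-check half of such a proxy (the backend address and the 60-second TTL are made-up values; a real cache would honor Cache-Control headers):

            use strict;
            use warnings;
            use LWP::UserAgent;

            my $ua  = LWP::UserAgent->new;
            my %cache;      # path => { expires => epoch, response => HTTP::Response }
            my $ttl = 60;   # regenerate a given dynamic page at most once a minute

            sub fetch {
                my ($path) = @_;
                my $hit = $cache{$path};
                return $hit->{response} if $hit && $hit->{expires} > time;

                # Miss or stale: ask the backend app server once, then serve
                # the stored copy to everyone else until it expires.
                my $res = $ua->get("http://127.0.0.1:8080$path");
                $cache{$path} = { expires => time + $ttl, response => $res }
                    if $res->is_success;
                return $res;
            }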
        A couple years ago I would have flamed you. Now I am in agreement.

        It seems to me that mod_perl is a great way to hack out something slick - i.e. when mod_rewrite doesn't quite cut it, use a PerlTransHandler to change $r->uri() to another value based on what your database says (or whatever).
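
        Something along these lines, say (mod_perl 2 API; db_lookup is a hypothetical stand-in for the database query):

            package My::Trans;
            use strict;
            use warnings;
            use Apache2::RequestRec ();
            use Apache2::Const -compile => qw(DECLINED);

            sub handler {
                my $r = shift;
                # Map the public URI to an internal one, per the database.
                my $internal = db_lookup($r->uri);
                $r->uri($internal) if defined $internal;
                return Apache2::Const::DECLINED;  # let normal URI-to-file mapping proceed
            }

            sub db_lookup { return undef }  # placeholder: real code would query the database

            1;
            # httpd.conf:  PerlTransHandler My::Trans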

        Another case of "Just because you can do X doesn't mean you should do X."

        ...I suppose you could apply the same argument to writing an http server in Perl.
Re: Time to write a "serious" http server in Perl?
by tilly (Archbishop) on Aug 10, 2008 at 21:46 UTC
    No.

    Perl is an easy language to write a prototype in, but is a horrible language for writing a serious webserver. If you try to do it you'll be forced down one of a few basic approaches, and all are bad ideas in Perl:

    1. Simple single-threaded: This is what most of those *::Simple and *::Lite servers do. This approach can't serve two requests at once (sketched just after this list).
    2. Forking: A new process is forked per request, so a simple CRUD application has to connect to the database on each request. When you're under volume, most databases will not survive that. This approach is available in some of the pure Perl webservers.
    3. Pre-fork: This is how Apache used to work by default, and it can still be made to do so. (Smart mod_perl shops tend to use this approach combined with a reverse proxy.) The idea is that you start out with a number of children, and one of them serves the request. So connections get reused across requests. Unfortunately Perl processes use too much memory to have large numbers of them. This is the primary reason why serious mod_perl sites use a reverse proxy configuration. You just can't afford to tie up a ton of memory in a process whose job is to dribble bytes at whatever rate the client's dialup can accept it.
    4. Threading: My opinion is that Perl threading has all of the disadvantages of threading and none of the advantages. Others don't agree with me, but still even its advocates would not try to use it to scale to a busy website.
    5. Asynchronous programming: This is a promising approach until you look at how much of the infrastructure people use Perl for is not asynchronous. For example database access is done with DBI, which is synchronous. So your first long-running query takes your website down. Not good.
    Those are your basic options. None of them would be a good choice to handle serious volume. Oh, you might be able to make them work, but at what cost in hardware? Why bother when Apache does the same thing with fewer resources?
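
    To make option 1 concrete, here is roughly what those *::Simple-style servers boil down to, sketched with HTTP::Daemon from libwww-perl (a toy, with a canned response in place of real handling):

        use strict;
        use warnings;
        use HTTP::Daemon;
        use HTTP::Response;

        my $d = HTTP::Daemon->new(LocalPort => 8080) or die "listen: $!";

        # One connection at a time: while we serve this client, every other
        # client waits. That is the whole problem with this approach.
        while (my $conn = $d->accept) {
            while (my $req = $conn->get_request) {
                $conn->send_response(
                    HTTP::Response->new(200, 'OK',
                        ['Content-Type' => 'text/plain'], "hello\n")
                );
            }
            $conn->close;
        }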
      Pre-fork: [...] Unfortunately Perl processes use too much memory to have large numbers of them.

      Could you explain why this is a problem? At least on the operating systems that I have worked with, forked processes continue to use their parent's memory pages until they need to write to them (aka "copy-on-write"). So if the parent (master) process is designed to load all the big modules and stuff, the children should produce little overhead in memory.

        People do this. Unfortunately due to Perl's reference counting, in normal use you wind up writing to data that you're just accessing. So while you can preload lots of stuff, and it starts off shared, it doesn't stay shared.

        Again, this is not a theoretical problem. It is a real problem that people at high-volume mod_perl sites have been dealing with for years. And the solution is standard as well.
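
        A hedged illustration of the effect (the variables are hypothetical stand-ins for data preloaded by the parent process):

            use strict;
            use warnings;

            my %big_preloaded_config = ( workers => 10 );  # imagine megabytes of this
            my $port = "8080";                             # preloaded as a string

            # 1. Using the string as a number caches the numeric value inside
            #    the scalar itself - a write to the memory page holding it.
            my $n = $port + 0;

            # 2. Merely taking a reference bumps the reference count stored in
            #    the hash's header - also a write, un-sharing that page.
            my $conf_ref = \%big_preloaded_config;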

      What is a reverse proxy?
        Usually proxies handle outgoing requests from a network. A reverse proxy handles incoming requests to a network.

        In this context, a reverse proxy would read the request from a client, ask another Apache httpd instance for the data, then return that data to the client.

        This way, the mod_perl server can finish the request quickly and doesn't bog down a process (with a presumably large chunk of RAM) with downloading the response bytes to the client (on a slow connection).
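
        A toy sketch of that flow (hypothetical ports, GET-only, headers ignored; a real front end would be mod_proxy or a dedicated proxy, not this):

            use strict;
            use warnings;
            use HTTP::Daemon;
            use LWP::UserAgent;

            my $front = HTTP::Daemon->new(LocalPort => 8000) or die "listen: $!";
            my $ua    = LWP::UserAgent->new;

            while (my $client = $front->accept) {
                while (my $req = $client->get_request) {
                    # Ask the heavyweight mod_perl backend for the page, then
                    # take over the slow job of dribbling it out to the client
                    # so the backend process is freed immediately.
                    my $res = $ua->get('http://127.0.0.1:8080' . $req->uri->path_query);
                    $client->send_response($res);
                }
                $client->close;
            }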

Re: Time to write a "serious" http server in Perl?
by dragonchild (Archbishop) on Aug 11, 2008 at 02:23 UTC
    You've mentioned two webservers - Apache and IIS. The Ruby folk don't use either - they use mongrel+lighttpd and seem to do just fine. There are at least 3 other webservers that probably have some decent work behind them.

    There's also yaws, written in Erlang. Early benchmarking seems to indicate that yaws can handle something on the order of 40,000 simultaneous connections per machine, compared to Apache's 8,000. Having been written in Erlang, it is pretty simple to have it take advantage of multiple machines.

    Furthermore, it sounds like you're starting to go beyond the simple stuff that most webdevs can do and get into the kind of stuff an expert is really needed for. Kinda like how most devs can configure a basic RDBMS (like MySQL, PG, or Oracle), but really need a DBA in order to take full advantage of the relational calculus, how it's implemented in the chosen server, and how to properly configure it for real-world use. This isn't a slam - different people specialize for different things. I wouldn't hire any of @Larry to configure a RDBMS, but that doesn't take away from any of their real skills. Same thing goes for any portion of your service's delivery stack.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Time to write a "serious" http server in Perl?
by EvanK (Chaplain) on Aug 10, 2008 at 21:03 UTC
    Just as Perl is the swiss army chainsaw of languages, Apache is the swiss army chainsaw of webservers. You can do just about anything with it, but as a result it's also very easy to cut off your own feet.

    IIS is more narrowly defined in its capabilities, but it serves its narrower purposes very very well (so I hear). It's more analogous to Visual Basic: very easy to use and certainly capable, but with many restrictions on the more fancy complex stuff.

    Just my $0.02

    __________
    Systems development is like banging your head against a wall...
    It's usually very painful, but if you're persistent, you'll get through it.

Re: Time to write a "serious" http server in Perl?
by talexb (Chancellor) on Aug 11, 2008 at 02:33 UTC
      A "mini" version capable of running a single website - especially useful for local development and testing.

    I can highly recommend lighttpd for this request -- it's fantastic.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Time to write a "serious" http server in Perl?
by wol (Hermit) on Aug 11, 2008 at 13:49 UTC
    If a serious HTTP server isn't realistic, how about a comedy one?

    Rather than using TCP as a transport, we can use cars driven by clowns, with doors that fall off to provide traceability and per-gig persistent state. HTTP 401 errors can be implemented with custard pies. For scalability, a double act is a proven solution.

    According to the needs of the venue, the service can be run in slapstick mode or pun based.

    Sorry, I seem to have wandered a little off topic. Could anyone front me a dried frog pill or two?

Re: Time to write a "serious" http server in Perl?
by ysth (Canon) on Aug 11, 2008 at 00:52 UTC
    Why would I ask? Well, if you have ever managed IIS before then you know that it has the ability to stop/start individual websites without stopping/starting all other websites in the process.
    How is what IIS does different from a graceful reload?

      My very first thought; odd that it appeared so far down the list! Furthermore, combined with the sites-enabled structure, it becomes trivial to turn off a single virtual host for extended periods. I think perhaps the OP is not as familiar with Apache as he would have us think.

      Confucius says kill mosquito unless cannon
      A graceful reload won't actually restart the Perl interpreter in mod_perl. It would work fine for CGI, of course.
Re: Time to write a "serious" http server in Perl?
by JavaFan (Canon) on Aug 11, 2008 at 08:48 UTC
    So - is it time to write a "serious" http server in Perl?

    Writing an HTTP server in Perl doesn't magically solve any of the problems you perceive Apache 2 to have.

    Personally, if I wanted to write a small webserver that does a very specific thing, and it wouldn't have to handle a huge load, I'd write it in Perl. If I were to write a general-purpose webserver that had to handle a huge load, rival Apache2, and even implement more, the only language I'd consider using is C.

      Perhaps writing another niche-focused http server would be the way to go.

      An http server that does what I want, and nothing more. I suppose only testing will show me the weaknesses in any approach I choose to take with this.
      Have you considered Erlang? yaws is written in it and seems to have very interesting initial performance statistics. While not as fast as C, it scales much better than C ever could.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        Considering that most (Unix) OSes are written in C, it would be irrelevant for a language to scale better than C ever could. Even if such a statement were true, it wouldn't help: whatever application was written in Erlang couldn't display its superior scalability, because the limitations of the OS would kick in.
      the only language I'd consider using is C
      Is that because you don't know C++ or even D?
        All I know about D is that it isn't supported widely enough to consider it for a large-scale project that should run on a large number of platforms. (I assume you are talking about the D, and not one of the other languages that share the name.)

        And I know enough C++ to not consider it.

Re: Time to write a "serious" http server in Perl?
by blazar (Canon) on Aug 13, 2008 at 21:21 UTC

    I personally believe that you may be amazed by Continuity! (OK, it's not intended to be a "real" solution to your problem, but it's certainly worth looking at if you've never done so...)

    --
    If you can't understand the incipit, then please check the IPB Campaign.
      I've looked at Continuity before.

      While writing Apache2::ASP it became apparent to me why ASP.Net went with an event-based model (which Continuity reminds me of). Of course event-based models depend on a proper server-side DOM. Server-side DOM (such as ASP.Net has) is complicated and does not seem to be very popular.

      Any project of this kind is, as perrin put it, "not to be taken lightly." Along similar lines, I would also like to see a Perl SMTP server (that also does not depend on Apache httpd). A topic for another thread I suppose.
        Along similar lines, I would also like to see a Perl SMTP server (that also does not depend on Apache httpd).

        Look at qpsmtpd.

Re: Time to write a "serious" http server in Perl?
by mr_mischief (Monsignor) on Aug 12, 2008 at 20:02 UTC
    One thing worth considering is that Perl6 might be a more suitable language for such a project than Perl5. The memory usage of some of the Perl6 implementations is supposed to be greatly improved, and you'll be able to specify native variable types where necessary.
      I agree - perhaps a Perl5 prototype would be an interesting way to kick the tires.

      A rewrite in Perl6 to take advantage of the new features and optimizations made there would seal the deal.

      I would write it using Moose now but the performance hit wouldn't be worth it.
        I would write it using Moose now but the performance hit wouldn't be worth it.

        Actually, Moose should perform quite well for something like an HTTP server, since the Moose performance hit is largely at compile time. If you utilize the proper optimizations in Moose, the runtime can be pretty fast (simple Moose accessors are faster than Class::Accessor's).
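
        Presumably the optimizations alluded to include making each class immutable once it's defined, which inlines the constructor and accessors. A generic sketch (the class and attributes are invented for illustration):

            package My::Server::Connection;   # hypothetical class
            use Moose;

            has socket  => ( is => 'ro', required => 1 );
            has started => ( is => 'ro', default  => sub { time } );

            # The optimization in question: once the class definition is
            # final, inline the constructor and accessors.
            __PACKAGE__->meta->make_immutable;

            1;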

        -stvn