in reply to redesign everything engine?

might it be a good idea to start from scratch with a new engine, with far fewer features in the beginning, but FAST?

No. The Everything Engine had been under development for around two years by the time Perl Monks came about. That was over three years ago. In my opinion, throwing away five years of development, when you have a working product, is a terrible idea.

no more eval()

It was a deliberate design decision that administrators should be able to create nodemethods that exist only in the database. It would be nice to have the option to specify that you're not using that feature for a big performance gain (and I'm working on that), but forbidding it altogether would be removing one of the most important features of Everything.
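
As a rough illustration of what "code in the database" means in practice, here is a minimal sketch of the general technique: fetch the stored source for a named method, compile it once with a string eval, and cache the resulting sub reference. The table and column names (nodemethod, code, title) and the subroutine itself are assumptions for illustration, not the actual Everything schema or API.

    use strict;
    use warnings;
    use DBI;

    my %method_cache;

    sub compile_nodemethod {
        my ($dbh, $name) = @_;

        return $method_cache{$name} if exists $method_cache{$name};

        # Pull the method body out of the database (hypothetical table).
        my ($code) = $dbh->selectrow_array(
            'SELECT code FROM nodemethod WHERE title = ?', undef, $name
        );
        return unless defined $code;

        # String eval turns the stored text into a callable sub.  This is
        # the flexibility -- and the runtime cost -- under discussion.
        my $sub = eval "sub { $code }";
        warn "compile error in nodemethod '$name': $@" if $@;

        return $method_cache{$name} = $sub;
    }

Caching the compiled sub only pays the eval cost once per process; the option mentioned above would let a site that doesn't use the feature skip the database lookup and eval entirely.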

implement some smart caching between application and database

This is a much better idea. It's also a hard problem, with Apache and mod_perl. Threaded Apache 2.0 may help immensely. I'm also experimenting with a standalone forking server that has much more control over shared memory. Improving the existing node cache may save a bunch of time.
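
To make the node-cache idea concrete, here is a rough sketch of per-process caching with a cheap staleness check: each cache hit costs one small query instead of a full node fetch. The version column and the subroutine names are assumptions for illustration, not the engine's actual cache code.

    use strict;

    my %node_cache;    # node_id => { version => ..., node => ... }

    sub get_node_cached {
        my ($dbh, $node_id) = @_;

        # A lightweight query to see whether our cached copy is current.
        my ($version) = $dbh->selectrow_array(
            'SELECT version FROM node WHERE node_id = ?', undef, $node_id
        );
        return unless defined $version;

        my $entry = $node_cache{$node_id};
        return $entry->{node} if $entry && $entry->{version} == $version;

        # Cache miss (or stale entry): fetch the whole node and remember it.
        my $node = $dbh->selectrow_hashref(
            'SELECT * FROM node WHERE node_id = ?', undef, $node_id
        );
        $node_cache{$node_id} = { version => $version, node => $node };
        return $node;
    }

The hard part, as noted above, is sharing such a cache safely across Apache children, which is where threaded Apache 2.0 or a forking server with controlled shared memory comes in.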

I doubt you have profiling data, though, so I feel at liberty to say rewriting the engine from scratch is a terrible, awful, lousy idea. Just this weekend I checked in code that improved the second-biggest timesink subroutine by 50% -- without removing a single feature.

Update: I forgot to bring up the idea of data migration. Good luck.

Replies are listed 'Best First'.
Re: Re: redesign everything engine?
by perrin (Chancellor) on Jan 28, 2003 at 20:44 UTC
    The speed issues may be fixable without a rewrite, but frankly the Everything code is not very well suited to its current use on PerlMonks. It has some fundamental design decisions (keeping the code in the database, storing most of the data as generic blobs of XML) which are a major cause of slowness and race conditions, but which, more importantly, make it really hard for most people to contribute to the code and make testing nearly impossible. How could anyone profile this code effectively? Just retrieving it to run on your own system requires significant work, because you have to get it out of the PerlMonks database and test data is not available.

    It's very cool that the system was designed flexibly enough to work in this way, but a more focused codebase that works specifically for PerlMonks would be able to run much more efficiently. PerlMonks is essentially a separate codebase now, since it branched off the Everything codebase a long time ago and is not able to take updates from that code unless someone manually merges them in.

    I would like to believe that a gradual process of rewriting could fix these issues, but I'm not sure it will because the things that need to be changed are so fundamental to the current design. Your point about all the accumulated knowledge in this code is a very good one though, and not to be dismissed lightly. Rewriting would be a lot of work and it would be hard to get all of the current functionality right. Migrating the data would be REALLY hard.

    Caching with mod_perl, on the other hand, is trivial. I gave a talk about it at OSCON last year and I'd be happy to help if you have questions about it. Tye was concerned that using shared caching anywhere other than the nodelets would make the race conditions worse, but doing the caching itself is simple. (Of course caching across a cluster is hard, but that has nothing to do with mod_perl and may not be required.)
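
    As one example of the kind of simple shared caching described here, the Cache::Cache modules from CPAN share entries between mod_perl children via the filesystem. This is only a sketch; the nodelet name and the build_nodelet() call are placeholders, not real PerlMonks code.

        use strict;
        use Cache::FileCache;

        my $cache = Cache::FileCache->new({
            namespace          => 'nodelets',
            default_expires_in => 300,      # seconds
        });

        sub nodelet_html {
            my ($nodelet_name) = @_;

            # Serve a shared, pre-rendered copy if one is fresh enough.
            my $html = $cache->get($nodelet_name);
            return $html if defined $html;

            # Otherwise do the expensive work once and share the result.
            $html = build_nodelet($nodelet_name);   # placeholder for the real work
            $cache->set($nodelet_name, $html);
            return $html;
        }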

      I profiled the current CVS in single-user mode on my laptop this weekend. I'm not really concerned about any one specific site, just the framework and general behavior. If I can speed that up, I'll have met my goal.

      I'm not terribly concerned about the XML, though using XML::DOM is a performance killer. That's mostly during the installation, though, so it's a low priority along the performance axis. The only place it's really used internally in the live system is in the workspacing code, and I don't think there's any of that on Perl Monks at the moment.
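
      For anyone curious how much XML::DOM costs relative to a lighter parser, a quick comparison along these lines would show it. The sample document is made up; real node XML would be larger, and results will vary with the data.

          use strict;
          use Benchmark qw(cmpthese);
          use XML::DOM;
          use XML::Simple qw(XMLin);

          my $xml = '<node><title>example</title><doctext>some text</doctext></node>';

          cmpthese(-2, {
              'XML::DOM' => sub {
                  my $doc = XML::DOM::Parser->new->parse($xml);
                  $doc->dispose;    # XML::DOM needs explicit cleanup
              },
              'XML::Simple' => sub {
                  my $data = XMLin($xml);
              },
          });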

      Caching is complicated by the fact that the current CVS has subrefs in it. That's why I'm betting my managed-forking approach will have better performance in certain circumstances.

      The performance killers, as I see them:

      • pages are optimized for writing -- parsing links every time, processing page templates on every hit. This is ameliorated somewhat by code caching in the 1.0 series
      • nodelets are cached for the whole site or not at all -- they could be cached per user for a speed improvement
      • nodes have a custom inheritance scheme to deal with nodemethods, which was between 10 and 20% of the profiled time in my tests -- this could be reduced further
      • inefficient database queries, fetching hashrefs when bound scalars and explicit column names are 20-50% faster -- I'm working on this (see the sketch after this list)
      • inefficient code -- we're better programmers now
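
      Here is the sketch referred to in the database-query item above: the two fetch styles side by side. The column names are illustrative guesses at an Everything-style node table, not the exact schema, and the actual speedup will vary by driver and query.

          use strict;
          use DBI;

          # The generic way: SELECT * and build a hash for every row.
          sub fetch_with_hashref {
              my ($dbh, $node_id) = @_;
              my $sth = $dbh->prepare_cached('SELECT * FROM node WHERE node_id = ?');
              $sth->execute($node_id);
              my $row = $sth->fetchrow_hashref;
              $sth->finish;
              return $row;
          }

          # The faster way: name only the columns you need and bind scalars to them.
          sub fetch_with_bound_columns {
              my ($dbh, $node_id) = @_;
              my $sth = $dbh->prepare_cached(
                  'SELECT title, author_user, type_nodetype FROM node WHERE node_id = ?'
              );
              $sth->execute($node_id);
              $sth->bind_columns(\my ($title, $author, $type));
              $sth->fetch;
              $sth->finish;
              return { title => $title, author_user => $author, type_nodetype => $type };
          }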

      I'm working on all of these, but it's at the tip of CVS. My goal is to make migrating to Everything 1.2 an attractive option for Perl Monks.

        By grabbing the latest Everything from CVS, you're kind of highlighting the problem here: there is no current CVS of PerlMonks, because the code is kept in the database. There is no convenient way to get the latest code, let alone branch it for a major revision. It also makes the task of incorporating updates from Everything that much harder. This is why I think storing code in the database is not a good idea at this point. I'm sure there were reasons for it at the time, but it is counter-productive now.

        When I referred to XML, what I was really thinking of was the way nodes are stored and the resulting update problems (some of them are described here). I don't think this would be such a problem with a more normalized database schema and a codebase that allowed for finer-grained locking during updates.

        About the cache: subrefs are okay as long as Storable can handle them. Objects that can't be serialized can't be cached between processes at this point. At the moment, Perl threads are not very good at sharing objects so mod_perl 2 may not solve this issue any time soon. I'm not sure what your managed-forking idea is, but I don't see why it wouldn't have to deal with exactly the same issues mod_perl does.
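
        For what it's worth, recent versions of Storable can be told to handle code references: with $Storable::Deparse set it serializes them as source via B::Deparse, and $Storable::Eval recompiles them on thaw (check the installed version supports these switches). A minimal sketch, with a made-up node structure:

            use strict;
            use Storable qw(freeze thaw);

            local $Storable::Deparse = 1;   # turn coderefs into source on freeze
            local $Storable::Eval    = 1;   # recompile them on thaw

            my $node = {
                title   => 'example node',
                display => sub { return '<p>hello</p>' },
            };

            my $frozen = freeze($node);
            my $copy   = thaw($frozen);
            print $copy->{display}->(), "\n";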

        I don't want to sound like I'm just whining about the code. I am grateful for the existence of this site and your part in creating the code that made it happen. I do think that some of the design ideas have not scaled well though, and that it will be hard to fix it completely without fundamental changes.

        I would love to help out on the code (as might more people on PerlMonks), but I don't want to download and install Apache, mod_perl, MySQL, and Everything.

        Would it be a good idea if you, chromatic, posted pieces of the Everything code for us monks to review? I am sure we could come up with some improvements, which you could then decide whether or not to implement.

        Careful readers might gather by now that I would do anything to make this site faster, except really delve into the Everything engine ;-)

      No, I said that making the node cache shared between processes would be bad: it would prevent improvements that would reduce the impact of its race conditions and reduce database server load.

      I mentioned some types of caching besides nodelets and noted that they wouldn't likely be a big win. I didn't say anything about "anywhere other than".

      I also note that chromatic appears to mostly be talking about load that we'd see on the web servers, which, last I checked, wasn't where the main problem is.

                      - tye
        Sorry, I did bowdlerize your comments a little. No harm intended.
Re: Re: redesign everything engine?
by Anonymous Monk on Jan 28, 2003 at 22:47 UTC
    In my opinion, throwing away five years of development, when you have a working product, is a terrible idea.

    So Microsoft shouldn't throw away their IE code? ;-)

    Also, you wouldn't have to throw away the other code at all: if someone's interested in starting a new project, go for it. If that code ever reaches the point where it's superior, then switch to it. You don't have to decide to switch before the alternative has arrived. Did you hear people saying "I'm going to switch to Linux" after Linus' first email on the subject?

      I wouldn't go so far as to call IE a working product. Efficiency and correctness are two very separate issues.

      Makeshifts last the longest.
