Just a rough guess, but I don't really think that NNTP over stateless-HTTP is a bright idea.
First off, downloading the articles. Either you're going to download some articles and then try to download the articles they reference, or you're going to download all the articles from the NNTP server and line them up yourself. The first approach means multiple random accesses to the NNTP server (hopefully you don't connect/disconnect a bunch of times - I'm talking about one page refresh here from the web side). The second means downloading everything and handling it in your NNTP client, which means that you're going to copy all the articles for the current group into every CGI process that is serving that group. If the NNTP server is on a separate machine, this can mean silly amounts of network traffic, but even on the same machine, it can still be a lot of traffic over the loopback interface. I doubt any NNTP servers handle shared-memory connections like some relational databases do, so there's a bit of a hit from going through some of the TCP/IP layers before being rerouted back onto the same box.
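To make the "line them up yourself" approach concrete, here is a rough sketch of what each web process would have to do after pulling down every header in the group. The header dicts, their field names, and the example Message-IDs are all made up for illustration; the only real-world assumption is that a References header lists ancestors oldest first, so its first entry is the thread root.

```python
from collections import defaultdict

def group_by_thread(headers):
    """Bucket article headers by thread root.

    Each header is a dict with 'message_id' and 'references' (a list of
    ancestor Message-IDs, oldest first; empty for a thread starter).
    """
    threads = defaultdict(list)
    for h in headers:
        # A thread starter is its own root; a reply's root is its oldest ancestor.
        root = h["references"][0] if h["references"] else h["message_id"]
        threads[root].append(h)
    return dict(threads)

# Hypothetical headers, as if every article in the group had been fetched.
headers = [
    {"message_id": "<1@example>", "references": [], "author": "alice"},
    {"message_id": "<2@example>", "references": ["<1@example>"], "author": "bob"},
    {"message_id": "<3@example>", "references": [], "author": "carol"},
    {"message_id": "<4@example>", "references": ["<1@example>", "<2@example>"], "author": "dave"},
]
threads = group_by_thread(headers)
```

The point is not that the bucketing is hard, but that it needs every header in the group in memory before it can show even one thread.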
A relational database solves these issues. You simply write your SQL to get just the articles you care about (this might be recursive SQL, but, hey, that's what SQL is for, right?). None of the rest will be sent across the (virtual?) wire. And what does get sent may be sent via shared memory, if your client/server supports it and you're on the same box. (Some DBs can actually use shared memory across multiple boxes, but if you've got a setup like that, you're not asking this type of question on perlmonks: you've got a DBA team earning close to six figures each, and probably underpaid at that for their level of knowledge, all ready to smack you for even asking about this.)
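As a minimal sketch of the "recursive SQL" idea, assuming a made-up `article` table where each row points at its parent, something like the following pulls one thread and nothing else. It uses SQLite through Python's sqlite3 module purely because it is easy to run; the table, column names, and sample rows are illustrative, not anything from a real forum schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES article(id),  -- NULL for a thread starter
        author    TEXT NOT NULL,
        subject   TEXT NOT NULL
    );
    INSERT INTO article VALUES
        (1, NULL, 'alice', 'store threaded discussion'),
        (2, 1,    'bob',   'Re: store threaded discussion'),
        (3, 2,    'carol', 'Re^2: store threaded discussion'),
        (4, NULL, 'dave',  'unrelated thread'),
        (5, 1,    'erin',  'Re: store threaded discussion');
""")

def fetch_thread(conn, root_id):
    """Return (id, author, depth) for the root article and all its descendants."""
    return conn.execute("""
        WITH RECURSIVE thread(id, parent_id, author, depth) AS (
            SELECT id, parent_id, author, 0 FROM article WHERE id = ?
            UNION ALL
            SELECT a.id, a.parent_id, a.author, t.depth + 1
            FROM article a JOIN thread t ON a.parent_id = t.id
        )
        SELECT id, author, depth FROM thread ORDER BY id
    """, (root_id,)).fetchall()

rows = fetch_thread(conn, 1)
```

Only the four rows belonging to thread 1 come back; the unrelated article never leaves the database.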
Second, metadata. NNTP only exposes a limited amount of metadata without pulling down a bunch of extra headers. Things like "how many unread messages?" aren't too bad, even with NNTP, though you need a relational database just to store the last-read marker for each of your users. Figuring out who wrote the last message in the group isn't too bad - ask for the highest message number, and request its header (two requests over the one connection). But if you want to know who wrote the last message in each thread, well, now you have to query all the headers (not just the new ones!) and put each header into a bucket for each thread to see what the last one is. Database? Single SQL statement gets you both the list of threads AND the last post (both user and timestamp). I have such an SQL statement (well, many of them) for the CB stats - see, especially, the lists at the bottom (getting the top referrers, and who referred, both earliest and latest, or the top karma, including the latest reason). Each of these tables is a single SQL statement. If the data were stored like NNTP, I'd have to request all the data, and parse it out myself.
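For the "last message in each thread" case, a sketch of the single-statement version might look like this. It is not the SQL behind the CB stats; the `post` table, its columns, and the sample rows are assumptions made up for the example, and SQLite is used only because it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL,
        author    TEXT NOT NULL,
        posted_at TEXT NOT NULL  -- ISO-style timestamp, so string order = time order
    );
    INSERT INTO post VALUES
        (1, 1, 'alice', '2006-01-01 10:00'),
        (2, 1, 'bob',   '2006-01-02 09:00'),
        (3, 2, 'carol', '2006-01-01 12:00');
""")

# One statement: every thread, plus who posted last in it and when.
last_posts = conn.execute("""
    SELECT p.thread_id, p.author, p.posted_at
    FROM post p
    JOIN (SELECT thread_id, MAX(posted_at) AS last_at
          FROM post
          GROUP BY thread_id) m
      ON m.thread_id = p.thread_id AND m.last_at = p.posted_at
    ORDER BY p.thread_id
""").fetchall()
```

If two posts in a thread share the exact same timestamp, this returns both of them; breaking the tie on `id` would be one way around that.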
Also along the lines of metadata is extensibility: if you want to provide a page with different information, you can extend the metadata in the database, or query it differently, or whatever, but you can get the data you want just by manipulating SQL. With NNTP, you'll be stuck either with the data NNTP gives you or with parsing out the headers yourself (again, you need to download ALL of them, generally speaking, to get the pertinent statistics or what have you).
The only advantage I see NNTP having is if you want to ALSO run an NNTP interface to your threaded discussions. Even then, it'd be best if the NNTP server used a db backend that you could query directly for your web interface, IMNSHO.
In reply to Re: store threaded discussion
by Tanktalus
in thread store threaded discussion
by Anonymous Monk