Re: Large News Database
by matija (Priest) on Mar 09, 2004 at 06:39 UTC
I think you would find it advantageous to at least store the most important header information (Subject, Date, Author, possibly threading info) in a database.
You're bound to need to search for one specific piece of news some time later, and the more news there is, the less efficient it will be to search through a bunch of unindexed files.
Also note that if you have the whole article in a MySQL database, you can enable full text searching over the article, which can also come in handy.
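To make the header-plus-full-text suggestion concrete, here is a minimal schema sketch. The table and column names are hypothetical, and the MyISAM requirement reflects the MySQL 4.x versions current when this was written, where FULLTEXT indexes only worked on MyISAM tables.

```sql
-- Hypothetical schema: header fields indexed for fast lookup,
-- subject and body covered by a FULLTEXT index.
CREATE TABLE articles (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    subject  VARCHAR(255) NOT NULL,
    author   VARCHAR(255) NOT NULL,
    posted   DATETIME     NOT NULL,
    body     TEXT         NOT NULL,
    INDEX (posted),
    INDEX (author),
    FULLTEXT (subject, body)
) ENGINE=MyISAM;  -- FULLTEXT required MyISAM in MySQL 4.x

-- Full-text search over subject and body:
SELECT id, subject
FROM   articles
WHERE  MATCH (subject, body) AGAINST ('perl news archive');
```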
And I don't believe that storing articles in a database is going to eat that much more space than storing them in individual files would. (Unless you compressed the files, which would make searching through them even more difficult.)
Consider this: disk space is cheap and getting cheaper every year. Your time isn't.
Re: Large News Database
by kvale (Monsignor) on Mar 09, 2004 at 06:43 UTC
As flat-file databases grow, searching slows down linearly with the size of the database. In contrast, access time for articles stored in a keyed database, even a simple one like GDBM, grows only with the log of the number of records -- much faster!
So for large databases, specialized database programs like MySQL are almost always the best solution.
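The keyed-lookup idea is just Perl's `tie` interface to a DBM file. This sketch uses the core SDBM_File module so it runs anywhere; GDBM_File, as mentioned above, is a drop-in replacement via the same `tie` call if libgdbm is installed. The message-ID key and header string are made-up examples.

```perl
use strict;
use warnings;
use SDBM_File;                      # core module; GDBM_File works identically
use Fcntl qw(O_RDWR O_CREAT);

# Tie a hash to an on-disk DBM file: lookups go by key,
# not by scanning the whole file.
tie my %article, 'SDBM_File', 'articles', O_RDWR | O_CREAT, 0644
    or die "Cannot tie DBM file: $!";

# Store header info keyed by message ID (hypothetical example data).
$article{'<msgid-1234@example.com>'} = 'Subject: Test|Date: 2004-03-09';

# Retrieval is a keyed lookup -- O(log n) or better, not a linear scan.
my $headers = $article{'<msgid-1234@example.com>'};

untie %article;
```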
Re: Large News Database
by EvdB (Deacon) on Mar 09, 2004 at 07:53 UTC
If the content is going to be largely static then there is no reason why static files could not be used, except that, as noted above, they are difficult to search.
There is an interesting article, http://www.perl.com/pub/a/2004/02/19/plucene.html, which might give you a few ideas regarding the searching.
As for access times, I imagine that if the user accesses one article at a time then file access would be quicker, as the server could send out pre-prepared files, and could use a 404 handler to generate the files if they do not exist.
Chances are that you will need a database at some point for user preferences or similar, so in a way you might as well just stick the data in there from the start.
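The 404-handler trick above can be sketched as a small CGI script. Everything here is illustrative: the `ErrorDocument` line, the document root, the URL pattern, and `render_article` (which in real life would pull from the database through a template) are all assumptions, not a finished implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path     qw(make_path);

# Sketch: Apache hands the missing URL to this script via REDIRECT_URL
# (assumed config: ErrorDocument 404 /cgi-bin/build_article.cgi).
# The first hit generates and caches the page; later hits are plain static files.

my $docroot = $ENV{NEWS_DOCROOT} || '/tmp/news-demo';    # assumed document root
my $url     = $ENV{REDIRECT_URL} || '/articles/42.html'; # demo default

# Only rebuild paths that look like article pages.
if (my ($path) = $url =~ m{^(/articles/\w+\.html)$}) {
    my $html = render_article($path);         # hypothetical DB/template call
    my $file = "$docroot$path";
    make_path(dirname($file));                # cache as a static file
    open my $fh, '>', $file or die "write $file: $!";
    print {$fh} $html;
    close $fh;
    print "Content-type: text/html\n\n$html"; # answer this first request too
} else {
    print "Status: 404 Not Found\nContent-type: text/plain\n\nNot found\n";
}

sub render_article {
    my ($path) = @_;
    return "<html><body>Article generated for $path</body></html>";
}
```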
--tidiness is the memory loss of environmental mnemonics
Re: Large News Database
by astroboy (Chaplain) on Mar 09, 2004 at 08:02 UTC
One option is to store your content in a database (relational, flat-file or otherwise), and generate the content (articles, contents, etc.) to flat files using templates (which are pulled together using includes, SSIs or whatever your templating system supports). This will allow searches to be done against the database while also providing the speed of HTML access. A couple of low-end (but very good) Perl-based commercial content management systems do it this way - see Article Manager and Big Medium.
If your content starts growing too fast, you may wish to generate the old/rarely accessed articles on demand, while keeping the new content in flat files.
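A stripped-down sketch of the "database in, static HTML out" step. A real setup would use Template Toolkit or the SSIs mentioned above, and the article row would come from a SELECT; the inline template, sample data, and output path here are all assumptions for illustration.

```perl
use strict;
use warnings;

# Minimal template with [% placeholder %] markers (Template Toolkit style,
# filled here with a plain substitution for the sake of a self-contained demo).
my $template = <<'HTML';
<html><head><title>[% title %]</title></head>
<body><h1>[% title %]</h1><p>[% body %]</p></body></html>
HTML

# In practice this row would come from a SELECT against the article table.
my %article = (
    id    => 7,
    title => 'Large News Database',
    body  => 'Flat files versus MySQL, continued...',
);

# Fill the template and write the result where the web server can serve it
# as a plain static file.
(my $page = $template) =~ s/\[%\s*(\w+)\s*%\]/$article{$1}/g;

my $out = "/tmp/article-$article{id}.html";   # assumed output location
open my $fh, '>', $out or die "write $out: $!";
print {$fh} $page;
close $fh;
```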
Re: Large News Database
by Abigail-II (Bishop) on Mar 09, 2004 at 11:43 UTC
"For the archiving of news articles, should I store the articles' content in a MySQL database or just a straight-out flat-files setup? Does it matter?"
Of course it matters. What's more appropriate highly depends on what you are going to do with it. How often do you do updates? How many? Sequential? Random? How often do you query? What kind of queries?
What I'm also wondering is: storing news archives has been done thousands and thousands of times in the more than two decades that Usenet has existed. There is a myriad of software for it available, and lots of it for free. Why not use what's available?
Abigail
Re: Large News Database
by pbeckingham (Parson) on Mar 09, 2004 at 14:37 UTC
You may want to consider the benefits of importing news articles into your database from an RSS (Really Simple Syndication) feed. Many, many sites are now exposing XML RSS files, and the ability to import those might buy you some flexibility in incorporating new sources.
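A minimal sketch of turning RSS items into rows ready for insertion. Real code should fetch the feed with LWP::Simple and parse it with XML::RSS from CPAN; the inline sample feed and the regex "parse" here are purely illustrative assumptions, not a robust XML parser.

```perl
use strict;
use warnings;

# Hypothetical sample feed; in practice you would fetch this over HTTP.
my $feed = <<'XML';
<rss version="2.0"><channel>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>
XML

# Pull out title/link pairs (a real importer would use XML::RSS instead).
my @rows;
while ($feed =~ m{<item>\s*<title>(.*?)</title>\s*<link>(.*?)</link>\s*</item>}gs) {
    push @rows, { title => $1, link => $2 };   # ready for an INSERT via DBI
}

printf "%s => %s\n", $_->{title}, $_->{link} for @rows;
```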