perleager has asked for the wisdom of the Perl Monks concerning the following question:

Hey,

I'm building a large news database. The articles (the news content) will be retrieved using LWP. All other news categories but "Our Company News" will be entered through an HTML form in the admin section.

For the archiving of news articles, should I store the articles' content in a MySQL database or just a straight-out flat-file setup? Does it matter? I'm just wondering whether storing it in a MySQL database will be faster or more efficient, considering how large it may be in the next 2-4 years.

Thank you,
Anthony

Replies are listed 'Best First'.
Re: Large News Database
by matija (Priest) on Mar 09, 2004 at 06:39 UTC
    I think you would find it advantageous to at least store the most important header information (Subject, Date, Author, possibly threading info) in a database.

    You're bound to need to search for a specific piece of news some time later. The more news there is likely to be, the less efficient it will be to search through a bunch of unindexed files.
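
    A minimal sketch of such a header index, in Python with SQLite standing in for MySQL. The table layout, the `path` column pointing at a flat file, and the sample rows are all assumptions made for illustration, not anything from the original setup:

```python
import sqlite3

def build_index(rows):
    """Create an in-memory table of article headers, indexed by date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE article (
            id        INTEGER PRIMARY KEY,
            subject   TEXT NOT NULL,
            author    TEXT NOT NULL,
            published TEXT NOT NULL,  -- ISO 8601 date, sorts as text
            path      TEXT NOT NULL   -- hypothetical location of the article body
        )
        """
    )
    # The index lets date-range lookups avoid scanning every row.
    conn.execute("CREATE INDEX article_published ON article (published)")
    conn.executemany(
        "INSERT INTO article (subject, author, published, path) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn

def find_by_date(conn, start, end):
    """Return (subject, path) pairs published between start and end inclusive."""
    cur = conn.execute(
        "SELECT subject, path FROM article "
        "WHERE published BETWEEN ? AND ? ORDER BY published",
        (start, end),
    )
    return cur.fetchall()

# Made-up sample data.
SAMPLE = [
    ("Quarterly results", "alice", "2004-01-15", "news/2004/01/15-results.html"),
    ("New office opens", "bob", "2004-02-20", "news/2004/02/20-office.html"),
    ("Product launch", "alice", "2004-03-05", "news/2004/03/05-launch.html"),
]
```

    Calling `find_by_date(build_index(SAMPLE), "2004-02-01", "2004-03-31")` returns only the February and March entries, and the article bodies can stay wherever they already live.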

    Also note that if you have the whole article in a MySQL database, you can enable full text searching over the article, which can also come in handy.

    And I don't believe that storing articles in a database is going to eat that much more space than storing them in individual files would (unless you compressed the files, which would make searching through them even more difficult).

    Consider this: Diskspace is cheap and getting cheaper every year. Your time isn't.

Re: Large News Database
by kvale (Monsignor) on Mar 09, 2004 at 06:43 UTC
    As flat-file databases grow, search time grows linearly with the size of the database. In contrast, access time for articles stored in a keyed database, even a simple one like GDBM, grows only with the log of the number of records -- much faster!

    So for large databases, specialized database programs like MySQL are almost always the best solution.
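
    To make the linear-versus-logarithmic difference concrete, here is a rough Python sketch that counts how many keys each approach examines. Binary search over a sorted list is only a stand-in for an indexed lookup; real database engines use structures such as B-trees or hashing, and the function names here are made up for the example:

```python
def linear_steps(keys, target):
    """Scan from the start; return how many keys were examined."""
    for steps, key in enumerate(keys, start=1):
        if key == target:
            return steps
    return len(keys)

def binary_steps(keys, target):
    """Binary search over sorted keys; return how many keys were examined."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if keys[mid] == target:
            return steps
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps

if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        keys = list(range(n))
        # Worst case for the scan: the last key.
        print(n, linear_steps(keys, n - 1), binary_steps(keys, n - 1))
```

    The scan count grows in step with the number of records, while the binary search count rises by only a handful of steps each time the collection grows tenfold.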

    -Mark

Re: Large News Database
by EvdB (Deacon) on Mar 09, 2004 at 07:53 UTC
    If the content is going to be largely static, then there is no reason why static files could not be used, except that, as noted above, it makes them difficult to search.

    There is an interesting article, http://www.perl.com/pub/a/2004/02/19/plucene.html, which might give you a few ideas regarding the searching.

    As for access times, I imagine that if the user accesses one article at a time then file access would be quicker, as the server could send out pre-prepared files and could use a 404 handler to generate the files if they do not exist.
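
    The generate-on-miss idea could look roughly like the Python sketch below. It shows only the cache logic, not an actual web server 404 handler; `fetch_page`, the `render` callback, and the file naming scheme are assumptions for illustration:

```python
import os

def fetch_page(cache_dir, article_id, render):
    """Return the cached page for article_id, generating it on a miss.

    render is a caller-supplied function (hypothetical here) that builds
    the HTML for one article, for example from a database row.
    """
    # int() keeps the id from being abused to escape cache_dir.
    path = os.path.join(cache_dir, f"{int(article_id)}.html")
    if not os.path.exists(path):
        html = render(article_id)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html)
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

    The first request for an article pays the cost of rendering it; later requests are served from the file until something deletes it, which is also a simple way to force regeneration after an edit.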

    Chances are that you will need a database at some point for user preferences or similar, so in a way you might as well just stick the data in there from the start.

    --tidiness is the memory loss of environmental mnemonics

Re: Large News Database
by astroboy (Chaplain) on Mar 09, 2004 at 08:02 UTC

    One option is to store your content in a database (relational, flat file or otherwise), and generate the content (articles, contents, etc.) to flat files using templates (which are pulled together using includes, SSIs or whatever your templating system supports). This will allow searches to be done against the database while also providing the speed of HTML access. A couple of low-end (but very good) Perl-based commercial content management systems do it this way - see Article Manager and Big Medium.

    If your content starts growing too fast, you may wish to generate the old or rarely accessed articles on demand, while keeping the new content in flat files.
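
    As a rough sketch of the publish step, the Python below writes one static HTML file per article from a template. The `PAGE` template, the `publish` function, and the article fields (`id`, `title`, `body`) are assumptions for the example; in practice the article records would come from whatever database holds the content, and this is not how the products named above are implemented:

```python
import html
import os
from string import Template

# Hypothetical page template; a real site would pull in shared includes.
PAGE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$body\n</body></html>\n"
)

def publish(articles, out_dir):
    """Write one static file per article and return the paths written.

    Each article is a dict with 'id', 'title' and 'body' keys; 'body' is
    assumed to be trusted HTML already, so only the title is escaped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for art in articles:
        path = os.path.join(out_dir, f"{int(art['id'])}.html")
        page = PAGE.substitute(title=html.escape(art["title"]), body=art["body"])
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(page)
        written.append(path)
    return written
```

    Running `publish` only over recently changed articles keeps regeneration cheap, and older articles can be left out and rendered on demand instead.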

Re: Large News Database
by Abigail-II (Bishop) on Mar 09, 2004 at 11:43 UTC
    For the archiving of news articles, should I store the articles' content in a MySQL database or just a straight-out flat-file setup? Does it matter?
    Of course it matters. What's more appropriate highly depends on what you are going to do with it. How often do you do updates? How many? Sequential? Random? How often do you query? What kind of queries?

    What I'm also wondering is this: storing news archives has been done thousands and thousands of times in the more than two decades that Usenet has existed. There is a myriad of software available for it, much of it free. Why not use what's available?

    Abigail

Re: Large News Database
by pbeckingham (Parson) on Mar 09, 2004 at 14:37 UTC
    You may want to consider the benefits of importing news articles into your database from an RSS (Really Simple Syndication?) feed.

    Many, many sites are now exposing XML RSS files, and the ability to import those might buy you some flexibility in incorporating new sources.
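
    A minimal sketch of pulling items out of such a feed, using Python's standard `xml.etree.ElementTree`. It assumes a plain RSS 2.0 style document without XML namespaces (RSS 1.0 feeds are namespaced and would need different lookups), and the sample feed and its example.com URLs are made up:

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Return a list of dicts with title, link and pub_date for each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pub_date": (item.findtext("pubDate") or "").strip(),
        })
    return items

# Made-up feed for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <item>
      <title>First headline</title>
      <link>http://example.com/news/1</link>
      <pubDate>Tue, 09 Mar 2004 06:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/news/2</link>
    </item>
  </channel>
</rss>"""
```

    Each dict returned by `parse_rss(SAMPLE_FEED)` maps naturally onto one row of a headers table; missing elements such as the second item's date come back as empty strings.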