Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey everyone. If you had a community website with thousands of members that come each day, how would you set up your database before building a script that allows members to edit, create and view member profiles.

I have the field where it allows a description of each member, I want this field to allow as many characters a MySQL "TEXT" columnn type field allows.

Is it bad to have one database store that much info? Would it be better to store the description field inside a file vs a database? Does it matter? Am I over thinking this topic? I'm trying to keep in mind that if this site grows, they're maybe 500,000 ppl coming in and out a week. That in mind, theyll definetly be a lot of queries to this profile database.

Thanks,
TT

Replies are listed 'Best First'.
Re: Member profiles - file or db?
by jbert (Priest) on Nov 07, 2006 at 11:14 UTC
    I'd say to do whatever is easiest for you right now (which is probably the db route) and localise the knowledge in your code.

    i.e. the only code which knows that the profile is in the db for a user is the accessor which fetches it and makes it available to the rest of your site and the creation/update code which stores it.

    Then you can change your mind later by:

    1. add a new column (profile_filename) to your table
    2. change your accessor to use profile_filename (if present) or the old column if not
    3. change your profile creation code to save as a filename
    4. run a batch job to create filenames for all profiles still stored in the db, populating the new column
    5. drop the old column and remove the backward-compat code from your accessors.
    Which looks like a lot of work in a list, but isn't that much, really.

    Also, you might find that you end up wanting your profiles in the db. If you need to scale, sometimes the best way is to replicate your back-end storage. If you're storing in more than one place (db and filesystem) you've complicated the process by which you do that.

Re: Member profiles - file or db?
by Cabrion (Friar) on Nov 07, 2006 at 12:17 UTC
    Yes use a database. I prefer PostgreSQL, but I'll let you slide.

    Make sure you look into indexes at that volume. They will improve performance a lot if you implement them correctly. Do your homework though because they can degrade perfromance if you don't use them correctly.

    If you are really concerned about growth, put the profiles in a separate database from your session table, etc. That way you can move it to another machine someday without rewriting.

Re: Member profiles - file or db?
by skx (Parson) on Nov 07, 2006 at 14:54 UTC

    Does it matter? Well not right now, not until you have the community setup.

    Once things are up and running though performance would almost certainly be better if you were using a database - since you'll be wanting to display things like username, last login tim, profile details, etc.

    It just makes sense to me to use a database - especially because I'm assuming you'll have to store login names and passwords anywhere. A database is designed to get fast at queries and to link things together.

    Maybe something simple will get you started, something like this:

    CREATE TABLE `users` ( `id` int(11) NOT NULL auto_increment, `username` varchar(25) NOT NULL default '', `password` varchar(32) default NULL, `email` varchar(75) default NULL, `joined` datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`id`), KEY `username` (`username`) ); CREATE TABLE `user_info` ( `userid` int(11) NOT NULL, `profile_text` TEXT default '', `location` varchar(30), `age` int default 0, ... .. ); CREATE TABLE `user_preferences` .. ; CREATE TABLE `user_friends` ... ;

    Then you can run a simple query to get the profile text, age, etc, from the user_info table, and all the other details from any other tables you might create.

    Steve
    --
Re: Member profiles - file or db?
by swampyankee (Parson) on Nov 07, 2006 at 20:13 UTC

    How much is too much for a database is sensitive to a number of factors, including the database engine, the O/S, and the hardware it's running on, so there's not an answer to "is it bad to have one database store that much info?" without considerably more information. I would not consider a table with 5,000,000 entries to be unacceptably large, but the bulk of my DB experience is as a user, querying a DB2 database via an API (the database stored, among other things, the drawing and manufacturing document data for a very complex manufactured artifact).

    If you're seriously worried, get a DBA to help you layout the DB system.

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: Member profiles - file or db?
by Firefly258 (Beadle) on Nov 08, 2006 at 05:27 UTC
    Databases have several advantages over files when it comes to storing and retrieving data. The principal buzzwords would be effecient, faster, scalable, maintainable, portable and so on.

    Two reasons to consider _not_ using a file as a 'database' at all. There are other more compelling reasons, but these two really do matter.

    1. A file stores its records sequentially, so the bigger the file gets, average transaction times increase linearily. Databases use an array of methods to leverage non-sequential random access, a much quicker and independant average transaction time is guaranteed, also growth of databases do not directly affect lookup times.

    2. No implicit locking is done over individual records so you will have to explicitly lock the entire file before data manipulation or you potentially risk ending up with corrupt/incorrect data as a result of 2 or more simultaneous transactions. Locking an entire file means that during a long transaction, a queue of other transactions can build up and programming around this needs to be done carefully otherwise you end up with pissed-off customers, program deadlocks, crashes, data corruption, grey-hair, etc. It also gets worse as the size of the file grows. Databases on the other hand can lock individual records and most databses allow some simultaneous transactions, they multi-task and handle requests well, so already it's one big monkey off your back.

    Files make bad busy databases, no buts at all. If you expect to have a backend that caters to more than a request per second, and you want quick, consistent lookup times, now or in the future, it would make sense to start off using a professional database now. You also avoid the big headache of having to migrate data over from the file to database later on, which you are bound to do eventually anyway. Databases might be complex and require a lot more work before the wheels start moving but they are the foundation(s) to both your technological (programming) and business models, so it makes sense to get them sorted out first and sorted out right as well.