in reply to compressing text

I would suggest not storing this file in the database. I prefer storing a file path in the database.

What benefits does it have to being stored there? Full-text searching, if your database supports that -- are you using it? Existing permissions model for storing data -- this is a good reason. Existing locking model for reads and writes -- this is a good reason.

What drawbacks does it have? It may be hard to model accurately -- what column type do you need? It's really hard to edit in the database. It's a lot of data to send back and forth.

Replies are listed 'Best First'.
Keep your files in the database!
by Improv (Pilgrim) on Apr 11, 2003 at 18:55 UTC
    The nice thing about having files in the database is that you can have constraints set up to keep files from being orphaned. For example, let's imagine we were implementing sourceforge, in perl and postgres. Sourceforge needs to deal with a lot of files, with projects and files being created and deleted all the time. If everything's in the database, if you delete a project, you can set things up so that a cascaded delete will remove all the files from the database at the same time, and you can tell, with a simple query, if you don't do that, which files are orphaned. Moreover, your database backups, if done right (check the pg_dump manpage), will include the appropriate version of all the files in there, so you'll never need to worry about them getting out of sync with your regular backups. Finally, you don't need to worry about a separate access path to manage the files referenced in your database -- if you can SQL in, you can upload/update/delete the files (subject to database ACLs, of course), and there's no need to also allow ssh, ftp, or whatever other solutions to manage this stuff. In sum, it's easier to get all the smarts that your database has to cover files if the files are in the database.
      And the reason not to put files in the database is that all of your file-based utilities (rsync, ,tar, less, grep, mv, vim, etc.) become useless and have to be replaced.

      I think of it like this: the files are the master data, and the database info about them is just a handy index for locating them based on particular attributes. You can delete the index and rebuild it any time, since the master info is really the files themselves.

        Well you might be using a OS that doesn't have useful file-based tools.
        Remember that with many of those tools, you'll need to update the database afterwards anyhow, and that's not necessarily a trivial thing to do after arbitrary changes to the directory. Really though, you do have a point that, depending on needs, it's not entirely unreasonable to prefer the filesystem to be the primary data store. I personally strongly prefer the integrity constraints possible when the database manages everything, but I can see how people might have different needs. With changes to the files mentioned, I wouldn't use mv (psql can rename the blobs), and for the rest, I'd pull them out of the database, change them, and insert (update?) them back in. *shrug*