The nice thing about having files in the database is that
you can have constraints set up to keep files from being
orphaned. For example, let's imagine we were implementing
sourceforge, in perl and postgres. Sourceforge needs to deal
with a lot of files, with projects and files being created
and deleted all the time. If everything's in the database,
if you delete a project, you can set things up so that
a cascaded delete will remove all the files from the database
at the same time, and you can tell, with a simple query, if
you don't do that, which files are orphaned. Moreover, your
database backups, if done right (check the pg_dump manpage),
will include the appropriate version of all the files in
there, so you'll never need to worry about them getting
out of sync with your regular backups. Finally, you don't need to worry about a separate access path to manage the files
referenced in your database -- if you can SQL in, you can
upload/update/delete the files (subject to database ACLs, of course), and there's no need to also allow ssh, ftp, or whatever other solutions to manage this stuff. In sum, it's easier
to get all the smarts that your database has to cover files
if the files are in the database. | [reply] |
And the reason not to put files in the database is that all of your file-based utilities (rsync, ,tar, less, grep, mv, vim, etc.) become useless and have to be replaced.
I think of it like this: the files are the master data, and the database info about them is just a handy index for locating them based on particular attributes. You can delete the index and rebuild it any time, since the master info is really the files themselves.
| [reply] |
Well you might be using a OS that doesn't have useful file-based tools.
| [reply] |
Remember that with many of those tools, you'll need to update
the database afterwards anyhow, and that's not necessarily a
trivial thing to do after arbitrary changes to the directory.
Really though, you do have a point that, depending on needs,
it's not entirely unreasonable to prefer the filesystem to
be the primary data store. I personally strongly prefer the
integrity constraints possible when the database manages
everything, but I can see how people might have different
needs. With changes to the files mentioned, I wouldn't use
mv (psql can rename the blobs), and for the rest, I'd pull
them out of the database, change them, and insert (update?) them back in. *shrug*
| [reply] |