This is more of a general design question than a Perl question, so let me apologize in advance for my breach of monkly etiquette.

I've been working on a system for the web-based delivery and management of academic course materials. The system itself is Perl/CGI.pm based and stores the course materials in a MySQL database. The course materials are, for the most part, text with a bit of embedded markup. The system takes the course materials and generates HTML for output. Any non-text element, such as an embedded graphic or sound, is stored outside of the database in the file system. I am currently writing a new version of the system that uses mod_perl to enhance performance and cleans up a rather messy database design. I've been toying with the idea of storing images and sounds in the database rather than the file system. My problem is that I'm really not sure this is a good idea. Does anybody have any experience with this kind of setup? If so, are there any issues (performance, security, etc.) I should be aware of before I begin?

----
Coyote

Re: Is it a good idea to store images in a RDBMS?
by Masem (Monsignor) on Jul 21, 2001 at 02:12 UTC
    I would think that only in extreme cases is storing the binary data in a DB a good idea.

    Most non-text HTML elements are fetched by the browser through additional requests to the web server, rather than being embedded in the HTML stream that is requested initially. If they are stored as plain files, serving them is a trivial operation for the server. On the other hand, if each request has to go through a second CGI that accesses the database and passes back the data, then you're slowing down the process needlessly.

    Not that you can't store file information in the DB that relates to these images and other binary data, but it's better to avoid storing the binaries in the DB proper.

    ----------------------------------------------------- Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Re: Is it a good idea to store images in a RDBMS?
by eejack (Hermit) on Jul 21, 2001 at 03:02 UTC
    You might want to just try it out with some test pages.

    Benchmark the differences between serving static images (with the paths determined from information stored in the database) and images straight out of the database.

    Besides the performance differences, you also need to consider how to put the images in safely. If I make the assumption that the course materials will be maintained through webforms, you have to remember you will need to deal with getting the content in as well as getting it out.

    Another gotcha is that, depending on the database, certain SQL operations may not be available on these columns (older MySQL wouldn't allow GROUP BY on BLOB or TEXT fields, for example).

    Personally I would give it a test though, it is an interesting exercise, and under the right circumstances a nifty way to handle certain problems. For example, one site I maintain has little images of authors that I keep in the database...they are used on several sites concurrently. Keeping them in the database means I don't have to worry about having common directories, and whenever they add new sites, the author bio section is plug and play easy. I had to compromise and not allow the client to put the images in themselves (to check sizing and for security), but they can handle the rest of the info through a web form.

    EEjack

Re: Is it a good idea to store images in a RDBMS?
by tadman (Prior) on Jul 21, 2001 at 10:51 UTC
    Depending on your choice of RDBMS, you may find it is advantageous. Your application, and the planned development thereof, would also influence your decision.

    It would be fairly trivial to write a Java or VB client for your DB, and it could view the images by retrieving them from the DB quite easily. However, if these were stored in some alternate method (a.k.a. files on disk), then you would need to use HTTP or some other mechanism to transport them, which would be more complicated.

    Retrieval Time
    Remember that after you put 10,000 images in a single directory, your OS may have trouble looking up filenames. Quite often these directories are not indexed, so finding a file takes, to put it in math terms, O(n) time, which is, to put it in simple English, really awful. You may find that a simple lookup in a large directory could take 1-2 seconds. In a properly indexed DB, retrieval time should always be fairly quick.

    Of course, you can always get around this by sorting your images into different directories using a hash-technique, or some creative variation. 100 directories with 100 files each is much, much faster than 10,000 files in a single directory. The downside is more programming.
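
    One way to sketch the hash technique, assuming an MD5 digest of the filename and 100 buckets (both arbitrary choices for illustration, as are the function name and the `/srv/images` root):

```python
import hashlib
import os


def sharded_path(root, filename, buckets=100):
    """Map a filename to root/<bucket>/<filename> using a hash of the name.

    The bucket is derived only from the filename, so the same name always
    lands in the same directory and no lookup table is needed.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    width = len(str(buckets - 1))  # zero-pad so directory names sort evenly
    return os.path.join(root, f"{bucket:0{width}d}", filename)


if __name__ == "__main__":
    # Hypothetical root directory and filenames.
    for name in ("foo.gif", "bar.gif", "lecture01.wav"):
        print(sharded_path("/srv/images", name))
```

    With 10,000 files and 100 buckets each directory ends up holding roughly 100 files, which is the layout described above.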

    Storage Space
    Your DB might actually be a better way to store images than your filesystem, if the block sizes for "BLOB" fields are small enough. It is not uncommon to see people using 64K blocks, which means that a 2K GIF image actually uses 64K of disk space. A lot of tiny images can fill up a disk, even though their aggregate size is much smaller. A DB with a 1K block would actually save disk space.

    Of course, if you were planning ahead, you could format your filesystem with the appropriate block size, if your OS allows for such a thing (i.e. mkfs -b 1024). This, though, is a lot of work for something that should be quite easy.
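
    The arithmetic behind this can be sketched as follows. It assumes space is allocated in whole blocks and ignores any per-file or per-row overhead, which a real filesystem or DB will add; the function name is made up.

```python
def allocated_bytes(file_size, block_size):
    """Space consumed when storage is handed out in whole blocks."""
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    image = 2 * 1024       # a 2K GIF
    count = 10_000         # number of such images

    for block in (64 * 1024, 1024):
        used = allocated_bytes(image, block) * count
        print(f"{block // 1024}K blocks: {used:,} bytes for {count:,} images")
    # 64K blocks: 655,360,000 bytes for 10,000 images
    # 1K blocks: 20,480,000 bytes for 10,000 images
```

    The actual image data in both cases is the same 20,480,000 bytes; the rest is slack at the end of each block.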

    Access Control
    Implementing a DB-level access control, especially using an RDBMS's own methods, is fairly easy. Reimplementing this on the filesystem level can be quite tricky, especially if system accounts are involved.

    Your decision should be based on careful analysis of your immediate and planned requirements. The DB solution works, and the filesystem one does too. Personally, if you want a more "elegant" solution, the DB route does keep things much more manageable, since in effect you can query your filesystem.
      Your embedded images and sound will be accessed via HTML tags, right? Consider that if you store them in the database, you will need to extract them into temporary files each time you need to generate a page containing them. So you need to find the filename of the .gif file stored in the DB, extract the large binary contents, write them to a temporary file, and delete it afterwards. That looks like a potential bottleneck to me.
      But if you store just the file name in your DB, you only need to print the filename into the proper HTML tag, which you are doing anyway.
      But, as tadman noted, too many files in one directory can slow you down, too.
      This leaves you with spreading them across many directories. How to do it most efficiently?
      This was also asked here, and one smart proposal that I liked was to place the file foo.gif into /f/fo/foo.gif. It is straightforward and robust against errors, even if a file is moved into the wrong directory by mistake. Sorry, I cannot remember who proposed this naming scheme.
      Remember, if something can go wrong, it will.
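
      A rough sketch of the /f/fo/foo.gif naming scheme, plus a check for files that ended up in the wrong directory. The function names are invented for illustration, and it assumes forward-slash paths and that the first one and two characters of the filename are used exactly as they appear.

```python
import posixpath


def prefix_path(root, filename):
    """Place filename under root/<first char>/<first two chars>/."""
    return posixpath.join(root, filename[:1], filename[:2], filename)


def is_misplaced(path):
    """Return True if a file's directory does not match its own name.

    Because the directory is derived from the filename, a file that was
    moved to the wrong place can be detected without any external index.
    """
    directory, filename = posixpath.split(path)
    root = posixpath.dirname(posixpath.dirname(directory))
    return prefix_path(root, filename) != path


if __name__ == "__main__":
    print(prefix_path("/", "foo.gif"))      # /f/fo/foo.gif
    print(is_misplaced("/f/fo/foo.gif"))    # False
    print(is_misplaced("/b/ba/foo.gif"))    # True
```

      Unlike a hash of the name, this keeps the layout readable by eye, at the cost of uneven directory sizes when many filenames share a prefix.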

      pmas
      To make errors is human. But to make million errors per second, you need a computer.

        Consider that if you store them in the database, you will need to extract them into temporary files each time you need to generate a page containing them.

        Not necessarily. One could write a CGI / handler that fetches a picture directly from the database, which to a browser, would look no different than a request for a file. This sort of thing was recently discussed here.
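
        As a rough illustration of that idea, here is a small handler that answers an image request straight from a BLOB column. It uses Python's WSGI interface and SQLite as stand-ins for a mod_perl handler and MySQL; the `images` table, its columns, and `make_image_app` are assumptions made up for this sketch.

```python
import sqlite3


def make_image_app(conn):
    """Build a WSGI app that serves rows of a hypothetical images table.

    Expected schema (an assumption for this sketch):
        images(name TEXT PRIMARY KEY, mime_type TEXT, data BLOB)
    """

    def app(environ, start_response):
        # /foo.gif -> look up the row named "foo.gif"
        name = environ.get("PATH_INFO", "").lstrip("/")
        row = conn.execute(
            "SELECT mime_type, data FROM images WHERE name = ?", (name,)
        ).fetchone()

        if row is None:
            body = b"Not Found"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]

        mime_type, data = row
        data = bytes(data)
        # To the browser this looks like any other image response.
        start_response("200 OK", [
            ("Content-Type", mime_type),
            ("Content-Length", str(len(data))),
        ])
        return [data]

    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (name TEXT PRIMARY KEY, mime_type TEXT, data BLOB)"
    )
    # Placeholder bytes, not a real GIF.
    conn.execute(
        "INSERT INTO images VALUES (?, ?, ?)",
        ("foo.gif", "image/gif", b"GIF89a-placeholder"),
    )
    make_server("localhost", 8000, make_image_app(conn)).serve_forever()
```

        The page itself would then just contain an ordinary image tag pointing at the handler's URL, and no temporary files are written.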

           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
Re: Is it a good idea to store images in a RDBMS?
by Anonymous Monk on Jul 21, 2001 at 14:17 UTC
    I would not recommend this procedure, for the following reasons.

    1. I fought with this same issue using VB6 with an Access 97 database (which should have done the job well). I crunched the images down to thumbnail size before inserting them into the database, yet the database was huge: 5.6 megs with only 3 text fields and the one image field (Long binary data), and only 500 records. The database was too large.
    2. The main issue: images stored in databases are far too large. You never get the size you expect; a 4k image is stored as 12-18k, etc.

    With my project I ended up leaving the images out of the database, putting them in directory locations, and calling them through my code. I am much stronger in Perl than Visual Basic, and if the situation required Perl I would NEVER put images in an RDBMS. In fact, I don't care how it is laid out or what language is used: images in databases are just a nightmare. Just my 2 cents.