in reply to Anyone using SPOPS to store BLOBs?

Reconsider storing PDF files as BLOBs. Just store them as regular files and store the path to the file in the DB.

With lots of large BLOBs, the database gets bloated, the DB dumps become huge, and it can create problems on platforms with a 2GB file limit, like some versions of Linux.

When you stream a BLOB to a client, you must hold open a DB handle the entire time, and if you try to read the whole BLOB into a Perl variable, you will get a large perl process. Your operating system is designed to store files and you will have a much easier time doing so.

At first glance, SPOPS doesn't seem suited to this task at all. It is a framework for saving Perl objects (complex data structures) to a database. Files are not good things to store in databases.

Replies are listed 'Best First'.
Re: Re: Anyone using SPOPS to store BLOBs?
by ferrency (Deacon) on Jun 24, 2003 at 18:04 UTC
    The problem with storing filenames in a database, and storing the files on disk, is that then you need to roll your own remote file access in the case where multiple client machines need access to the same file backstore. While not impossible, it's certainly higher in development cost than using a canned solution if your preferred database provides it. Especially if you need to support remote file locking, atomic cross-system unique filename generation, and so on.

    The moral being: file access is easy when you have one program running on one machine. It becomes more difficult when you have multiple programs sharing the same data on one mahcine, and even more difficult when multiple programs on multiple machines need to coordinate access to shared resources on one or more remote servers. Databases are good at this, but as others have said it can wreck the performance of other relations.

    Alan

      These are indeed good reasons for BLOBS over files: concurrency and remoting.

      Another is transactions over multiple operations, on multiple files/tables, on multiple servers. This is very difficult to do with filesystems. You need to do 2-phase commit, backup directories, etc. With BLOBS, operations are part of the normal transaction.

      Another is consistency: the BLOB fields are handled the same way as the rest of the data. No need to think tables & files. Just tables.

      Besides, BLOBS are not as difficult as they used to be. If you use something like Tangram, and your BLOBS fit in memory, you don't even have to think about it. You use them like regular fields.

Re: Re: Anyone using SPOPS to store BLOBs?
by talexb (Chancellor) on Jun 24, 2003 at 16:28 UTC

    The reason I am looking at using a text field is that Postgres doesn't appear to support BLOBs. However, it looks like the bytea type may be suitable instead.

    --t. alex
    Life is short: get busy!

      what i believe bunnyman is suggesting is that you do not use a database to store the files. instead, store the filenames, and write the files to disk (just don't erase them.)

      ~Particle *accelerates*

Re: Re: Anyone using SPOPS to store BLOBs?
by talexb (Chancellor) on Jun 24, 2003 at 19:27 UTC

    It's a long story, but the short answer is that I really *do* need to store files into a database. It makes them more portable and accessible.

    It's unlikely that we'll approach a file that's 2G -- I have tried various files up to a 12M example that we have, and they all worked the way I expected, except for the side effect of adding single quotes around the entire files' contents when I extract the file back out of the database.

    The only time we'll need to access the file is when we copy it back out to the file system for processing -- taking just the file name isn't going to work for us because that file may no longer be available when we want to do our processing.

    --t. alex
    Life is short: get busy!
      It's unlikely that we'll approach a file that's 2G

      What I meant was the entire database could become 2GB in total size after adding enough files to it. If the DB is stored in one file, that would be a problem unless your system supports big files or the database is split over a few files.

      I understand the initial thinking of storing BLOB’s in the database but I don’t think a 2G database is more portable than a 1M database with a tree structure of subdirectories that hold the PDF’s. The tree structure can be Tared/Gzipped at any time to copy in bulk.

      I have worked with a VLDB that had BLOB’s in it and the performance was poor. The design team decided to go with file names and pseudo paths to find the images. This repository is one of the largest in the world of GIF and JPG images, a competitor to Getty and now works well.

      Maybe I am missing something here, but why is remote access to this database harder than normal if the PDF’s are stored separately? The file locking for user access should be done at the database level, sort of like a check in and check out system.

      I guess I am really curious as to why the PDF’s have to be embedded, bore me with the long story :}

      Richard

      There are three types of people in this world, those that can count and those that cannot. Anon

          I guess I am really curious as to why the PDF’s have to be embedded, bore me with the long story :}

        Unfortunately I can't get really get into the technical details, for reasons that have to do with confidentiality.

        I am not committed to putting PDFs into a database -- but it's a design choice we'd like to try out to see what we can see about performance and how it simplifies the code. If we end up with 10,000 documents each of which is 10K (say), then we're over the 2G limit, and we have a problem. The existing solution uses a directory structure and works well -- but we're exploring alternatives.

        --t. alex
        Life is short: get busy!
Re: Re: Anyone using SPOPS to store BLOBs?
by talexb (Chancellor) on Jun 25, 2003 at 20:12 UTC

    Apparently Postgres has a 16TB limit for tables, and a BLOB limit of 2G per chunk. If Postgres is used to store lots of BLOBs, I am guessing that it actually stores them in (drumroll please) the file system!!

    --t. alex
    Life is short: get busy!