bobdole has asked for the wisdom of the Perl Monks concerning the following question:

What is the best way to extract a PDF from a blob field? The database is a DBISAM database, so I am not talking to it directly from Perl, but I can export the blob as a text file.

Replies are listed 'Best First'.
Re: extract pdf from blob
by roboticus (Chancellor) on May 18, 2006 at 04:25 UTC
    bobdole:

    A few points:

    1) If the blob contains a pdf file that's not encoded in any fashion, and you're doing a console app and just want a PDF file, then you'd just extract it and save it with a pdf extension.
    2) If the blob contains a pdf file that's not encoded in any fashion, and you're doing a Web app and are returning a PDF file, set the MIME type appropriately (application/pdf).
    3) Not to be snide, but this is PerlMonks, and we generally want to talk about Perl-related topics. For database topics there are suitable forums. Please use them.
    4) Your question is too vague to be able to expect any useful answers. You might want to examine How (Not) To Ask A Question.
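
    For point 2, a minimal sketch of what "set the MIME type appropriately" could look like in a plain CGI script. It assumes the raw, unencoded PDF bytes are already in a scalar; send_pdf and the file name are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Write a minimal CGI response carrying a PDF to the given filehandle.
# $pdf_bytes is assumed to already hold the raw, undecoded PDF data.
sub send_pdf {
    my ($fh, $pdf_bytes, $filename) = @_;

    binmode($fh);    # no newline translation on the way out
    print {$fh} "Content-Type: application/pdf\r\n";
    print {$fh} "Content-Disposition: inline; filename=\"$filename\"\r\n";
    print {$fh} "Content-Length: " . length($pdf_bytes) . "\r\n";
    print {$fh} "\r\n";
    print {$fh} $pdf_bytes;
    return;
}

# Typical use in a CGI script: send_pdf(\*STDOUT, $pdf, 'document.pdf');
```

    For point 1 (a console app) the same bytes would simply be written to a file opened in binary mode instead of to STDOUT.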

    --roboticus

      The PDF is encoded, because you can't just export it with a .pdf extension. Maybe I should have been clearer in my post: how do I convert the blob file into a readable PDF with Perl?
        That depends .... how is it encoded?
        Basically, the procedure would be something like:

        #!/usr/bin/perl
        use strict;
        use warnings;

        [code to open database]

        sub decode {
            my $encoded_data = shift;
            [code to decode $encoded_data to $decoded_data]
            return $decoded_data;
        }

        my $blob;
        [code to get column from database and put into $blob]

        my $pdf = decode($blob);
        open(my $out_pdf, '>', 'DOCUMENT.PDF') or die "Can't open DOCUMENT.PDF: $!";
        binmode($out_pdf);    # PDF data is binary: no newline translation
        print {$out_pdf} $pdf;
        close($out_pdf) or die "Can't close DOCUMENT.PDF: $!";

        The chunks in the square brackets are the bits of Perl code that I can't specify because you didn't give enough clues.

        For the first one, you need to get access to your database. In the Perl community, it seems that the DBI package is the most popular (it's what I use). I don't know anything about DBISAM, so I don't know if there's a DBI driver module available for it. If not, then you could go through ODBC with DBD::ODBC (again, that's what I usually use), provided you have an ODBC driver for your database.

        For the second item (the decode function), I can't help you with that as you haven't told me anything about the encoding scheme. If you know the encoding scheme, look on CPAN for a module that will decode the appropriate scheme, and plug in the appropriate code.
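
        To make the shape of that decode function concrete, here is a sketch under one loud assumption: that the export is Base64. Nothing in the thread confirms this, so treat it as a placeholder for whatever scheme the database really uses. It relies on the fact that a PDF file normally begins with the marker "%PDF-", which also makes it easy to tell whether any decoding is needed at all.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Return raw PDF bytes from $blob, decoding only if necessary.
# ASSUMPTION: if the data is encoded at all, it is Base64.
sub decode {
    my ($blob) = @_;

    # A PDF normally starts with "%PDF-"; if the marker is already
    # there, the blob is raw and must not be touched.
    return $blob if $blob =~ /\A%PDF-/;

    my $decoded = decode_base64($blob);
    die "Blob is neither raw PDF nor Base64-encoded PDF\n"
        unless $decoded =~ /\A%PDF-/;
    return $decoded;
}
```

        If the header check fails after decoding, the encoding is something else (hex, uuencode, a vendor format) and the body of the function has to change accordingly.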

        In the final chunk, if you used DBI, you could use something like this:

        my $STH = $DBH->prepare("SELECT [column] " .
                                "FROM [table] " .
                                "WHERE [condition]")
            or die "Can't prepare statement: " . $DBH->errstr;
        $STH->execute
            or die "WTF? I can't find what you're looking for!";
        my $array_ref = $STH->fetchrow_arrayref;
        $blob = $array_ref->[0];

        Of course, there are now a few new blanks to fill in which you'll have to determine, because you didn't say.
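
        One detail that tends to bite with blobs: DBI limits how much of a long column it will read, via the LongReadLen and LongTruncOk handle attributes. The sketch below wraps the fetch in a helper that raises the limit first. fetch_blob, the DSN, the table and column names, and the 50 MB ceiling are all hypothetical, and it assumes an ODBC driver for DBISAM is actually available.

```perl
use strict;
use warnings;

# Fetch a single blob value through an already-connected DBI handle.
# $dbh would come from something like
#   DBI->connect('dbi:ODBC:MyDsn', $user, $pass, { RaiseError => 1 });
# where "MyDsn" is a hypothetical ODBC data source name.
sub fetch_blob {
    my ($dbh, $sql, @bind) = @_;

    # Long columns are cut off at LongReadLen bytes; raise the limit
    # and make truncation an error instead of a silent cut.
    $dbh->{LongReadLen} = 50 * 1024 * 1024;    # 50 MB, an arbitrary ceiling
    $dbh->{LongTruncOk} = 0;

    my ($blob) = $dbh->selectrow_array($sql, undef, @bind);
    die "No matching row found\n" unless defined $blob;
    return $blob;
}

# Example call, with hypothetical table and column names:
#   my $blob = fetch_blob($dbh, 'SELECT pdf_data FROM documents WHERE id = ?', 42);
```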

        --roboticus

Re: extract pdf from blob
by MonkE (Hermit) on May 18, 2006 at 20:06 UTC
    Since you haven't posted any code, I am forced to make a few assumptions about what you are trying to do. So if I don't answer your question, I hope you'll understand.

    First of all, a blob is a binary large object. If the creator of the database did their work correctly, the entire contents of the PDF file should be present in the blob (no "conversion" required). Indeed, it would seem that the challenge you face is to NOT CHANGE the binary data while your program is handling it. You must take care not to introduce unwanted newlines and such. You should really look at the syswrite function (documented in perlfunc) for writing the resultant PDF file.
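
    A small sketch of writing the data out untouched with syswrite. The helper name write_binary is made up for illustration; the ':raw' layer and the loop are there because syswrite bypasses buffering and may write fewer bytes than it was asked to.

```perl
use strict;
use warnings;

# Write $data to $path byte for byte, with no translating I/O layers.
sub write_binary {
    my ($path, $data) = @_;

    open(my $fh, '>:raw', $path) or die "Can't open $path: $!";

    # syswrite may write fewer bytes than requested, so loop until done.
    my $offset = 0;
    my $length = length($data);
    while ($offset < $length) {
        my $written = syswrite($fh, $data, $length - $offset, $offset);
        die "Write to $path failed: $!" unless defined $written;
        $offset += $written;
    }

    close($fh) or die "Can't close $path: $!";
    return $length;
}
```

    Usage would be along the lines of write_binary('DOCUMENT.PDF', $blob). Don't mix syswrite with print on the same filehandle, since print is buffered and syswrite is not.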

    As for accessing the data, I don't see a CPAN module for DBISAM, but you could always use ODBC to retrieve the blob. Good luck.
Re: extract pdf from blob
by JamesNC (Chaplain) on May 21, 2006 at 13:05 UTC
    In addition to the other comments, I would add that a PDF is really a binary file and commonly contains compressed content streams, encoded with Flate or other filters. Make sure you use binmode on any filehandle that you use for PDF data going into or out of the blob field.
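
    For the reading side, a sketch of slurping an exported blob file with binmode and doing a rough plausibility check on the result. Both helper names are hypothetical, and the check is only a heuristic (header marker at the start, "%%EOF" somewhere in the last 1024 bytes), not a PDF validator.

```perl
use strict;
use warnings;

# Read a whole file in binary mode and return its bytes.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);    # matters on Windows, harmless elsewhere
    local $/;        # slurp mode: read everything at once
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Rough plausibility check: header marker at the start and an
# end-of-file marker somewhere in the last 1024 bytes.
sub looks_like_pdf {
    my ($data) = @_;
    return 0 unless defined $data && $data =~ /\A%PDF-\d\.\d/;
    my $tail = length($data) > 1024 ? substr($data, -1024) : $data;
    return $tail =~ /%%EOF/ ? 1 : 0;
}
```

    If looks_like_pdf(slurp_binary($file)) comes back false for the exported file, the export step has probably encoded or mangled the data, which would match what the original poster is seeing.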