wertert has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I've been asked to write something to pull JPEG/JFIF files out of a binary file. I've had some limited success with my hex editor, and now I want to try it in Perl. From my reading, the picture stream starts with 0xFFD8 and ends with 0xFFD9. The idea is that if I can pull out everything between these markers, we should have a valid JPEG image. I also suspect there are multiple JPEGs within the binary file and that each of them also contains a thumbnail.

So my first question: can anyone point me in the right direction for extracting a block out of a binary file between two markers? Once I get this working, I think the rest should be quite straightforward.

Thanks in advance

wert
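A minimal sketch of the naive approach described in the question, assuming the whole file is already in one scalar: walk the data with index, and for each FF D8 take everything up to the next FF D9, markers included. The sub name extract_between_markers is made up for illustration. Because it knows nothing about JPEG structure, an embedded thumbnail's FF D9 will end the outer image early, which is the problem the replies raise.

```perl
use strict;
use warnings;

# Naive extraction: every byte range that starts at an FF D8 (SOI) and ends
# at the next FF D9 (EOI), both markers included. Knows nothing about JPEG
# structure, so an embedded thumbnail's FF D9 ends the outer image early.
sub extract_between_markers {
    my ($data) = @_;
    my @blocks;
    my $pos = 0;
    while ((my $start = index($data, "\xFF\xD8", $pos)) >= 0) {
        my $end = index($data, "\xFF\xD9", $start + 2);
        last if $end < 0;                  # no end marker after this start
        push @blocks, substr($data, $start, $end + 2 - $start);
        $pos = $end + 2;                   # resume searching after FF D9
    }
    return @blocks;
}
```

Usage would be something like my @jpegs = extract_between_markers($data); with each element written to its own file.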

Re: extract jpg from binary file
by adrianxw (Acolyte) on Oct 27, 2006 at 13:52 UTC
    This isn't directly an answer to your question, more a comment on your approach. In this binary file, can you be certain that all occurrences of these 2 bytes represent start/end image markers, and are not either code or data from the binary stuff surrounding the embedded JPEGs?

    This is something you may need to consider with your design.
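One way to cut down on false positives, sketched under the assumption that the embedded images are ordinary JPEG/JFIF or Exif files: a real FF D8 is immediately followed by another marker segment (FF, a marker code, then a two-byte big-endian length), such as APP0 (JFIF), APP1 (Exif) or DQT. The sub plausible_jpeg_start is a hypothetical helper. It is a heuristic filter, not a proof, and it rejects the rare but legal case of extra FF fill bytes right after FF D8.

```perl
use strict;
use warnings;

# Heuristic: is the FF D8 at $pos plausibly a real JPEG start? Assumes a real
# SOI is directly followed by a marker segment that has a length field.
sub plausible_jpeg_start {
    my ($data, $pos) = @_;
    return 0 unless length($data) >= $pos + 6;
    return 0 unless substr($data, $pos, 3) eq "\xFF\xD8\xFF";
    my $code = ord substr($data, $pos + 3, 1);
    # Only codes C0..FE can start a segment here (FF fill bytes are rejected).
    return 0 unless $code >= 0xC0 && $code <= 0xFE;
    return 0 if $code >= 0xD0 && $code <= 0xD9;  # RSTn, SOI, EOI carry no length
    my $len = unpack 'n', substr($data, $pos + 4, 2);  # counts its own 2 bytes
    # The whole first segment must fit inside the data.
    return $len >= 2 && $pos + 4 + $len <= length($data) ? 1 : 0;
}
```

Candidates that fail this check can be skipped before any extraction is attempted.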

      can you be certain that all occurrences of these 2 bytes represent start/end image markers

      I think that's a good point. I have a JPEG image that certainly begins with the "start marker" the OP quoted and certainly ends with the "end marker" the OP quoted. Unfortunately, I can also find occurrences of both the "start marker" and the "end marker" within that JPEG file.

      That complicates things somewhat.
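A sketch of one way around this, assuming the data follows the segment layout in the JPEG standard (ITU-T T.81): instead of searching for the next FF D9, walk the image segment by segment. Most markers carry a two-byte big-endian length, so an APP1 (Exif) segment holding a whole thumbnail JPEG, inner FF D8 and FF D9 included, is skipped in one step. After SOS the compressed data is scanned until an FF that is not followed by 00 (a stuffed byte) or D0 to D7 (a restart marker). The name jpeg_end is hypothetical; it returns the offset just past the image's own FF D9, or nothing if the bytes do not parse.

```perl
use strict;
use warnings;

# Walk JPEG segments from an SOI at $start and return the offset just past
# that image's own EOI, or nothing if the bytes do not parse as a JPEG.
sub jpeg_end {
    my ($data, $start) = @_;
    my $len = length $data;
    return unless substr($data, $start, 2) eq "\xFF\xD8";
    my $p = $start + 2;
    while ($p + 1 < $len) {
        return unless substr($data, $p, 1) eq "\xFF";    # expect a marker
        my $q = $p + 1;
        $q++ while $q < $len && substr($data, $q, 1) eq "\xFF";  # fill bytes
        return if $q >= $len;
        my $code = ord substr($data, $q, 1);
        $p = $q + 1;                                     # just past the marker
        return $p if $code == 0xD9;                      # EOI: done
        return if $code == 0x00 || $code == 0xD8;        # invalid here
        next if $code == 0x01 || ($code >= 0xD0 && $code <= 0xD7);  # no length
        return if $p + 2 > $len;
        my $seglen = unpack 'n', substr($data, $p, 2);   # includes itself
        return if $seglen < 2;
        $p += $seglen;              # skips APPn payloads, thumbnails included
        next unless $code == 0xDA;  # only SOS is followed by compressed data
        while (1) {                 # scan entropy-coded data for a real marker
            my $ff = index($data, "\xFF", $p);
            return if $ff < 0 || $ff + 1 >= $len;
            my $nb = ord substr($data, $ff + 1, 1);
            if ($nb == 0x00 || ($nb >= 0xD0 && $nb <= 0xD7)) {
                $p = $ff + 2;       # stuffed 00 or restart marker: keep scanning
                next;
            }
            $p = $ff;               # a real marker: back to segment parsing
            last;
        }
    }
    return;
}
```

Given an SOI offset $start and a defined result $end, substr($data, $start, $end - $start) is the complete outer image, thumbnail and all.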

      Is it guaranteed that there's only one image in the stream? Is it also guaranteed that the image begins with the first (or last) occurrence of the "start marker" and ends with the last (or first) occurrence of the "end marker"? If the answer to both questions is yes, then the problem is easily solved: we can simply remove the cruft from both ends of the stream, leaving us with the image.

      Cheers,
      Rob
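For Rob's special case of one image that begins at the first FF D8 and ends at the last FF D9, a sketch using index and rindex might look like this. The name strip_cruft is made up, and the sketch assumes the whole stream fits in memory.

```perl
use strict;
use warnings;

# One image, from the first FF D8 to the last FF D9; everything outside that
# range is cruft. Returns nothing if the markers are missing or out of order.
sub strip_cruft {
    my ($stream) = @_;
    my $start = index($stream, "\xFF\xD8");     # first start marker
    my $end   = rindex($stream, "\xFF\xD9");    # last end marker
    return if $start < 0 || $end < $start + 2;
    return substr($stream, $start, $end + 2 - $start);   # keep both markers
}
```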
Re: extract jpg from binary file
by Anonymous Monk on Oct 27, 2006 at 13:54 UTC
    Binary data isn't any different from other data. For Perl, it's just a big string.

    Assuming you have the memory, you can do:

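    my $jpeg = $binary_data;
    my $soi  = index($jpeg, "\xFF\xD8");
    die "no FF D8 marker\n" if $soi < 0;
    substr($jpeg, 0, $soi) = "";              # drop bytes before FF D8
    my $eoi  = index($jpeg, "\xFF\xD9");
    die "no FF D9 marker\n" if $eoi < 0;
    substr($jpeg, $eoi + 2) = "";             # keep FF D9, drop the rest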
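The snippet above assumes $binary_data already holds the file. A minimal sketch of getting it there, assuming the file fits in memory: open it with the :raw layer so no CRLF or encoding translation touches the bytes, and slurp it by localizing $/. The sub name slurp_raw is hypothetical.

```perl
use strict;
use warnings;

# Read a whole file into one scalar with no newline or encoding translation,
# so its bytes can be searched with index and substr.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $data = do { local $/; <$fh> };   # undef $/ reads everything at once
    close $fh;
    return defined $data ? $data : '';
}
```

Something like my $binary_data = slurp_raw('dump.bin'); would then feed the snippet above (the file name is only an example).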
Re: extract jpg from binary file
by brian_d_foy (Abbot) on Oct 27, 2006 at 21:04 UTC
    I have a similar script to pull PNG images out of Word documents. There might be edge cases, but it's been Good Enough so far:
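    use strict;
    use warnings;

    my $HEADER = "\211PNG";          # \211 is octal for 0x89, the first PNG signature byte
    my $FOOTER = "IEND\xAEB`\x82";   # IEND chunk type followed by its fixed CRC

    foreach my $file ( @ARGV ) {
        print "Extracting $file\n";
        (my $image_base = $file) =~ s/(.*)\..*/$1/;

        # :raw so no CRLF translation corrupts the bytes (matters on Windows)
        open my $in, '<:raw', $file or do { warn "$file: $!\n"; next };
        my $data = do { local $/; <$in> };
        close $in;

        my $count = 0;
        while( $data =~ m/(\Q$HEADER\E.*?\Q$FOOTER\E)/sg ) {
            my $image = $1;
            $count++;
            my $image_name = "$image_base.$count.png";
            # with "warn ..., next" the next fires before warn runs
            open my $fh, '>:raw', $image_name
                or do { warn "$image_name: $!\n"; next };
            print "Writing $image_name: ", length($image), " bytes\n";
            print $fh $image;
            close $fh;
        }
    }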
    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
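The regex above takes the first footer after each header, so a stray copy of the IEND trailer bytes inside image data would cut an image short. A sketch of a stricter alternative, assuming well-formed PNG files: after the 8-byte signature, PNG data is a list of chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC), so you can step from chunk to chunk until IEND. The name png_end is hypothetical, and this sketch does not verify the CRCs.

```perl
use strict;
use warnings;

my $PNG_SIG = "\x89PNG\r\n\x1A\n";   # fixed 8-byte PNG signature

# Given the offset of a PNG signature, walk the chunk list and return the
# offset just past the IEND chunk, or nothing if the structure is broken.
sub png_end {
    my ($data, $start) = @_;
    return unless substr($data, $start, 8) eq $PNG_SIG;
    my $p   = $start + 8;
    my $len = length $data;
    while ($p + 12 <= $len) {                 # smallest chunk is 12 bytes
        my ($chunk_len, $type) = unpack 'N a4', substr($data, $p, 8);
        return unless $type =~ /\A[A-Za-z]{4}\z/;   # types are ASCII letters
        my $next = $p + 12 + $chunk_len;      # length + type + data + CRC
        return if $next > $len;
        return $next if $type eq 'IEND';
        $p = $next;
    }
    return;
}
```

Because chunk data is skipped by its declared length, footer-like bytes inside an IDAT chunk do not end the image early.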