JunkGuy has asked for the wisdom of the Perl Monks concerning the following question:

I have a non-standard image format from which I need to extract the metadata. I have a description of the file header where the metadata is located. What would be the best and easiest way to pull the metadata out?

(I seem to recall a method of laying out a file/data map, sort of how COBOL works, but I can't remember the details or the proper name for it, or whether it would even work for data extraction.)

Alternatively, if there is a way to pull bytes of data out, that may work as well (i.e., I want bytes 0-11, then bytes 12-14, etc.).

I've been away from Perl for a while, and I don't have a lot of experience with data extraction, so please keep that in mind.

Thank you for your time.

Re: Extracting Image/File Metadata
by BrowserUk (Patriarch) on Feb 05, 2003 at 17:42 UTC

    Take a look at perlfunc:unpack. Eg.

    my $record = "abcdefghijklmnopqrstuvwxyz";
    my @bits = unpack 'A12 A3', $record;
    print $bits[0]; # Gives 'abcdefghijkl'
    print $bits[1]; # Gives 'mno'
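
    For a binary header it may matter that 'A' strips trailing spaces and NULs, while 'a' hands the bytes back unchanged. A minimal sketch, assuming a made-up 15-byte header (the layout and the variable names are illustrative only, not the poster's real format), with substr shown as the plain "give me bytes 12 - 14" alternative:

```perl
use strict;
use warnings;

# Hypothetical 15-byte header: a 12-byte name padded with NULs, then 3 raw bytes.
my $header = "IMG\0\0\0\0\0\0\0\0\0" . "\x01\x02\x00";

# 'A' strips trailing spaces and NULs; 'a' returns the bytes unchanged.
my ($name, $raw) = unpack 'A12 a3', $header;

# substr pulls the same byte range by offset and length.
my $raw_again = substr $header, 12, 3;

printf "name=%s raw=%s same=%s\n",
    $name, unpack('H*', $raw), ($raw eq $raw_again ? 'yes' : 'no');
# prints: name=IMG raw=010200 same=yes
```

    The 'H*' unpack is only there to display the raw bytes as hex; $raw itself still holds the three original bytes.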

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Extracting Image/File Metadata
by pfaut (Priest) on Feb 05, 2003 at 17:36 UTC

    You can use sysread to read the header out of the file. Then, use something like Data::FixedFormat or Parse::FixedLength to decode the information. These modules use pack and unpack to manipulate buffers of (possibly binary) data.
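
    Without either module, the same read-then-decode flow might look roughly like the sketch below. The 9-byte layout (magic, width, height, depth) is invented for illustration, and the script writes its own sample file with File::Temp so that it runs stand-alone; a real header description would replace the template string.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file so the sketch runs on its own.
# Hypothetical layout: 4-byte magic, 16-bit width, 16-bit height
# (both big-endian), 1-byte depth.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('a4 n n C', 'XIMG', 640, 480, 8), "pixel data...";
close $tmp or die "close failed: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh;                      # no newline translation on any platform

my $want = 9;                     # header size in this made-up layout
my $got  = sysread $fh, my $header, $want;
die "sysread failed: $!" unless defined $got;
die "short header: $got bytes" unless $got == $want;
close $fh;

# Decode with the same template that describes the layout.
my ($magic, $width, $height, $depth) = unpack 'a4 n n C', $header;
print "$magic ${width}x$height, $depth bits\n";
# prints: XIMG 640x480, 8 bits
```

    Checking the return value of sysread matters because it can return fewer bytes than requested, and unpack will quietly return fewer fields when the buffer is too short.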

    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
      I think sysread will work for me, except for one thing: I need to be able to pull out and retain some data in binary format. I know that pack and unpack are used for that, as you stated, but I can't seem to get them to work. Is there any special trick to using sysread with pack and unpack? Thanks for your time and your reply.

        I use the code below to get some information out of WAV files. Maybe it will help you read your files.

        #!/usr/bin/perl -w
        use strict;
        use Fcntl;

        my $fnm = shift;
        sysopen WAV, $fnm, O_RDONLY or die "Cannot open $fnm: $!";
        binmode WAV;   # read raw bytes on every platform
        my $riff; sysread WAV, $riff, 12;
        my $fmt;  sysread WAV, $fmt, 24;
        my $data; sysread WAV, $data, 8;
        close WAV;

        # RIFF header: 'RIFF', long length, type='WAVE'
        my ($r1,$r2,$r3) = unpack "A4VA4", $riff;

        # WAV header: 'fmt ', long length, short format tag (1 = PCM),
        # short channels, long samples/second, long bytes per second,
        # short bytes per sample, short bits per sample
        my ($f1,$f2,$f3,$f4,$f5,$f6,$f7,$f8) = unpack "A4VvvVVvv", $fmt;

        # Data header: 'data', long length
        my ($d1,$d2) = unpack "A4V", $data;

        print <<"EOF";
        RIFF header: $r1, length $r2, type $r3
        Format: $f1, length $f2, format tag $f3, channels $f4,
          sample rate $f5, bytes per second $f6,
          bytes per sample $f7, bits per sample $f8
        Data: $d1, length $d2
        EOF
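
        On keeping data in binary form: the buffer that sysread fills is already the raw bytes, and unpack only returns decoded copies, so one option is simply to hang on to the buffer. The sketch below is an illustration under that assumption; it builds a stand-in for the 24-byte 'fmt ' chunk with made-up values rather than reading a real file, and shows that packing the decoded fields with the same template gives the original bytes back.

```perl
use strict;
use warnings;

# Stand-in for the 24 bytes that sysread would return for a 'fmt ' chunk
# (values are made up: 16-bit stereo PCM at 44.1 kHz).
my $template = 'A4 V v v V V v v';
my $fmt = pack $template, 'fmt', 16, 1, 2, 44100, 176400, 4, 16;

# The raw bytes stay untouched in $fmt; @fields holds decoded copies.
my @fields = unpack $template, $fmt;

# Packing the decoded fields with the same template rebuilds the bytes.
my $rebuilt = pack $template, @fields;

printf "%d bytes, hex %s\n", length($fmt), unpack('H*', $fmt);
print "round trip ", ($rebuilt eq $fmt ? "ok" : "differs"), "\n";
```

        The round trip works here because 'A4' strips the trailing space of 'fmt ' on unpack and pads it back on pack; a field with meaningful trailing NULs or spaces would need 'a' instead.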