Stephen Toney has asked for the wisdom of the Perl Monks concerning the following question:

Wise monks: I have a function that examines JPEG files to find the image height and width. Four getc's are used to read four consecutive bytes. The function has been running for years with no problem, but on one particular file, getc skips ahead after reading each character.

By using "tell" I can see where the file pointer is. Normally tell returns something like this: 270, 271, 272, 273. However, in the problem file, it returns 270, 782, 4367, 4368.

The problem occurs on two different servers. I thought the file was corrupted, but it looks normal in a hex editor, and it opens correctly in MS Photo Ed, which can also determine the image size (in other words, Photo Ed can read the bytes I can't seem to read).

Why would getc skip like this? I've Googled and searched this site to no avail.

Many thanks in advance for any wisdom!

Stephen

Replies are listed 'Best First'.
Re: getc skips ahead
by borisz (Canon) on Aug 15, 2004 at 12:37 UTC
    You need to call binmode $fh; on the filehandle that reads the JPEG, directly after the open call. It looks like your input is being read in utf8 mode.
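    A minimal sketch of that advice, assuming a small stand-in file written on the fly rather than a real JPEG (the byte values and the temp file are illustrative and not from the original post). With binmode applied, each getc should advance the position reported by tell by exactly one byte:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a stand-in file containing bytes above 0x7F
# (hypothetical data, not a real JPEG).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xFF\xD8\xFF\xE0\x00\x10";
close $out or die "close failed: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh;    # directly after open: treat the handle as raw bytes

my @positions;
for (1 .. 4) {
    my $byte = getc($fh);
    die "getc failed or hit end of file" unless defined $byte;
    push @positions, tell($fh);    # file position after each one-byte read
}
close $fh;

print "@positions\n";    # prints "1 2 3 4"
```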
    Boris
      Boris, That did it! It's hard to believe that the problem hasn't occurred before on tens of thousands of images, but ...

      Also, the binmode documentation does not mention getc as one of the functions it affects, so who knew?

      Anyway, many thanks for saving my bacon! Stephen

        Also, the binmode documentation does not mention getc as one of the functions it affects, so who knew?
        I don't really think it needs to, it already says
        In other words: regardless of platform, use binmode() on binary data, like for example images.

        MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
        I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
        ** The third rule of perl club is a statement of fact: pod is sexy.

        Note that the piece of information that gives it away is this: "in the problem file, [getc] returns 270, 782, 4367, 4368". You'd normally expect getc() to be reading one byte at a time, and therefore always returning numbers in the range 0..255; only if the file is being read in a different mode can larger integers be returned.

        Update: misread the details.

        Hugo

Re: getc skips ahead
by gaal (Parson) on Aug 15, 2004 at 12:48 UTC
    What does getc return? If undef, then you need to check $! for IO errors.
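    One way to make that check routine is a small wrapper. This is only a sketch: getc_checked is a hypothetical helper name, and the two-byte in-memory "file" is a stand-in for the real image. It relies on getc returning undef both at end of file and on error, with $! set only in the error case:

```perl
use strict;
use warnings;

# Hypothetical helper: return the next byte, or die explaining why there is none.
sub getc_checked {
    my ($fh) = @_;
    $! = 0;                            # clear any stale error first
    my $byte = getc($fh);
    return $byte if defined $byte;
    die "getc failed: $!\n" if $!;     # undef with $! set: I/O error
    die "unexpected end of file\n";    # undef with $! clear: EOF
}

my $data = "\xFF\xD8";                 # two stand-in bytes
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @bytes = map { ord(getc_checked($fh)) } 1 .. 2;
printf "%02X %02X\n", @bytes;

# A third read runs past the end of the data, so the helper dies.
my $ok = eval { getc_checked($fh); 1 };
print "third read: $@" unless $ok;
```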
      You are absolutely right. I should be doing this and will do so. Thanks for the reply. Stephen
Re: getc skips ahead
by derby (Abbot) on Aug 15, 2004 at 14:05 UTC
    That is weird. I would be very cautious about blaming a particular file. You need to post some code. Are your getc's following one after the other? Is there any type of processing going on in between the getc's that would move the offset? Can you replace the four getc with one read to see what happens?
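    A rough sketch of the single-read variant, assuming the four bytes are the height followed by the width, each a 16-bit big-endian value (as in a JPEG frame header). The in-memory data and the offset of 2 are made up for illustration; the original code was not posted:

```perl
use strict;
use warnings;

# Hypothetical bytes, not a real JPEG: two filler bytes, then height and width.
my $data = "\x00\x11\x01\xE0\x02\x80";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

seek($fh, 2, 0) or die "seek failed: $!";    # skip to the illustrative offset

# One read of four bytes instead of four separate getc calls.
my $got = read($fh, my $buf, 4);
die "read failed: $!" unless defined $got;
die "short read: wanted 4 bytes, got $got" unless $got == 4;

# 'n' unpacks an unsigned 16-bit big-endian integer.
my ($height, $width) = unpack('n n', $buf);
print "height=$height width=$width\n";       # prints "height=480 width=640"
```

    Checking the byte count returned by read also catches a truncated file, which four unchecked getc calls would not.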

    -derby
Re: getc skips ahead
by Zaxo (Archbishop) on Aug 15, 2004 at 22:26 UTC

    You should consider using Image::Size. It has lots of the rough edges knocked off by testing and use. It will keep working when you find you have some other image format to deal with.
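    For reference, a minimal usage sketch. It assumes Image::Size is installed from CPAN, and the default file name photo.jpg is a placeholder. imgsize returns the width, the height and a type string; on failure the first two are undef and the third carries the error text:

```perl
use strict;
use warnings;
use Image::Size qw(imgsize);

# Placeholder path; pass a real image file as the first argument.
my $path = shift(@ARGV) // 'photo.jpg';

my ($width, $height, $type) = imgsize($path);

if (defined $width) {
    print "$path: $type, ${width}x${height}\n";
}
else {
    # On failure the third element holds the error message.
    print "Could not size $path: $type\n";
}
```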

    After Compline,
    Zaxo