sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Is there any way to tell a PDF from a PPT by analyzing the content of the document? I ask because a file named as a PDF can actually contain PPT content, and a file named as a PPT can contain PDF content.

So I need to know which clues in the document itself can be used to tell them apart. In my work I am given a URL, and I need to verify whether the document at that URL really is a PDF or a PPT.

I need help with this, please. Thanks.

Re: differentiate pdf and ppt
by Anonymous Monk on Aug 31, 2011 at 10:45 UTC
Re: differentiate pdf and ppt
by bart (Canon) on Aug 31, 2011 at 11:37 UTC
    If you read the first few bytes of a PDF file, it always starts with "%PDF" followed by the version number (for example "%PDF-1.4"). I think that is all you need.
      Hi bart, can you please tell me how I can do that?
        Sheesh, I would think that even a beginner in Perl should be able to tackle this. The tasks you have to do:
        1. read the first few bytes of the file (at least 4 bytes) into a string
        2. see if the string starts with '%PDF'
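
The two steps above can be sketched in Perl roughly as follows. This is only a minimal sketch, assuming the document has already been downloaded from the URL to a local file. The helper name sniff_type and its return values are made up for illustration. The two non-PDF checks go beyond what was said above and are assumptions about what "ppt" means here: legacy .ppt files are OLE2 compound files and .pptx files are ZIP archives, but both containers are shared with other formats (.doc, .xls, .docx, plain .zip), so those results are hints, not proof of a PowerPoint file.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's type from its first bytes.
# Returns 'pdf', 'ole2', 'zip', or 'unknown'.
sub sniff_type {
    my ($path) = @_;

    # Open in raw mode so no encoding or newline layer alters the bytes.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Step 1: read the first few bytes (8 is enough for all checks below).
    my $got = read($fh, my $head, 8);
    close $fh;
    die "Cannot read $path: $!" unless defined $got;

    # Step 2: see if the string starts with '%PDF'.
    return 'pdf' if $head =~ /\A%PDF/;

    # Assumption: OLE2 compound file signature (legacy .ppt, but also .doc/.xls).
    return 'ole2' if $head eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    # Assumption: ZIP local file header (.pptx, but also .docx, .xlsx, .zip).
    return 'zip' if $head =~ /\APK\x03\x04/;

    return 'unknown';
}
```

Calling sniff_type on the downloaded file and comparing the result with 'pdf' answers the PDF half of the question. Confirming that an 'ole2' or 'zip' result really is a presentation would need a look inside the container, which this sketch does not attempt.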