ajlittoz has asked for the wisdom of the Perl Monks concerning the following question:

Hi wise monks,

I'm trying to use File::MMagic to filter out non-text files in order to avoid invalid UTF-8 sequences later on (I am aware this is not 100% bullet-proof if a file uses another encoding).

I create a new instance, passing it the system 'magic' file, with

$magic = File::MMagic->new('/var/share/misc/magic');

This is the Fedora Linux location.

When I use it in

$magic->checktype_contents($buffer)

File::MMagic emits the warning "Bad offset/type at line 1", followed by a dump of the magic file.

This happens consistently across various Fedora versions, i.e. with Perl ranging from 5.8 to 5.22.

If I pass no file to method new, everything is OK. After investigating, I found that the system magic file starts with an empty line. After removing this initial empty line, the warning disappears.

Source for File::MMagic shows in sub readMagicEntry that empty lines should be silently ignored thanks to

$line =~ /^\s*$/

This is effectively the case for non-initial empty lines (there are many in the magic file).

I have not checked what is and what is not initialized in $$MF[1] at the beginning of the magic file "compilation", as I solved my problem by making a private copy of the system file and removing the initial empty line.
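
For anyone who wants to reproduce the workaround, here is a minimal sketch of the private-copy step. The helper name copy_without_leading_blank_lines and the destination path are made up for illustration and are not part of File::MMagic; only File::MMagic->new and checktype_contents come from the module itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::MMagic): copy a magic file,
# dropping any blank lines that come before the first non-blank line.
sub copy_without_leading_blank_lines {
    my ($src, $dst) = @_;
    open my $in,  '<', $src or die "Cannot read $src: $!";
    open my $out, '>', $dst or die "Cannot write $dst: $!";
    my $seen_content = 0;
    while (my $line = <$in>) {
        # Same blank-line pattern as the one quoted from readMagicEntry.
        next if !$seen_content && $line =~ /^\s*$/;
        $seen_content = 1;
        print {$out} $line;   # later blank lines are kept untouched
    }
    close $in;
    close $out or die "Cannot close $dst: $!";
    return $dst;
}

# Usage (the destination path is only an example):
# my $copy  = copy_without_leading_blank_lines('/var/share/misc/magic', '/tmp/magic.private');
# my $magic = File::MMagic->new($copy);
# my $type  = $magic->checktype_contents($buffer);
```

Only the leading blank lines are dropped, since the non-initial ones do not trigger the warning.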

My code runs in a server and exhibits the same behaviour under Apache/mod_perl or lighttpd/straight CGI.

Unless I made a mistake (I'm quite a newbie in Perl), this should be reported as a bug to the File::MMagic maintainer.

Regards,
ajl

Re: File::MMagic bug?
by stevieb (Canon) on Jan 16, 2016 at 22:43 UTC

    Although I haven't tested this, I did take a quick look through the code, and at first glance, it appears you're right.

    Can you let us know which file causes the grief? Hopefully some of us have it installed by default. Otherwise, can you put it up somewhere in binary format?

    I'm in the pull request mood today. Even though this project doesn't seem to be on GitHub, I'd take a crack at troubleshooting and write a patch if necessary.

      Hi stevieb, thanks for your quick answer.

      Which file do you need? The "magic" one?

      I think my instance is not really relevant. The important fact is the first byte in the file is 0x0A (then 0x23 0x2d 0x2d ..., i.e. line 2 is a standard comment starting with a "hash" character).
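
      To check whether another magic file begins the same way, something like the following should do. It is only a sketch: leading_bytes_hex is a made-up helper name, and the path in the usage comment is just the one from my post.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: return the first $count bytes of a file
      # as space-separated two-digit hex values.
      sub leading_bytes_hex {
          my ($path, $count) = @_;
          open my $fh, '<:raw', $path or die "Cannot read $path: $!";
          my $buf = '';
          my $n = read($fh, $buf, $count);
          die "Read failed on $path: $!" unless defined $n;
          close $fh;
          return join ' ', map { sprintf '%02x', ord } split //, $buf;
      }

      # Usage:
      # print leading_bytes_hex('/var/share/misc/magic', 4), "\n";
      ```

      A file that starts with an empty line followed by a "#--" comment gives 0a 23 2d 2d.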

      Since this is my first contact with the site, how can I attach this ~50k file to my post?

      ajl