robins has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I've been working on a web application that allows users to upload files, using CGI.pm to handle the actual upload. The uploading part is going along nicely, but I need to extract some metadata from the uploaded files so that I can categorize them.

I use stat() and File::MimeInfo::Magic::mimetype to figure out the file type. But this module has trouble recognizing that an OpenDocument text file is in fact ODT and not just application/zip. I guess I need to use the extension of the passed filename to do a lookup in some table (if you know which one, please tell me!) and assume a specific MIME type whenever an extension is actually passed into the function (the filename comes from CGI.pm's uploadInfo()).
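One way to do the extension lookup is a small override table that is consulted only when magic detection returns a generic container type such as application/zip. This is a minimal sketch, assuming the browser-supplied filename is available as a plain string; the table contents and the name `refine_mime_type` are hypothetical, and only the OpenDocument entries are shown. File::MimeInfo also exports a `globs()` function that does a filename-only lookup against the shared-mime-info database, which may be worth trying before maintaining a table by hand.

```perl
use strict;
use warnings;

# Hypothetical override table: lowercase extension => MIME type.
# Only the OpenDocument formats are listed; extend as needed.
my %EXT_OVERRIDE = (
    odt => 'application/vnd.oasis.opendocument.text',
    ods => 'application/vnd.oasis.opendocument.spreadsheet',
    odp => 'application/vnd.oasis.opendocument.presentation',
    odg => 'application/vnd.oasis.opendocument.graphics',
);

# Magic results that only identify a container, not the real format.
my %GENERIC = map { ($_ => 1) }
    qw(application/zip application/x-zip application/octet-stream);

# refine_mime_type($detected, $client_filename) (hypothetical helper)
# Returns a more specific MIME type when the magic result is generic and
# the client-supplied filename has a known extension; otherwise $detected.
sub refine_mime_type {
    my ($detected, $client_filename) = @_;
    return $detected
        unless defined $client_filename && $GENERIC{ $detected // '' };

    # Some browsers send the full client-side path (C:\docs\report.odt),
    # so drop everything up to the last / or \ before taking the extension.
    (my $base = $client_filename) =~ s{.*[/\\]}{};
    my ($ext) = $base =~ /\.([^.]+)\z/ or return $detected;

    return $EXT_OVERRIDE{ lc $ext } // $detected;
}
```

Because the override only applies to generic results, a file whose magic already gives a specific type (say image/png) keeps that type even if its name claims otherwise.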

For detecting image metadata I use Image::Size, but this ONLY detects dimensions, not other metadata like author and title.

In the sound department I use Audio::Wav and MP3::Info to detect information like duration and title. This works more or less as it should.

It's in the video department that I have the most problems. MS WMA is detected as video/x-ms-asf (using Audio::WMA), and is thus filed as video rather than audio (even though it doesn't contain any video frames). The other problem is that I'm unable to extract the video dimensions (width and height) from WMV/ASF files. And because the openquicktime Perl module doesn't respect my debug settings, I had to fork and run qtinfo (an external command) to grab QuickTime metadata, which isn't very cross-platform. And for basic MPEG-1/2/4 streams I don't know what to do. None of the packages I found on CPAN seem stable or mature enough to give me sufficient information.

As you can see, this approach brings with it a lot of dependencies and makes the application harder to install. I was hoping there was an easier way to detect all of the metadata. Maybe there is a metadata extraction project with Perl integration? I was thinking about FFmpeg's libavformat, but it's hard to find any documentation on how to use this library from Perl for metadata extraction.

Solutions?

PS: Forking wouldn't be all that bad if there were one single application that could actually detect everything I need, but forking for each of the different checks would be overkill. Suggestions are welcome in that department as well.


Re: Multimedia metadata extraction - how to have it all?
by jhourcle (Prior) on Jun 14, 2006 at 11:11 UTC
Re: Multimedia metadata extraction - how to have it all?
by wazoox (Prior) on Jun 14, 2006 at 14:12 UTC
    In case it helps, here's a small module I wrote that uses mpginfo/mpgtx to extract information from an MPEG video file. I also have a previous version using ffmpeg; its duration isn't as accurate, but it works with many more file formats. I don't have the ffmpeg version at hand, but I'll post it later if it can be of any help.
Re: Multimedia metadata extraction - how to have it all?
by Moron (Curate) on Jun 14, 2006 at 12:15 UTC
    Your references to libavformat and ffmpeg suggest you are using Linux. MPlayer can usually detect all the technical information. It's a clunky solution, but it might help you prefilter which module to look up the rest with.
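MPlayer's `-identify` option prints machine-readable `ID_KEY=value` lines (for example `ID_VIDEO_WIDTH`, `ID_VIDEO_HEIGHT`, `ID_LENGTH`), which keeps it to a single fork per file. This is a minimal sketch, assuming an `mplayer` binary on the PATH and the commonly used `-identify -frames 0 -vo null -ao null` invocation; the helper names `parse_identify` and `mplayer_identify` are hypothetical, and the exact set of keys printed depends on the MPlayer build and on the file.

```perl
use strict;
use warnings;

# parse_identify(@lines) (hypothetical helper): collect the ID_KEY=value
# lines that `mplayer -identify` prints into a hash, keyed without the
# ID_ prefix. All other output lines are ignored; a repeated key keeps
# its last value.
sub parse_identify {
    my %info;
    for my $line (@_) {
        next unless $line =~ /^ID_(\w+)=(.*?)\s*$/;
        $info{$1} = $2;
    }
    return \%info;
}

# mplayer_identify($file) (hypothetical helper): run MPlayer without
# playing anything and parse its output. Assumes `mplayer` is on the
# PATH; the list form of open passes $file without any shell quoting.
sub mplayer_identify {
    my ($file) = @_;
    my @cmd = ('mplayer', '-identify', '-frames', '0',
               '-vo', 'null', '-ao', 'null', $file);
    open(my $fh, '-|', @cmd) or die "can't run mplayer: $!\n";
    my @lines = <$fh>;
    close $fh;    # exit status ignored; we only want the ID_ lines
    return parse_identify(@lines);
}

# When run as a script, dump the parsed fields for each file argument.
if (@ARGV) {
    for my $file (@ARGV) {
        my $info = mplayer_identify($file);
        print "$file:\n";
        print "  $_ = $info->{$_}\n" for sort keys %$info;
    }
}
```

Keeping the parsing in its own sub means it can be checked against captured output without MPlayer installed; the wrapper is the only part that forks.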

    -M

    Free your mind

Re: Multimedia metadata extraction - how to have it all?
by wazoox (Prior) on Jun 14, 2006 at 19:51 UTC
    Here's the older version of my module that extracts metadata from video files using ffmpeg. It correctly identifies many video formats.
    #!/usr/bin/perl -w
    #
    ###########################################################################
    #
    # videoinfo.pm
    #
    # V 0.2.0 of 13/12/2004
    #   support for ffmpeg CVS
    #
    # V 0.1.0 of 20/11/2004
    #
    ###########################################################################
    #
    # Copyright (c) Intellique 2004
    # All rights reserved.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.

    package videoinfo;

    use strict;
    use warnings;
    use IPC::Open3;
    require Exporter;

    our @ISA    = qw(Exporter);
    our @EXPORT = qw(videoinfo);

    ##########################################################
    # videoinfo
    # returns the various details about a video file
    # in a hash
    ##########################################################
    sub videoinfo {
        # ffmpeg command
        my $ffmpeg = '/usr/bin/ffmpeg';

        # defaults, returned for any field ffmpeg does not report
        my %finfo = (
            'duration'   => "00:00:00.0",
            'bitrate'    => "0",
            'vcodec'     => "",
            'vformat'    => "",
            'vsize'      => "",
            'framerate'  => "0.00",
            'acodec'     => "",
            'samplerate' => "0",
            'stereo'     => "0",    # 0 false (mono), 1 true (stereo)
            'audiorate'  => "0",
        );

        # file to process
        my $file = shift;

        # Run "ffmpeg -i FILE" with the list form of open3, so the file name
        # reaches ffmpeg directly and needs no shell escaping. ffmpeg prints
        # the stream info on stderr; undef as the error handle merges the
        # child's stderr into $out.
        my ($in, $out);
        my $pid = eval { open3($in, $out, undef, $ffmpeg, '-i', $file) }
            or die "can't run $ffmpeg: $@\n";
        close $in;              # ffmpeg needs no input
        my @res = <$out>;
        waitpid($pid, 0);       # reap the child so it doesn't linger as a zombie

        # scan the output for the interesting lines
        foreach (@res) {
            # duration and bitrate
            if (m!Duration: (\d\d:\d\d:\d\d\.\d), start: [\d.]+, bitrate: (\d+) kb/s!) {
                $finfo{'duration'} = $1;
                $finfo{'bitrate'}  = $2;
                next;
            }
            # vcodec, vformat, vsize and framerate
            if (/(\d\d\.\d\d) fps\(r\): Video: (\w*), (\w+), (\d*x\d*)/) {
                $finfo{'framerate'} = $1;
                $finfo{'vcodec'}    = $2;
                $finfo{'vformat'}   = $3;
                $finfo{'vsize'}     = $4;
                next;
            }
            # acodec, samplerate, stereo and audiorate
            if (m!Audio: (\w*), (\d*) Hz, (mono|stereo), (\d*) kb/s!) {
                $finfo{'acodec'}     = $1;
                $finfo{'samplerate'} = $2;
                $finfo{'stereo'}     = (($3 eq 'stereo') || 0);
                $finfo{'audiorate'}  = $4;
            }
        }
        return %finfo;
    }

    ##########################################
    # end
    ##########################################
    1;
      Sorry I didn't respond sooner, but this ffmpeg code snippet is what actually saved me when it came to video. Thanks a lot for the suggestion!
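The hash returned by videoinfo() could also address the WMA problem from the original question: an ASF file in which ffmpeg reports an Audio stream but no Video stream can be filed as audio. This is a minimal sketch, assuming a hash shaped like videoinfo()'s return value, where 'vcodec' stays empty unless a Video line matched; `media_category` is a hypothetical name. It inherits videoinfo()'s regexes, so if your ffmpeg version formats its stream lines differently, 'vcodec' may stay empty even for real video.

```perl
use strict;
use warnings;

# media_category(%info) (hypothetical helper): classify a file from a hash
# shaped like the one videoinfo() returns. Files with a video stream are
# 'video'; files with only an audio stream (e.g. WMA inside an ASF
# container) are 'audio'; anything else is 'unknown'.
sub media_category {
    my (%info) = @_;
    return 'video' if length($info{vcodec} // '');
    return 'audio' if length($info{acodec} // '');
    return 'unknown';
}
```

Checking for the video stream first means a movie with a soundtrack is still filed as video, while an audio-only ASF file falls through to 'audio'.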