prabudass has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I need to extract all images from a Microsoft Word file using Perl.
Could you please explain the various ways to do this, with examples?

Thanks,
Prabudass

Re: How to Extract all images from Microsoft Word File?
by marto (Cardinal) on Dec 04, 2008 at 11:11 UTC

    Have you done any research on this? Either spend some time looking at how to drive Word via Win32::OLE, or simply use Win32::OLE to save a copy of each document in HTML format. Word will then generate a folder named WordFileName_files containing the embedded images.
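Once Word has written the HTML copy, the pictures are ordinary files in the WordFileName_files folder, next to other support files Word may write there (such as filelist.xml). A minimal sketch for collecting them is below. The helper name `collect_html_images` and the list of image extensions are assumptions for illustration, not something Word guarantees, so adjust the pattern to whatever your Word version actually emits.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: return every image file found (recursively) in the
# "<document>_files" folder that Word creates when saving as HTML.
# The extension list below is an assumption; extend it as needed.
sub collect_html_images {
    my ($files_dir) = @_;
    die "No such directory: $files_dir\n" unless -d $files_dir;

    my @images;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                push @images, $File::Find::name
                    if -f $_ && /\.(?:png|jpe?g|gif|bmp|wmf|emf|emz)\z/i;
            },
        },
        $files_dir,
    );
    return sort @images;
}

# Usage: perl list_images.pl MyDocument_files
if (@ARGV) {
    print "$_\n" for collect_html_images( $ARGV[0] );
}
```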

    Martin

      marto ++
      Saving as an HTML document is a simple way to do it.


      I am trying to improve my English skills; if you see a mistake, please feel free to reply or /msg me a correction.

Re: How to Extract all images from Microsoft Word File?
by teun-arno (Acolyte) on Mar 12, 2014 at 21:24 UTC

    This answer is late, but maybe someone else can profit from it.

    You do not need ImageMagick.

    See: http://support.microsoft.com/kb/555171

        use strict;
        use warnings;
        use Win32;                                # provides Win32::GetFullPathName
        use Win32::OLE;
        use Win32::OLE::Const 'Microsoft Word';   # imports wdFormatHTML etc.

        $Win32::OLE::Warn = 3;    # report OLE runtime errors (die)

        if ( !@ARGV ) {
            print "Specify an MS Word file as the first command-line argument.\n";
            exit;
        }

        my $filename = Win32_FullPath( $ARGV[0] );
        if ( !-f $filename ) {
            print "Could not find file: $filename\n";
            exit;
        }
        die "Expected a .doc or .docx file\n" unless $filename =~ /\.docx?$/i;

        # Word writes the pictures to "<name>_files" next to "<name>.html"
        ( my $basename     = $filename ) =~ s/\.docx?$/_files/i;
        ( my $savenameHTML = $filename ) =~ s/\.docx?$/.html/i;

        # Use the Word application if it's open, otherwise start a new one
        print "Starting Word\n";
        my $Word;
        eval { $Word = Win32::OLE->GetActiveObject('Word.Application') };
        die "MS Word is not installed\n" if $@;
        $Word ||= Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit } )
            or die "Could not start Word\n";
        $Word->{Visible} = 0;    # we don't need to see Word in an active window

        # Open the specified Word document and save it as HTML
        print "Opening $filename\n";
        my $doc = $Word->Documents->Open($filename)
            or die "Unable to open document: ", Win32::OLE->LastError(), "\n";
        $doc->SaveAs( { FileName => $savenameHTML, FileFormat => wdFormatHTML } );

        print "Closing document\n";
        $doc->Close();
        print "Pictures are in directory: \"$basename\"\n";
        exit;

        # Return the full Windows file name, with / converted to \
        # e.g. c:/hoge/hoge.xls -> c:\hoge\hoge.xls
        sub Win32_FullPath {
            my $file = Win32::GetFullPathName(shift);
            $file =~ s|/|\\|g;
            return $file;
        }
    See also: http://cnedelcu.blogspot.nl/2013/02/top-3-ways-to-extract-images-from-word-docx-doc-document.html
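For .docx files specifically (not the older binary .doc format), Word is not needed at all: a .docx is a ZIP container, and embedded pictures are stored as members under word/media/. The sketch below reads the archive with the core module IO::Uncompress::Unzip and copies those members out. The helper name `extract_docx_images` and the default output directory `images` are hypothetical; treat this as a starting point rather than a complete solution (it does not, for example, resolve images that are only linked rather than embedded).

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: copy every member stored directly under word/media/
# in a .docx (ZIP container) into $outdir. Returns the paths written.
sub extract_docx_images {
    my ( $docx, $outdir ) = @_;
    make_path($outdir) unless -d $outdir;

    my $unzip = IO::Uncompress::Unzip->new($docx)
        or die "Cannot open $docx: $UnzipError\n";

    my @saved;
    my $status;
    # Walk the archive one member ("stream") at a time.
    for ( $status = 1 ; $status > 0 ; $status = $unzip->nextStream ) {
        my $name = $unzip->getHeaderInfo->{Name};
        next unless defined $name && $name =~ m{\Aword/media/[^/]+\z};

        my ( $buf, $data ) = ( '', '' );
        while ( ( $status = $unzip->read($buf) ) > 0 ) {
            $data .= $buf;    # read() overwrites $buf, so accumulate here
        }
        last if $status < 0;

        my $target = File::Spec->catfile( $outdir, basename($name) );
        open my $fh, '>:raw', $target or die "Cannot write $target: $!\n";
        print {$fh} $data;
        close $fh or die "Cannot close $target: $!\n";
        push @saved, $target;
    }
    die "Error reading $docx: $UnzipError\n" if $status < 0;
    $unzip->close;
    return @saved;
}

# Usage: perl docx_images.pl report.docx [outdir]
if (@ARGV) {
    my ( $docx, $outdir ) = @ARGV;
    $outdir //= 'images';
    print "$_\n" for extract_docx_images( $docx, $outdir );
}
```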
Re: How to Extract all images from Microsoft Word File?
by Anonymous Monk on Dec 04, 2008 at 10:49 UTC
    Same answer as before: use ImageMagick. :)