in reply to Spreadsheet::ParseExcel with embedded PDF cells
As such Spreadsheet::ParseExcel isn't of any use in this case. If you want to extract the PDF files you will need to use OLE::Storage_Lite.
The first thing you will need to find out is the PPS (property set) name of the embedded objects. The smplls.pl utility that is part of the OLE::Storage_Lite will show you the File structure and the PPS names. For example:
perl smplls.pl Book1.xls 00 1 'Root Entry' (pps 0) ROOT 00.01.1900 +00:00:00 01 1 'Workbook' (pps 1) FILE 1000 +bytes 02 2 ' SummaryInformation' (pps 2) FILE 1000 +bytes 03 3 ' DocumentSummaryInformation' (pps 3) FILE 1000 +bytes
Then you can extract the PPS structures using OLE::Storage_Lite. Here is a sample program that extracts the "Summary Information" from an Excel file to get you started.
Note, if the PPS name appears to start with a space it may actually be a low ordinal character such as "\0", "\1" or as in the case above "\5".#!/usr/bin/perl use strict; use warnings; use OLE::Storage_Lite; my $file = 'Book1.xls'; my $stream_name = "\5SummaryInformation"; # Convert stream name to UTF16. $stream_name = pack 'v*', unpack 'C*', $stream_name; # Create the OLE reader object. my $ole = OLE::Storage_Lite->new($file); # Find the required stream in the OLE container. my $stream = ($ole->getPpsSearch([$stream_name], 1, 1))[0]; die "Couldn't find required OLE data in $file. $!\n" unless $strea +m; # Do something with the data. my $data = $stream->{Data}; # Remember to use binmode() on Windows. print $data;
--
John.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Spreadsheet::ParseExcel with embedded PDF cells
by ForgotPasswordAgain (Vicar) on Jan 21, 2009 at 17:20 UTC | |
by jmcnamara (Monsignor) on Jan 22, 2009 at 01:29 UTC | |
|
Re^2: Spreadsheet::ParseExcel with embedded PDF cells
by ForgotPasswordAgain (Vicar) on Jan 11, 2009 at 17:04 UTC |