jmcnamara has asked for the wisdom of the Perl Monks concerning the following question:


I am the current maintainer of Spreadsheet::ParseExcel and I am looking for help from some people with an interest in cryptography in Perl.

I would like to extend Spreadsheet::ParseExcel to parse encrypted Excel files with user-supplied or default passwords. This is a frequently requested feature from end users.

The problem is as follows. At its most basic level, an Excel file consists of sequential binary records like this:

Name       Length     Data
2 bytes    2 bytes    variable length

When the file is encrypted, some additional unencrypted records are added to the start of the file to define the encryption type (usually RC4) along with some information such as the salt and verifier hash. The Data segment of each subsequent record, but not the Name or Length bytes, is then encrypted.
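To make the record layout concrete, here is a minimal sketch of walking such a stream and passing only the Data segment through a decryption step. The names read_records and $decrypt are illustrative and are not part of Spreadsheet::ParseExcel, and the XOR transform is a toy stand-in for a cipher, not RC4.

```perl
use strict;
use warnings;

# Walk a buffer of [name:2][length:2][data:length] records (little-endian),
# passing only the data segment through $decrypt. $decrypt is a hypothetical
# callback; a real one would implement the cipher named in the file header.
sub read_records {
    my ($buffer, $decrypt) = @_;
    my @records;
    my $pos = 0;

    while ($pos + 4 <= length $buffer) {
        my ($name, $len) = unpack 'v v', substr($buffer, $pos, 4);
        $pos += 4;
        die "Truncated record\n" if $pos + $len > length $buffer;

        my $data = substr($buffer, $pos, $len);
        $pos += $len;

        # The name and length bytes are left alone; only the data changes.
        $data = $decrypt->($data) if $decrypt;
        push @records, [$name, $data];
    }
    return @records;
}

# Toy stand-in for a cipher: XOR every byte with 0x5A. This is NOT RC4.
my $toy = sub { join '', map { chr(ord($_) ^ 0x5A) } split //, $_[0] };

# Two made-up records: one with a 2-byte "encrypted" payload, one empty.
my $stream = pack('v v a*', 0x0809, 2, $toy->("\x00\x06"))
           . pack('v v a*', 0x000A, 0, '');

my @records = read_records($stream, $toy);
printf "0x%04X %s\n", $_->[0], unpack('H*', $_->[1]) for @records;
```

Passing the cipher in as a code reference keeps the record walking independent of whichever algorithm the header records select.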

I would like to add a function that decrypts the data block using a password and returns the unencrypted data so that ParseExcel can continue to parse it. There is also a class of files that are encrypted but can be decrypted without a password (or with a default password), and I would like to be able to handle those as well.

The encryption algorithms are reasonably well documented and I can point anyone who is interested to various sources including documentation from Microsoft. I can also provide some debugging tools and I can provide an encrypted version of the test suite.

Note that I am not interested in brute-force decryption of Excel files, only in the user-supplied and default password cases.

I could probably figure it out myself but I have a large number of other issues to deal with and this task would suit someone with some experience and interest in cryptography.

If you are interested drop me a line at jmcnamara@cpan.org.

--
John.

Re: Looking for help with decryption in Spreadsheet::ParseExcel
by Tux (Canon) on Jan 22, 2010 at 13:30 UTC

    Comes down to something I'd like to see support for in perl: unpack on streams. So you could rewrite

    while (read $fh, my $dta, 4) {
        my ($name, $len) = unpack "vv", $dta;
        read $fh, $dta, $len;
        $encrypted and $dta = decrypt $dta;
        :

    to

    while (my ($name, $dta) = unpack "vv/A*", $fh) {
        $encrypted and $dta = decrypt $dta;
        :

    Enjoy, Have FUN! H.Merijn

      That's a great idea, but I'm not sure how this would relate to $/. I think it should always ignore $/, as the filehandle is "obviously" something that doesn't fit into the usual scheme of fixed-width or delimited records. But other than that, it would be nice to teach unpack about streams indeed.
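      Until unpack learns about filehandles, something close to it can be approximated with a small helper built on read, which counts bytes and does not consult $/. The sketch below is only an illustration: next_record is a made-up name, and the two records are test data packed into an in-memory filehandle.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: read one [name][length][data] record from $fh.
      # Returns an empty list at end of file so it can drive a while loop.
      sub next_record {
          my ($fh) = @_;
          my $got = read $fh, my $header, 4;
          return unless $got;                      # EOF (or read error)
          die "Short header\n" if $got != 4;

          my ($name, $len) = unpack 'v v', $header;
          my $data = '';
          if ($len) {
              my $n = read $fh, $data, $len;
              die "Short record\n" unless defined $n && $n == $len;
          }
          return ($name, $data);
      }

      # Two made-up records; the first payload contains "\n" on purpose to
      # show that the line separator plays no part in byte-counted reads.
      my $bytes = pack('v v a*', 1, 3, "a\nb") . pack('v v a*', 2, 0, '');
      open my $fh, '<', \$bytes or die "Cannot open in-memory handle: $!";
      binmode $fh;

      my @seen;
      while (my ($name, $data) = next_record($fh)) {
          push @seen, [$name, $data];
      }
      printf "%d records read\n", scalar @seen;
      ```

      The list assignment in the while condition is false only when next_record returns an empty list, so a record with a zero-length payload still counts as a record.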