in reply to Re: Find Created and Last Modified Date of Excel xls file
in thread Find Created and Last Modified Date of Excel xls file

Thanks to everyone that replied to my question. However, I don't think I have explained my problem in enough detail...Apologies.

The created and last modified dates I am after are not the dates that are given by the particular filesystem the file is on but are actually embedded in the file itself. So the suggested stat function will not retrieve the required information (I have tested this and shown that the dates reported by the stat function are not the ones I need).

I can find the required dates by opening the file in Excel and looking at its properties. These dates are NOT the same as the filesystem reports.

In general I will not have Excel installed and am therefore not able to use any vba code, for example, to retrieve the required information.

In fact I will be using a Linux based system to retrieve the info where Perl in installed.

I have already written a Perl module that can retrieve the correct information from the new Excel 2007+ file types as these are simply a set of zipped xml files.

However the older Excel files that I need to analyse are fundamentally binary files as explained in my initial post.

I hope my exact needs are clearer than before.

gauss76

  • Comment on Re^2: Find Created and Last Modified Date of Excel xls file

Replies are listed 'Best First'.
Re^3: Find Created and Last Modified Date of Excel xls file
by pryrt (Abbot) on Jun 30, 2017 at 22:11 UTC

    When I move (sftp) a spreadsheet created in Windows to a linux machine and look at the ctime using (stat)[10], it does not match the windows (stat)[10]. If I sftp the file back from the linux machine to the same location on the windows machine (even if I deleted the file in the mean time), it keeps the same (stat)[10] and the Excel Info CreateDate remains the same. If I sftp the file from the linux machine to a new location on the windows machine, the (stat)[10] changes, but the Excel Info CreateDate remains the same. So it appears that it is embedded in the .xls format somewhere, not just using filesystem information.

    I created two .xls spreadsheets with the same value in A1, a couple minutes apart, then did a hexdump and diff'd those: there were differences, but nothing jumped out and said "Create Date is encoded here", though looking at some of the strings, there are sections with ID="{...}", CMG="...", and DPB="..." that I think are likely candidates for encoded binary information. Probably someone more familiar with the .xls binary format (such as the author of Spreadsheet::ParseExcel) might be able to find it more easily. I tried things the hard way (see spoiler below): I did a key-by-key comparison of the ->parse($fn) objects, but nothing jumped out as being truly different (other than filename and format font numbering, which seem irrelevant).

    I was even able to change the Create Date in excel, using the VBA code ActiveWorkbook.BuiltinDocumentProperties("Creation Date") = Date, and re-saved. When I did that, the Excel Info CreateDate changed, and ID="{...}" section moved and changed, which is one of the reasons I think it might hold the Builtin Document Properties. (And "DPB" could be abbreviation for "Document Properties - BuiltIn". But that's just guessing.)

    Inspired by that VBA snippet, if I create a VBA function,

    function CreateDate() CreateDate = ActiveWorkbook.BuiltinDocumentProperties("Creation Da +te") end_function

    and set A1's formula to =text(CreateDate(), "yyyy-mmm-dd hh:mm:ss"), then save the .xls spreadsheet, I can use ParseExcel to grab the value of A1 and grab that CreateDate. So if you have control over the original spreadsheet creation, you can add the information you want into an accessible location; but I get the impression that you want to be able to do it for an arbitrary .xsl, not one that you control the creation of. :-)

    With no further ado, the ParseExcel compar program:

    ###################################################################### +## # per [id://1193822], Look into the details of .xls-format CreateDate: # it's internal to the file, but ParseExcel doesn't give that # BuiltIn Document Property (see https://msdn.microsoft.com/VBA/Exce +l-VBA/articles/workbook-builtindocumentproperties-property-excel) # Might need to dig into the .xsl binary stream: https://msdn.micros +oft.com/en-us/library/office/gg615597(v=office.14).aspx # # For now, trying to compute a difference between two hashrefs -- trie +d Test::Deep and Test::Deep::NoTest, but # the latter didn't work for cmp_deeply, the former only shows the f +irst fail with cmp_deeply, and cmp_details just gives a stash # with a copy of each object, not showing the actual differences, so + what's the point? # # hashdiff and arraydiff worked almost right out of the bat, but I got + a deep-recursion error: # ahh, hash->...->{_Book} points back to the toplevel object; nested + copies even shared the same HASH address, # but I simplified by logic="if the entry is an object and the key s +tarts with _, don't recurse" ###################################################################### +## use warnings; use strict; use Spreadsheet::ParseExcel; use Data::Dumper; go(); sub go { my $parser = Spreadsheet::ParseExcel->new(); my @wbs; foreach my $fn (qw'one.xls two.xls') { my $wb = $parser->parse($fn); open my $fh, '>', "$fn.txt" or die "$!"; print {$fh} Dumper $wb; close $fh; push @wbs, $wb; } open my $fh, '>', 'hashdiff.txt' or die "hashdiff.txt: $!"; select $fh; hashdiff('', @wbs); } # since Test::More::is_deeply() only shows the first difference, # Test::Deep::NoTest::cmp_deeply complained about 'ok', # and Test::Deep::cmp_deeply and ::cmp_details didn't do what # I wanted (or I couldn't understand them), I wrote my own # quick-and-dirty recursive hashdiff() and arraydiff() routines. # I don't claim they're any good for anything, or best-practice code +, # but they seemed to do the trick for comparing two ParseExcel workb +ook objects sub hashdiff { my($p,$h1,$h2) = @_; bless my $b1 = $h1, 'HashDiffObject'; bless my $b2 = $h2, 'HashDiffObject'; die "obj1->$p=$h1 needs to be a HASHREF" unless ref($h1) && $b1->i +sa('HASH'); die "obj2->$p=$h2 needs to be a HASHREF" unless ref($h2) && $b2->i +sa('HASH'); local $" = ", "; my @keys = eval { my %khash; @khash{ keys(%$h1), keys(%$h2) } = () +; sort keys %khash; }; # merge keys print "merged keys = ( @keys ) @{[1+$#keys]}\n"; unless( @keys ) { print "\t=> obj1->$p and obj2->$p are empty\n"; return } print "...\n"; foreach my $k ( @keys ) { print $/; print "obj1->${p}->{$k} missing", next unless exists $h1->{$ +k}; print "obj2->${p}->{$k} missing", next unless exists $h2->{$ +k}; printf "obj1->${p}->{$k} = %s\nobj2->${p}->{$k} = %s\n", $h1-> +{$k}//'<undef>', $h2->{$k}//'<undef>'; if( !ref($h1->{$k}) || !ref($h2->{$k}) ) { # at least one is + scalar print("\t=> scalar equal\n"), next if ($h1->{$k}//'<undef +>') eq ($h2->{$k}//'<undef>'); print("\t!! scalar not equal\n"), next; } if( ref($h1->{$k}) ne ref($h2->{$k}) ) { print("\t!! not same type\n"), next ; } print("\t=> don't recurse $k\n"), next if '_' eq substr $k, 0 +, 1; bless my $o = $h1->{$k}, 'HashDiffObject'; hashdiff ( ($p . "{$k}"), ($h1->{$k}), ($h2->{$k})), next if $ +o->isa('HASH'); arraydiff( ($p . "{$k}"), ($h1->{$k}), ($h2->{$k})), next if $ +o->isa('ARRAY'); die "hashdiff($p)#$k: !!what am I?"; } } sub arraydiff { my($p, $a1, $a2) = @_; bless my $b1 = $a1, 'ArrayDiffObject'; bless my $b2 = $a2, 'ArrayDiffObject'; die "obj1->$p=$a1 needs to be a ARRAYREF" unless ref($a1) && $b1-> +isa('ARRAY'); die "obj2->$p=$a2 needs to be a ARRAYREF" unless ref($a2) && $b2-> +isa('ARRAY'); print $/; printf "obj1->%s = %d # %s\n", ${p}, scalar @$a1, $a1; printf "obj2->%s = %d # %s\n", ${p}, scalar @$a2, $a2; print "\t=> mismatched sizes" unless @$a1 == @$a2; my $min = $#$a1; $min = $#$a2 if $#$a2 < $min; next if $min<0; foreach my $i ( 0 .. $min ) { print $/; printf "obj1->${p}->[$i] = %s\nobj2->${p}->[$i] = %s\n", $a1-> +[$i]//'<undef>', $a2->[$i]//'<undef>'; if( !ref($a1->[$i]) || !ref($a2->[$i]) ) { # at least one is + scalar print("\t=> scalar equal\n"), next if ($a1->[$i]//'<undef +>') eq ($a2->[$i]//'<undef>'); print("\t=> scalar not equal\n"), next; } if( ref($a1->[$i]) ne ref($a2->[$i]) ) { print("\t=> not same type\n"), next; } bless my $o = $a1->[$i], 'HashDiffObject'; hashdiff ( ($p . "[$i]"), ($a1->[$i]), ($a2->[$i])), next if $ +o->isa('HASH'); arraydiff( ($p . "[$i]"), ($a1->[$i]), ($a2->[$i])), next if $ +o->isa('ARRAY'); die "arraydiff($p)#$i: what am I?"; #next if } }