BrianP has asked for the wisdom of the Perl Monks concerning the following question:

NetWallah,
I passed in a hash whose keys are the symbols (NEF, STS) and whose values are
the zip_internal_fqp names. The sub replaces each file pointer with the
full data (a 217 MB TIF/Raw). The key stays the same; the sub follows
and dereferences the file_sys FullyQualifiedPath pointer.

I also loop up to 3 times (a parameter) to get a clean read, checking:
Everything_ok
size_label -> File_name/size
matching_file_on_hard_drive
I had a few transient File-Not-Found errors.
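A minimal sketch of that kind of retry loop, assuming each read attempt is wrapped in a sub that dies when any check fails (for example a transient File-Not-Found). The helper name `read_with_retries` and its optional `$pause` argument are hypothetical, not taken from any module:

```perl
use strict;
use warnings;

# Hypothetical helper: call $read up to $tries times and return the first
# defined result. $read should die on a bad read; $pause (optional) is the
# number of seconds to wait between attempts.
sub read_with_retries {
    my ($tries, $read, $pause) = @_;
    my $last_err;
    for my $attempt (1 .. $tries) {
        my $result = eval { $read->($attempt) };
        return $result if defined $result;
        $last_err = $@ || "attempt $attempt returned undef\n";
        sleep $pause if $pause && $attempt < $tries;   # let a transient error clear
    }
    die "Giving up after $tries attempts: $last_err";
}
```

Usage would look like `my $blob = read_with_retries(3, sub { ... }, 1);`, where the sub performs one extraction plus the Everything_ok / size / matching-file checks and dies if any of them fails.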

TYVM ***

NEED: %name2blob = &extract_zip($zip_FQP, $nef, $sts);
where the $nef and $sts file names are derived from the zip filename.
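One way that list-returning interface could look, as a sketch only: it assumes the 7-Zip command-line tool `7z` is on PATH and that `7z e ARCHIVE -so NAME` writes the named member to STDOUT. `member_names` is a hypothetical helper that derives $nef and $sts from the archive name:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: derive the archive-internal names from the archive
# name, e.g. /some/dir/X.nef.7z -> ('X.nef', 'X.nef.sts').
sub member_names {
    my ($zip_fqp) = @_;
    my $nef = basename($zip_fqp, '.7z');
    return ($nef, "$nef.sts");
}

# Sketch of: %name2blob = extract_zip($zip_FQP, $nef, $sts);
# Assumes the 7z CLI is on PATH and "7z e ARCHIVE -so NAME" writes NAME to STDOUT.
sub extract_zip {
    my ($zip_fqp, @names) = @_;
    my %name2blob;
    for my $name (@names) {
        # List-form pipe open: no shell, so odd characters in names are safe.
        open my $pipe, '-|', '7z', 'e', $zip_fqp, '-so', $name
            or die "Cannot run 7z: $!";
        binmode $pipe;                                # members are binary
        $name2blob{$name} = do { local $/; <$pipe> }; # slurp the whole member
        close $pipe or die "7z failed on $name in $zip_fqp (status $?)\n";
    }
    return %name2blob;
}
```

The call would then be `my %name2blob = extract_zip($zip_FQP, member_names($zip_FQP));`.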

Detail: I have a large number of .7z files to verify before deleting
the originals. I want to verify that each archive opens and that the
.nef/.raw/.tif inside it has an md5 matching the original's, which is stored in
a STatuS (.sts) file in the same archive. The archive's CRC check alone may be inadequate.

Current inefficient method:

    7z e ca-2016.0227-256152.nef.7z -y -o"$TMP" ca-2016.0227-256152.nef
    7z e ca-2016.0227-256152.nef.7z -y ca-2016.0227-256152.nef.sts
    # Show only md5, not filename. PSR is cmd line s/// w/Perl regexp
    md5sum $TMP/ca-2016.0227-256354.nef | psr.pl "s/^(\\w+).*$/\$1/" > freq.tmp
    # Extract original md5 as last w/s token on first line
    head -n1 ca-2016.0227-256354.nef.sts | psr.pl "s/^.*\\s+(\\w+)\\s*\$/\$1/" >> freq.tmp
    # Show frequency distribution, expect 2 copies, same md5, 1 line
    freq.pl freq.tmp    # Hash distinct tokens; output:
    #   2 ebf5216f8bc9afcd0eb208c4b5a0a18a
    rm $TMP/ca-2016.0227-*.nef ca-2016.0227-256354.nef.sts freq.tmp
etc. etc. etc...
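The md5 comparison in that pipeline can be done inside Perl with the core Digest::MD5 module, dropping the freq.tmp / freq.pl step. A sketch, assuming (as the `head -n1` step does) that the original md5 is the last whitespace-separated token on the first line of the .sts file; `sts_md5`, `file_md5` and `verify_extracted` are hypothetical names:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: the original md5 as the last whitespace-separated
# token on the first line of a .sts file (what the head -n1 | psr.pl step does).
sub sts_md5 {
    my ($sts_path) = @_;
    open my $fh, '<', $sts_path or die "Cannot open $sts_path: $!";
    my $first = <$fh>;
    close $fh;
    die "$sts_path is empty\n" unless defined $first;
    my ($md5) = $first =~ /(\w+)\s*$/ or die "No md5 token in $sts_path\n";
    return lc $md5;
}

# md5 of an extracted file; Digest::MD5 reads it in chunks, not all at once.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# True when the extracted file matches the md5 recorded in its .sts file.
sub verify_extracted {
    my ($file, $sts) = @_;
    return file_md5($file) eq sts_md5($sts);
}
```

This still needs the two `7z e` extractions to a temp directory, but replaces the md5sum, psr.pl and freq.pl steps with one boolean check.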

Desired method:

    # Perl: Zip=ca-2016.0227-256359.nef.7z -> ca-2016.0227-*.nef/.sts
    use Digest::MD5 qw(md5_hex);
    my $zip = 'ca-2016.0227-256359.nef.7z';
    (my $base = $zip) =~ s/\.7z$//;                       # Strip .7z suffix -> NEF name
    my %zext = ('nef' => $base, 'sts' => "$base.sts");    # Zip-internal names
    # Zip Extract: overwrite zip-internal filenames with binary file contents
    extract_zip("/some/dir/$zip", \%zext);
    my $new_md5 = md5_hex($zext{nef});
    if ($zext{sts} =~ m/$new_md5/) { Success(); } else { dispair(); }
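A sketch of an `extract_zip` with that hash-ref signature, which overwrites each zip-internal name in %zext with the member's bytes. It assumes the `7z` CLI is on PATH and that `7z e ARCHIVE -so NAME` writes the member to STDOUT; `slurp_command` is a hypothetical helper, not a CPAN function:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its STDOUT
# as raw bytes. Dies if the command cannot start or exits non-zero.
sub slurp_command {
    my @cmd = @_;
    open my $pipe, '-|', @cmd or die "Cannot run $cmd[0]: $!";
    binmode $pipe;
    my $bytes = do { local $/; <$pipe> };
    close $pipe or die "'@cmd' failed (exit status $?)\n";
    return defined $bytes ? $bytes : '';
}

# Sketch of extract_zip($archive, \%zext): each value in %zext starts as a
# zip-internal file name and is overwritten with that member's contents.
# Assumes the 7z CLI is on PATH and that "7z e ARCHIVE -so NAME" writes
# NAME to STDOUT.
sub extract_zip {
    my ($archive, $zext) = @_;
    for my $key (keys %$zext) {
        $zext->{$key} = slurp_command('7z', 'e', $archive, '-so', $zext->{$key});
    }
    return $zext;
}
```

Checking the exit status on `close` is what turns a failed extraction into a die instead of a silently empty blob, which fits the retry-until-clean approach described earlier.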

All of the zip modules I have found either read in chunks or dump to files:
IO::Uncompress::Unzip, Archive::Zip, ex::lib::zip (???),
File::Redirect::Zip (close?), out of 639 CPAN "zip" modules. Some read into a scalar, but I need a HASH to associate each file with its
binary blob. The .NEF files are 76 MB each with a ~56% compression
ratio, so efficient XS code would be super.
Best Perl (or even C) way??
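If the .nef blob is only needed to compute its md5, it does not have to be held in memory at all: Digest::MD5 is itself an XS module and can read straight from a pipe. A sketch, assuming the same `7z e ARCHIVE -so NAME` behaviour; `md5_of_command` is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: stream a command's STDOUT straight into Digest::MD5
# and return the hex digest, so a 76 MB member never has to be held in a
# Perl scalar. Example command, assuming the 7z CLI:
#   md5_of_command('7z', 'e', $archive, '-so', $nef)
sub md5_of_command {
    my @cmd = @_;
    open my $pipe, '-|', @cmd or die "Cannot run $cmd[0]: $!";
    binmode $pipe;
    my $hex = Digest::MD5->new->addfile($pipe)->hexdigest;
    close $pipe or die "'@cmd' failed (exit status $?)\n";
    return $hex;
}
```

The small .sts member can still be slurped into the hash; only the large image is streamed and compared by digest.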

Re: How to Read .7z files into Hash?
by NetWallah (Canon) on Mar 03, 2016 at 22:16 UTC
    Completely Untested, and theoretical:
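    use strict;
    use warnings;
    use Path::Tiny;
    my %name2blob;
    sub extract_zip {
        my ($filename) = @_;
        my $nef = path($filename)->basename('.7z');   # strip .7z -> NEF name
        my $sts = $nef . ".sts";
        # 7z e ... -so writes the member to STDOUT; list-form open skips the shell
        my @sevenzcmd = ('7z', 'e', $filename, '-y', '-so');
        open my $data, "-|", @sevenzcmd, $nef or die "Cannot open 7z $nef:$!";
        binmode $data;                                  # NEF is binary
        $name2blob{$filename}{NEF} = do { local $/; <$data> };
        close $data or die "7z failed on $nef: $?";
        open $data, "-|", @sevenzcmd, $sts or die "Cannot open 7z $sts:$!";
        binmode $data;
        $name2blob{$filename}{STS} = do { local $/; <$data> };
        close $data or die "7z failed on $sts: $?";
    }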
    Hmm - you may need to write your own "slurp" if Path::Tiny->slurp() does not like file handles.
    do { local $/; <$data> }

            "Think of how stupid the average person is, and realize half of them are stupider than that." - George Carlin