in reply to Re: SHA-256? What do you all think of this?

Thanks everyone for the great ideas. I will try them once I have studied up on threading a bit, to see whether I can get one process to read a chunk and hand it to a second process that digests it while the first process reads the next chunk of data.
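
The read-while-digesting idea could be sketched roughly as follows, using Perl ithreads and Thread::Queue rather than separate processes. This is only a minimal sketch under assumptions: `chunk_digests_pipelined` is a hypothetical helper name, the queue limit of 4 is arbitrary, and it needs a perl built with ithreads and Thread::Queue 3.01 or later. Thread::Queue copies each chunk between threads, so whether this is actually faster than a single loop would have to be benchmarked.

```perl
use strict;
use warnings;
use Config;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: a reader thread feeds chunks to the main thread,
# which computes one SHA-256 hex digest per chunk.
sub chunk_digests_pipelined {
    my ($path, $chunk_size) = @_;
    die "This perl was built without ithreads\n" unless $Config{useithreads};
    require threads;          # loaded lazily so the file compiles without ithreads
    require Thread::Queue;

    my $queue = Thread::Queue->new();
    $queue->limit = 4;        # cap queued chunks so the reader cannot run far ahead

    my $reader = threads->create(sub {
        my $error;
        if (open(my $in, '<:raw', $path)) {
            while (1) {
                my $n = sysread($in, my $buffer, $chunk_size);
                if (!defined $n) { $error = "Read error on $path: $!"; last; }
                last if $n == 0;              # end of file
                $queue->enqueue($buffer);     # blocks while the queue is full
            }
            close($in);
        }
        else {
            $error = "Cannot open $path: $!";
        }
        $queue->end();        # tell the consumer that no more chunks will arrive
        return $error;        # undef on success
    });

    my @digests;
    while (defined(my $chunk = $queue->dequeue())) {
        push @digests, sha256_hex($chunk);    # digest while the reader reads ahead
    }
    my $error = $reader->join();
    die "$error\n" if defined $error;
    return @digests;
}
```

Calling `chunk_digests_pipelined($file, 16_421_856)` would return one digest per Bible-sized chunk, in file order.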

hippo asked: Benchmark and then optimise for chunk size. Did you just pick 100MB at random?

Yes, at random. But I got an idea last night for code changes.
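
For the benchmarking hippo suggested, something along these lines might do. It is a rough sketch, not a careful benchmark: `time_chunk_size` is a hypothetical helper, the candidate sizes are arbitrary, and the operating system's file cache will flatter every pass after the first, so a sample file larger than RAM (or repeated runs) gives more honest numbers.

```perl
use strict;
use warnings;
use Digest::SHA;
use Time::HiRes qw(time);

# Hypothetical helper: read and digest a whole file using the given chunk size.
# Returns (bytes read, elapsed seconds, SHA-256 hex digest of the whole file).
sub time_chunk_size {
    my ($path, $chunk_size) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $sha   = Digest::SHA->new(256);
    my $bytes = 0;
    my $start = time();
    # The loop stops on end of file (0) and also on a read error (undef).
    while (my $n = sysread($in, my $buffer, $chunk_size)) {
        $sha->add($buffer);
        $bytes += $n;
    }
    my $elapsed = time() - $start;
    close($in);
    return ($bytes, $elapsed, $sha->hexdigest);
}

# Usage: perl bench.pl sample.dat   (the file name is whatever you pass in)
if (@ARGV) {
    my $file = $ARGV[0];
    for my $size (64 * 1024, 1024 * 1024, 16_421_856, 100_000_000) {
        my ($bytes, $elapsed) = time_chunk_size($file, $size);
        my $rate = $elapsed > 0 ? $bytes / $elapsed / 1_000_000 : 0;
        printf "%12d-byte chunks: %.2f s (%.1f MB/s)\n", $size, $elapsed, $rate;
    }
}
```

The digest is the same for every chunk size, which doubles as a sanity check that each pass really read the whole file.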

I am now running the main script as a compiled (.pl to .exe) application, kicked off as a detached background process. The main script reads in one copy of the Bible at a time (8369 copies of the Bible, 127-GIG total) and creates a digital signature (strictly speaking, a SHA-256 digest) for each copy, stored in a persistent SDBM binary hash file of key/value pairs. Only after running it did I realize that I had simply generated the same digital signature 8369 times, because every copy is identical. What was I thinking? No problem, as this at least confirmed that the digital signature never changes for the same data.

I ran the background process, monitoring it from TASK MANAGER. I increased the PRIORITY from NORMAL to HIGH via TASK MANAGER. I could have done that through the Win32::Process module. Actual CPU time was about 1 hour, but the process took just under 2 hours in real time.

What I have done is hard-code the original digital signature for one copy of the Bible within my Perl database GUI user-interface application. Whenever the end-user switches copies of the Bible (1-8369), my application will take less than 1 second to recalculate the digital signature for the currently selected copy of the Bible and compare it to the original. (Note: This DB user-interface application uses random-access indexing into each of the 8369 copies of the Bible, and their 1189 chapters each, via byte offsets stored in persistent SDBM files of key/value pairs tied to in-memory hash tables at run-time.) If the 127-GIG data file (8369 copies of the Bible) has been tampered with or corrupted unintentionally, and that tampering/corruption affected the currently selected copy of the Bible, then a different digital signature will be generated, which will lock the user out of that copy and present them with the message:

"Please contact your Database Administrator and let them know that you have encountered tampering or corruption within the copy of the Bible which you have just selected. Please use the Edit-->Select sub-menu to choose a different copy of the Bible to navigate."

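The per-copy check described above amounts to a seek, a read, and a digest comparison. Here is a minimal sketch, assuming copies are stored back to back at 528 * 31102 = 16_421_856 bytes each (the figures from the script further down); `copy_is_intact` is a hypothetical name, and the GUI lock-out and message box are left out.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Fcntl qw(SEEK_SET);

my $COPY_SIZE = 528 * 31_102;    # 16_421_856 bytes per copy

# Hypothetical helper: returns 1 if copy number $copy_number (1-based) still
# matches $expected_hex, 0 if it differs or the file is too short.
sub copy_is_intact {
    my ($fh, $copy_number, $expected_hex, $copy_size) = @_;
    $copy_size //= $COPY_SIZE;
    my $offset = ($copy_number - 1) * $copy_size;
    defined sysseek($fh, $offset, SEEK_SET) or die "Seek failed: $!";

    # sysread may return fewer bytes than requested, so loop until done.
    my ($buffer, $got) = ('', 0);
    while ($got < $copy_size) {
        my $n = sysread($fh, $buffer, $copy_size - $got, $got);
        die "Read failed: $!" unless defined $n;
        last if $n == 0;             # hit end of file early
        $got += $n;
    }
    return 0 if $got != $copy_size;  # a truncated copy counts as corruption
    return sha256_hex($buffer) eq $expected_hex ? 1 : 0;
}
```

A caller would open the data file with `sysopen` (or `sopen`), call `copy_is_intact($fh, $n, $original_hex)` when the user selects copy `$n`, and show the message above when it returns 0.
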
My idea to store the ORIGINAL digital signatures (within an SDBM file) for chunks (sectors?) of data within a large static flat file (data warehouse) could be applied to a situation unlike my Bible database, where the data is different between sectors and generates a different digital signature. Whenever a user random accesses a single record (or group of contiguous records), the associated sector could be read and a new digital signature for that sector checked against the original signature. If they differ, the user would be locked out of that sector.
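
A minimal sketch of that per-sector scheme follows. The assumptions: `read_at`, `build_sector_digests` and `record_sector_ok` are hypothetical helper names, records are fixed-length, and the sector size is a multiple of the record size so that no record straddles two sectors. The `$store` argument is any hash reference; it could be a hash tied to an SDBM file as in my script, but a plain in-memory hash works the same way.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Fcntl qw(SEEK_SET);

# Read up to $length bytes starting at $offset, looping over short reads.
sub read_at {
    my ($fh, $offset, $length) = @_;
    defined sysseek($fh, $offset, SEEK_SET) or die "Seek failed: $!";
    my ($buffer, $got) = ('', 0);
    while ($got < $length) {
        my $n = sysread($fh, $buffer, $length - $got, $got);
        die "Read failed: $!" unless defined $n;
        last if $n == 0;    # end of file
        $got += $n;
    }
    return $buffer;
}

# One-time pass: store the ORIGINAL digest of every sector, keyed by sector number.
sub build_sector_digests {
    my ($fh, $sector_size, $store) = @_;
    my $sector = 0;
    while (length(my $data = read_at($fh, $sector * $sector_size, $sector_size))) {
        $store->{$sector} = sha256_hex($data);
        $sector++;
    }
    return $sector;         # number of sectors digested
}

# At access time: re-digest only the sector holding the requested record.
sub record_sector_ok {
    my ($fh, $record_index, $record_size, $sector_size, $store) = @_;
    my $sector = int($record_index * $record_size / $sector_size);
    return 0 unless exists $store->{$sector};
    my $data = read_at($fh, $sector * $sector_size, $sector_size);
    return sha256_hex($data) eq $store->{$sector} ? 1 : 0;
}
```

The trade-off is in the sector size: smaller sectors make each check faster but mean more stored digests, while larger sectors lock the user out of more data when a single byte changes.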

I have also hard-coded, within my Bible navigation software, the ORIGINAL size of the read-only Bible database for comparison with the current size:

    # Deny read/write to other users or processes. This is NOT an advisory lock.
    # note: sopen() replaces the use of sysopen() when using Win32::SharedFileOpen
    sopen(IN, "$DWD\\$DataFile", O_RDONLY | O_RANDOM , SH_DENYRW) || do {
        Win32::GUI::MessageBox($W1,$ErrStr,"KJV Bible Navigator - Error",16,);
        return 1;
    };
    my $size = sysseek(IN,0,2); #-- byte position at bottom/end of file (no systell function)
    if ($size != 137_434_512_864) {
        close(IN);
        Win32::GUI::MessageBox($W1,"Size mismatch on Flat File: ($DWD\\$DataFile)",
                               "KJV Bible Navigator - Error",16,);
        return 1;
    }

Calling script:

    use Win32::Process;
    use File::Basename;

    $AWD=dirname($0);

    #-- works with ShawTest.exe but does not work on ShawTest.pl
    #-- perhaps would have to be run as: "perl.exe","perl ShawTest.pl" ??
    #-- ShawTest.pl compiled to .exe with the IndigoSTAR PERL application compiler
    $ret=Win32::Process::Create($POBJ,"$AWD\\ShawTest.exe","ShawTest",0,DETACHED_PROCESS,".");
    $ID=$POBJ->GetProcessID();
    print "($ret) pid = $ID, $AWD \n";
    sleep 5;
    exit;

Called script:

    use IO::Handle;
    use Digest::SHA qw(sha256_hex);
    use Fcntl;
    use File::Basename;
    use SDBM_File;

    $AWD=dirname($0); #-- application working directory path
    $DWD="C:\\Users\\Eric\\Documents\\FlatFiles";
    $DataFile="KJV_Bible_SDBM_528_31102_8369.dat"; #-- 528 byte records * 31102 verses * 8369 Bibles
    $SdbmFile="KJV_Bible_8369_DigiSign";

    #-- open the 127-GIG Bible database for reading
    sysopen(IN, "$DWD\\$DataFile", O_RDONLY) || die "Open error on input file\n";

    %DigitalSignatures;
    keys %DigitalSignatures = 8369;
    tie( %DigitalSignatures, "SDBM_File", "$DWD\\$SdbmFile", O_WRONLY | O_CREAT, 0666 );
    unless ( tied %DigitalSignatures ) {
        close(IN);
        die "Can't tie hash table to SDBM Files:\n$DWD\\$SdbmFile (.pag and .dir) $!";
    }

    $ret=0;
    open(OUT,"> $AWD\\ShawTestOutput.txt") || do {$ret=1;};
    if (! $ret) {
        OUT->autoflush(1); #-- flush the output buffer each print
    }
    else{
        close(IN);
        untie(%DigitalSignatures);
        die "crashed on open output of $AWD\\ShawTestOutput.txt\n";
    }

    #-- Read in 16_421_856 bytes (at a time) of 137_434_512_864 total bytes.
    #-- This process estimated to take 114 minutes on Windows 7 Home Premium laptop.
    #-- Based Upon a previous test where it took:
    #-- approx. 5 seconds to: (input, digest, output) each 100 million byte string read in.
    #-- i.e. (5 seconds * 1375)/60 = 114.5 minutes for a 127-GIG flat "text" file, containing
    #-- 528 byte fixed-length records, with no CR/LF(newline,\n) record separators.
    #--
    #-- Now, that same flat file, containing 8369 complete copies of the KJV Bible, will be read in
    #-- and a digital signature created for each of the 8369 copies of the Bible.
    #-- These digital signatures will then be loaded into an SDBM file/store of key/value
    #-- pairs which will be tied at run-time to an in-memory hash table for lookup
    #-- and verification of each copy of the Bible accessed by the end-user within the
    #-- database GUI user-interface application. Each time an end-user selects a different
    #-- copy of the Bible to view (1 of 8369 copies), the DB interface application will
    #-- create a new digital signature for the single copy of the Bible selected, and compare
    #-- that signature against the ORIGINAL signature held in the SDBM file key/val store,
    #-- tied to an in-memory hash table. There will be 8369 ORIGINAL digital signatures held
    #-- within the binary SDBM file. It is estimated to take around 0.8 seconds to create the
    #-- digital signature anew each time the end-user selects a different copy of the Bible to
    #-- view. That seems like a reasonable amount of time to expect the end-user to wait
    #-- patiently for a digital signature comparison test (ORIGINAL vs. CURRENT signatures).
    #-- This 0.8 seconds will take place during the time a TREEVIEW widget is being loaded with
    #-- the 66 Books of the KJV Bible, and their 1189 total chapters, for the single copy of the
    #-- Bible just selected.

    #-- Will run this script as a Windows(tm) O/S, Detached background process.
    #-- First, we compiled ShawTest.pl to ShawTest.exe
    #--
    #-- Perl2Exe V26.10 2018-01-31 Copyright (c) 1997-2018 IndigoSTAR Software
    #--
    #-- This is an evaluation version of Perl2Exe, which may be used for 30 days.
    #-- For more information see the attached pxman.htm file,
    #-- or visit http://www.indigostar.com
    #-- Generating ShawTest.exe

    $digest;
    # print "Processing a 127-GIG file in (8369) large chunks...\n";
    $i=0;
    while ($nbr_bytes=sysread(IN,$buffer,16_421_856)) {
        $i++;
        #print "Outputting ($i) of (8369) records\n"; #-- about every 0.8 seconds
        $fnum=sprintf("%4.0f",$i);
        $fnbr_bytes=sprintf("%8.0f",$nbr_bytes);
        undef $digest;
        $digest = sha256_hex($buffer);
        print OUT "$fnum|$fnbr_bytes=$digest\n";
        $DigitalSignatures{$fnum}=$digest; #-- store each digital signature in a binary SDBM file.
        #-- note: each key will be 4 bytes long, and each value 64 bytes long - total 68 bytes.
    }
    close(IN);
    close(OUT);
    untie(%DigitalSignatures);
    exit;

Re^3: SHA-256? What do you all think of this?
by erix (Prior) on May 24, 2019 at 18:14 UTC

    Checksumming is a good thing to have. But databases that are already implemented got there first: postgres for instance has the --data-checksums option on initdb. (In PostgreSQL 12, expected fall 2019, the setting can even be changed after the database has been initialized.)