Re: stitch together text file recovered by photorec/testdisk?

Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

I would try to reconstruct the file in several steps:

1. Read the "restored" file containing the nulls.
2. Read the partial files.
3. Eliminate all partial files that occur completely in the good parts of the restored file.
4. Try to find a partial overlap of the end of a readable part of the restored file with at least one partial file.
5. Repeat with the concatenation of the good file and the partial file until you've exhausted all partial files.
6. If you find multiple partial files that match, flag those for manual user review. Maybe the longest overlap is better, or maybe the shortest overlap is better.

That should give you one potential version of your file, with fewer missing parts than before.
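
The steps above could be sketched roughly like this. It is only a minimal sketch, assuming the readable part and the partial files are already in memory as strings; the names overlap_length and stitch and the minimum-overlap parameter $min are made up for illustration, and the brute-force overlap search will be slow on large files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of the longest suffix of $left that is also a prefix of $right.
# $min (hypothetical parameter) rejects overlaps too short to be trusted.
sub overlap_length {
    my ( $left, $right, $min ) = @_;
    $min = 1 if !defined $min or $min < 1;
    my $max = length($left) < length($right) ? length($left) : length($right);
    for ( my $len = $max; $len >= $min; $len-- ) {
        return $len if substr( $left, -$len ) eq substr( $right, 0, $len );
    }
    return 0;
}

# Extend $good with fragments for as long as exactly one fragment overlaps
# its end. Returns the stitched text plus an array ref of ambiguous
# candidate sets that should be reviewed by hand (step 6).
sub stitch {
    my ( $good, $fragments, $min ) = @_;

    # Step 3: drop fragments that already occur completely in the good text.
    my @pending = grep { index( $good, $_ ) < 0 } @$fragments;
    my @ambiguous;

    while (@pending) {
        # Step 4: which fragments overlap the end of the good text?
        my @hits;
        for my $i ( 0 .. $#pending ) {
            my $len = overlap_length( $good, $pending[$i], $min );
            push @hits, [ $i, $len ] if $len;
        }
        last if !@hits;

        # Step 6: more than one candidate, so stop and let a human decide.
        if ( @hits > 1 ) {
            push @ambiguous, [ map { $pending[ $_->[0] ] } @hits ];
            last;
        }

        # Step 5: append the non-overlapping tail and repeat.
        my ( $i, $len ) = @{ $hits[0] };
        $good .= substr( $pending[$i], $len );
        splice @pending, $i, 1;
        @pending = grep { index( $good, $_ ) < 0 } @pending;
    }
    return ( $good, \@ambiguous );
}
```

For example, stitch( "abcdef", [ "cde", "efgh", "ghij" ], 2 ) drops "cde" (already contained), then appends the tails of "efgh" and "ghij" and returns "abcdefghij" with no ambiguous sets. In practice you would probably want a much larger $min than 2 so that short coincidental matches are not treated as overlaps.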

You could also try the same with your partial files, and/or try to find the overlaps between different partial files to piece those together.
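
Piecing the partial files together among themselves could look something like the following. This is a greedy heuristic of my own choosing (always merge the pair with the longest overlap first), not necessarily the best strategy; merge_fragments and $min are hypothetical names, and fragments that sit in the middle of another fragment are left alone rather than removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of the longest suffix of $left that is also a prefix of $right,
# ignoring overlaps shorter than $min.
sub overlap_length {
    my ( $left, $right, $min ) = @_;
    $min = 1 if !defined $min or $min < 1;
    my $max = length($left) < length($right) ? length($left) : length($right);
    for ( my $len = $max; $len >= $min; $len-- ) {
        return $len if substr( $left, -$len ) eq substr( $right, 0, $len );
    }
    return 0;
}

# Greedily merge fragments: find the ordered pair with the longest
# suffix/prefix overlap, join it, and repeat until nothing overlaps.
sub merge_fragments {
    my ( $min, @frags ) = @_;
    while (1) {
        my ( $best_len, $best_i, $best_j ) = (0);
        for my $i ( 0 .. $#frags ) {
            for my $j ( 0 .. $#frags ) {
                next if $i == $j;
                my $len = overlap_length( $frags[$i], $frags[$j], $min );
                ( $best_len, $best_i, $best_j ) = ( $len, $i, $j )
                    if $len > $best_len;
            }
        }
        last if !$best_len;

        # Join the winning pair and keep every other fragment untouched.
        my $merged = $frags[$best_i] . substr( $frags[$best_j], $best_len );
        @frags = (
            $merged,
            map  { $frags[$_] }
            grep { $_ != $best_i && $_ != $best_j } 0 .. $#frags
        );
    }
    return @frags;
}
```

With merge_fragments( 2, "defgh", "abcde", "ghijk" ) the three pieces collapse into the single string "abcdefghijk". Whatever is left over after the loop are fragments with no usable overlap, which still need a manual look.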

Re^2: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 19:44 UTC
This narrows the list to 68.6 MB by eliminating subsequent duplicates:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use File::Find::Rule qw/ find /;
    use autodie;
    use Digest::MD5 qw( md5_hex );

    my $qphotoreclog = 'qphotorec.log';
    $qphotoreclog = path( $qphotoreclog )->realpath;
    chdir path( $qphotoreclog )->parent;

    my $log = path( $qphotoreclog )->slurp_raw;
    my @files;
    my %seen;
    # Walk the log line by line; only lines naming a recup_dir file matter.
    while( $log =~ m{^(.*?)[\r\n]*$}mg ){
        my $line = $1;
        next if not $line =~ /recup_dir/;
        my( $filename, $blocks ) = split ' ', $line, 2;
        my $md5 = md5_hex( path( $filename )->slurp_raw );
        push @{$seen{$md5}}, $filename;
        # Last field: how many files with this digest have been seen so far.
        push @files, [ $filename, $blocks, $md5, int @{$seen{$md5}} ];
    }
    undef $log;
    # dd(\@files );

    use constant FILENAME => 0;
    use constant SEEN => 3;
    print "Files before ", int @files, "\n";
    # Keep only the first file seen for each digest.
    @files = map { $_->[FILENAME] } grep { $_->[SEEN()] == 1 } @files;
    print "Files after ", int @files, "\n";
    # dd(\@files );

    path('myfinalrecup')->mkpath;
    for my $filename ( @files ){
        path( $filename )->copy( 'myfinalrecup/' );
    }
    __END__
    Files before 40330
    Files after 6341

I can't really see a relationship between the blocks and the filename; there probably isn't one:

    [ "recup_dir.3/f8580464.txt", "70013087-70013102", "2615e08f437222995c7aab0569f015f3", 1, ],
    [ "C:/undelet/testdisk-7.0.win/recup_dir.3/f8580480.txt", "70013103-70013110", "6fb0dd36db299c9b713d5c622bf5b499", 1, ],
    ...
    [ "recup_dir.3/f8583480.txt", "70016103-70016118", "2615e08f437222995c7aab0569f015f3", 2, ],
Re^2: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 21:30 UTC
> Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

Luckily I included timestamps in the files and they're mostly sequential. Grepping for the last timestamp, it appears I've lost at most 4 hours.

So this eliminates more stuff that I know for sure I already have:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use File::Find::Rule qw/ find /;
    use autodie;
    use Digest::MD5 qw( md5_hex );
    use Data::Dump qw/ dd /;    # needed for the dd() calls below

    my $qphotoreclog = 'qphotorec.log';
    $qphotoreclog = path( $qphotoreclog )->realpath;
    chdir path( $qphotoreclog )->parent;

    # Concatenate every text file I still have into one big haystack.
    my $notit = '';
    for my $unwanted ( find( file => maxdepth => 1 , in => 'D:/' ) ){
        next if not -T $unwanted;
        $notit .= path( $unwanted )->slurp_raw;
    }

    # NOTE: the previous script copied into 'myfinalrecup'; adjust the glob
    # if your directory is named differently.
    my @files = sort glob 'myfinalfiles/*';
    my @maybeit;
    my @notit;
    for my $file ( @files ){
        my $isit = path( $file )->slurp_raw;
        # A recovered file found verbatim in the haystack is already safe.
        if( $notit =~ /\Q$isit\E/ ){
            push @notit, $file;
        } else {
            push @maybeit, $file;
        }
    }
    dd( 'notit', @notit );
    dd( 'maybeit', @maybeit );
    dd( 'files', int @files );
    dd( 'notit', int @notit );
    dd( 'maybeit', int @maybeit );

    path('myfinalmaybeit')->mkpath;
    for my $file ( @maybeit ){
        path( $file )->copy( 'myfinalmaybeit/' );
    }
    __END__
    ...
    ("files", 6341)
    ("notit", 4294)
    ("maybeit", 2047)
    18M myfinalmaybeit

Randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh.
Re^3: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 21:45 UTC
> Randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh.

Grepping from the mixed file, I found two more files that match that file, and with no mixing:

    myfinalmaybeit\f67507544.txt is inside the longer file
    myfinalmaybeit\f9151816.txt and neither file has the unwanted mix from
    myfinalmaybeit\f93596792.txt

I'm not seeing any matches among the files already rejected (from myfinalfiles).

I'm also not really seeing how to program my way out of looking at these 2k files. I've got overlapping logic fatigue, but hooray, 2k files.
Re^4: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 21:50 UTC
Idea: start with the smallest files (1-3k, about half the files) and see which bigger files they match.
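
A rough sketch of that idea, assuming the file contents have already been slurped into a hash keyed by filename (find_containers and the hash layout are hypothetical, just for illustration). It uses index() for a plain substring test, which avoids building a huge regex out of each file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Takes name => content pairs and returns a hash mapping each smaller
# entry to the list of larger entries whose content contains it verbatim.
sub find_containers {
    my (%content) = @_;

    # Smallest first, so each file is only compared against bigger ones.
    my @by_size = sort {
        length( $content{$a} ) <=> length( $content{$b} ) or $a cmp $b
    } keys %content;

    my %inside;
    for my $s ( 0 .. $#by_size ) {
        my $small = $by_size[$s];
        for my $big ( @by_size[ $s + 1 .. $#by_size ] ) {
            push @{ $inside{$small} }, $big
                if index( $content{$big}, $content{$small} ) >= 0;
        }
    }
    return %inside;
}
```

Any file that shows up as a key in the result is fully contained in at least one bigger file and can probably be set aside, which should shrink the pile that needs eyeballing. Files that merely overlap at the edges will not be caught by this and still need the overlap logic.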
Re^5: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 11, 2018 at 02:53 UTC