in reply to stitch together text file recovered by photorec/testdisk?

Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

My approach to attempt to reconstruct the file would be to do it in several steps:

  1. Read the "restored" file containing the nulls
  2. Read the partial files
  3. Eliminate all partial files that occur completely in the good parts of the restored file
  4. Try to find a partial overlap of the end of a readable part of the restored file with at least one partial file
  5. Repeat with the concatenation of the good file and the partial file until you've exhausted all partial files
  6. If you find multiple partial files that match, flag those for manual user review. Maybe the longest overlap is better, or maybe the shortest overlap is better.

That should give you one potential version of your file, with fewer missing parts than before.
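The steps above can be sketched roughly as follows. This is only an illustration, not tested against real PhotoRec output: the sub names `overlap_length`, `drop_contained` and `stitch` are made up for this example, the "good" text is assumed to be one readable run of the restored file (for instance one piece of `split /\0+/, $restored`), and picking the longest overlap on a tie is an arbitrary choice that the review list lets you revisit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of the longest suffix of $head that is also a prefix of $tail.
# $min (assumed >= 1) guards against accidental very short matches.
sub overlap_length {
    my ( $head, $tail, $min ) = @_;
    $min = 1 if !defined $min or $min < 1;
    my $max = length($head) < length($tail) ? length($head) : length($tail);
    for ( my $len = $max; $len >= $min; $len-- ) {
        return $len if substr( $head, -$len ) eq substr( $tail, 0, $len );
    }
    return 0;
}

# Step 3: keep only the fragments that do not already occur inside $good.
sub drop_contained {
    my ( $good, @fragments ) = @_;
    return grep { index( $good, $_ ) < 0 } @fragments;
}

# Steps 4-6: extend $good with fragments that overlap its end.
# Returns the stitched text, the ties to review by hand, and the leftovers.
sub stitch {
    my ( $good, $fragments, $min ) = @_;
    my @left = drop_contained( $good, @$fragments );
    my @review;
    while (@left) {
        # [ index into @left, overlap length ], longest overlap first
        my @hits = sort { $b->[1] <=> $a->[1] }
                   grep { $_->[1] > 0 }
                   map  { [ $_, overlap_length( $good, $left[$_], $min ) ] }
                   0 .. $#left;
        last if !@hits;
        push @review, [ map { $left[ $_->[0] ] } @hits ] if @hits > 1;
        my ( $idx, $len ) = @{ $hits[0] };
        $good .= substr( $left[$idx], $len );    # append only the new part
        splice @left, $idx, 1;
        @left = drop_contained( $good, @left );
    }
    return ( $good, \@review, \@left );
}

# Made-up sample data, just to show the call.
my ( $text, $review, $unused ) = stitch(
    "line 1\nline 2\nline 3\n",
    [ "line 2\nline 3\n", "line 3\nline 4\n", "line 4\nline 5\n" ],
    4,
);
print $text;
print scalar(@$review), " tie(s) to review, ", scalar(@$unused), " fragment(s) unused\n";
```

A minimum overlap of a few characters (4 here) is worth tuning: too small and unrelated fragments get glued together on a shared newline, too large and genuine continuations are missed.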

You could also try the same with your partial files, and/or try to find the overlaps between different partial files to piece those together.

Re^2: stitch together text file recovered by photorec/testdisk?
by Anonymous Monk on Sep 10, 2018 at 19:44 UTC

    This narrows the list to 68.6 MB by eliminating subsequent duplicates

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use File::Find::Rule qw/ find /;
    use autodie;
    use Digest::MD5 qw( md5_hex );

    my $qphotoreclog = 'qphotorec.log';
    $qphotoreclog = path( $qphotoreclog )->realpath;
    chdir path( $qphotoreclog )->parent;

    # Read the log, hash every recovered file, and count how often each hash was seen
    my $log = path( $qphotoreclog )->slurp_raw;
    my @files;
    my %seen;
    while( $log =~ m{^(.*?)[\r\n]*$}mg ){
        my $line = $1;
        next if not $line =~ /recup_dir/;
        my( $filename, $blocks ) = split ' ', $line, 2;
        my $md5 = md5_hex( path( $filename )->slurp_raw ) ;
        push @{$seen{$md5}}, $filename;
        push @files, [ $filename, $blocks , $md5 , int@{$seen{$md5}} ];
    }
    undef $log;
    # dd(\@files );

    use constant FILENAME => 0;
    use constant SEEN => 3;

    # Keep only the first file seen for each hash
    print "Files before ", int @files, "\n";
    @files = map { $_->[FILENAME] } grep { $_->[SEEN()] == 1 } @files;
    print "Files after ", int @files, "\n";
    # dd(\@files );

    path('myfinalrecup')->mkpath;
    for my $filename ( @files ){
        path( $filename )->copy( 'myfinalrecup/' );
    }
    __END__
    Files before 40330
    Files after 6341

    Can't really see a relationship between the blocks and the filename, probably there isn't one

    [
      "recup_dir.3/f8580464.txt",
      "70013087-70013102",
      "2615e08f437222995c7aab0569f015f3",
      1,
    ],
    [
      "C:/undelet/testdisk-7.0.win/recup_dir.3/f8580480.txt",
      "70013103-70013110",
      "6fb0dd36db299c9b713d5c622bf5b499",
      1,
    ],
    ...
    [
      "recup_dir.3/f8583480.txt",
      "70016103-70016118",
      "2615e08f437222995c7aab0569f015f3",
      2,
    ],
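One quick way to test for a relationship is to subtract the number in the filename from the first block of the range and see whether the difference stays the same. The sketch below does that for the three records shown in the dump; `block_offset` is a name made up for this example, and three rows are far too few to draw a firm conclusion about the whole log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [ filename, block range ] pairs copied from the dump above.
my @records = (
    [ 'recup_dir.3/f8580464.txt',                             '70013087-70013102' ],
    [ 'C:/undelet/testdisk-7.0.win/recup_dir.3/f8580480.txt', '70013103-70013110' ],
    [ 'recup_dir.3/f8583480.txt',                             '70016103-70016118' ],
);

# First block of the range minus the number embedded in the filename.
# Returns nothing if either part does not look as expected.
sub block_offset {
    my ( $filename, $blocks ) = @_;
    my ($name_number) = $filename =~ m{(?:^|/)f(\d+)\.\w+$} or return;
    my ($first_block) = $blocks   =~ m{^(\d+)}              or return;
    return $first_block - $name_number;
}

# Count how many records share each offset.
my %count;
for my $record (@records) {
    my $offset = block_offset(@$record);
    next if !defined $offset;
    $count{$offset}++;
}
printf "offset %d: %d record(s)\n", $_, $count{$_} for sort { $a <=> $b } keys %count;
```

For these three rows the difference is 61432623 every time, so the filename number and the block number may simply differ by a fixed offset. If that also holds across the full log, sorting by filename number gives the on-disk order of the fragments.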
Re^2: stitch together text file recovered by photorec/testdisk?
by Anonymous Monk on Sep 10, 2018 at 21:30 UTC

    Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

    Luckily I included timestamps in files and they're mostly sequential ... grepping for the last timestamp it appears at most I've lost 4 hours

    So this is eliminating more stuff I know for sure I already have

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    use File::Find::Rule qw/ find /;
    use autodie;
    use Digest::MD5 qw( md5_hex );
    use Data::Dump qw/ dd /;    # needed for the dd() calls below

    my $qphotoreclog = 'qphotorec.log';
    $qphotoreclog = path( $qphotoreclog )->realpath;
    chdir path( $qphotoreclog )->parent;

    # Concatenate every text file in the top level of D:/ (stuff I already have)
    my $notit = '';
    for my $unwanted ( find( file => maxdepth => 1 , in => 'D:/' ) ){
        next if not -T $unwanted;
        $notit.=path( $unwanted )->slurp_raw;
    }

    # A recovered file found verbatim inside that text is not wanted
    my @files = sort glob 'myfinalfiles/*';
    my @maybeit;
    my @notit;
    for my $file ( @files ){
        my $isit = path( $file )->slurp_raw;
        if( $notit =~ /\Q$isit\E/ ){
            push @notit, $file;
        } else {
            push @maybeit, $file;
        }
    }
    dd( 'notit', @notit );
    dd( 'maybeit', @maybeit );
    dd( 'files', int @files );
    dd( 'notit', int @notit );
    dd( 'maybeit', int @maybeit );

    path('myfinalmaybeit')->mkpath;
    for my $file ( @maybeit ){
        path( $file )->copy( 'myfinalmaybeit/');
    }
    __END__
    ...
    ("files", 6341)
    ("notit", 4294)
    ("maybeit", 2047)

    18M myfinalmaybeit

    randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh

      randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh

      Grepping for text from the mixed file, I found two more files that match it, with no mixing.

      myfinalmaybeit\f67507544.txt is inside the longer file myfinalmaybeit\f9151816.txt, and neither file has the unwanted mix from myfinalmaybeit\f93596792.txt

      Not seeing any matches already rejected (from myfinalfiles)

      Also not really seeing how to program my way out of looking at these 2k files, I've got overlapping logic fatigue, but hooray, 2k files

        idea: start with the smallest files (1-3k, half the files), see which bigger files they match
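That idea could look something like the sketch below: sort the fragments by length and, for each one, look for a longer fragment that contains it verbatim. `drop_nested` and the sample names and contents are made up for illustration. It takes a name => content hash, so the files would need to be slurped first, and it compares every pair, which should still be manageable for about 2k files totalling 18M.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Takes name => content pairs. Returns the names worth keeping, plus a
# hash mapping each dropped name to a bigger file that contains it.
sub drop_nested {
    my (%content) = @_;

    # Shortest first; names break ties so the result is repeatable.
    my @names = sort {
        length( $content{$a} ) <=> length( $content{$b} ) or $a cmp $b
    } keys %content;

    my ( @keep, %inside );
    for my $i ( 0 .. $#names ) {
        my $small = $names[$i];
        for my $j ( $i + 1 .. $#names ) {
            my $big = $names[$j];
            if ( index( $content{$big}, $content{$small} ) >= 0 ) {
                $inside{$small} = $big;    # exact copies are dropped here too
                last;
            }
        }
        push @keep, $small if !exists $inside{$small};
    }
    return ( \@keep, \%inside );
}

# Made-up sample data, just to show the call.
my ( $keep, $inside ) = drop_nested(
    'short.txt' => "10:02 second entry\n",
    'long.txt'  => "10:01 first entry\n10:02 second entry\n10:03 third entry\n",
    'other.txt' => "something unrelated\n",
);
print "keep: $_\n" for @$keep;
print "$_ is inside $inside->{$_}\n" for sort keys %$inside;
```

Files that are only partly wanted, like the mixed one mentioned above, will not be caught by this because the match has to be verbatim. They would still need a manual look or an overlap check.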