Re^2: stitch together text file recovered by photorec/testdisk?

Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

Luckily I included timestamps in the files and they're mostly sequential ... grepping for the last timestamp, it appears I've lost at most 4 hours.

So this eliminates more stuff I know for sure I already have:

#!/usr/bin/perl --
use strict;
use warnings;
use Path::Tiny qw/ path /;
use File::Find::Rule qw/ find  /;
use autodie;
use Digest::MD5 qw( md5_hex );
use Data::Dump qw( dd );    # dd() is called below

my $qphotoreclog = 'qphotorec.log';

$qphotoreclog = path( $qphotoreclog  )->realpath;
chdir path( $qphotoreclog )->parent;


my $notit = '';
for my $unwanted ( find( file => maxdepth => 1 , in => 'D:/' ) ){
    next if  not -T $unwanted;
    $notit.=path( $unwanted )->slurp_raw;
}

my @files = sort glob 'myfinalfiles/*';
my @maybeit;
my @notit;
for my $file ( @files ){
    my $isit = path( $file )->slurp_raw;
    if( $notit =~ /\Q$isit\E/  ){
        push @notit, $file;
    } else {
        push @maybeit, $file;
    }
}
dd( 'notit', @notit );
dd( 'maybeit', @maybeit );
dd( 'files', int @files );
dd( 'notit', int @notit );
dd( 'maybeit', int @maybeit );

path('myfinalmaybeit')->mkpath;
for my $file ( @maybeit ){
    path( $file )->copy( 'myfinalmaybeit/');
}
__END__
...
("files", 6341)
("notit", 4294)
("maybeit", 2047)
18M     myfinalmaybeit

Randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh.
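
For a mixed file like that, a line-level check might help separate the two parts. This is only a sketch under assumptions: it treats the files as line-oriented text, `split_mixed` is a made-up helper name, and both arguments are plain strings that have already been slurped.

```perl
use strict;
use warnings;

# Hypothetical helper: split a mixed file's lines into those that also occur
# somewhere in the known-unwanted text and those that do not.
sub split_mixed {
    my ( $mixed, $unwanted ) = @_;

    # Every distinct line of the unwanted text, for exact lookups.
    my %seen = map { $_ => 1 } split /\n/, $unwanted;

    my ( @known, @new );
    for my $line ( split /\n/, $mixed ) {
        next if $line !~ /\S/;    # blank lines carry no information
        if ( $seen{$line} ) { push @known, $line }
        else                { push @new,   $line }
    }
    return ( \@known, \@new );
}
```

The second list is the part worth reading by eye; short or generic lines can land in the first list by coincidence, so it is a hint rather than proof.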

Re^3: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 21:45 UTC
Randomly viewing a few files, I found a file that seems to be a mix of wanted and unwanted, ugh.

Grepping with text from the mixed file, I found two more files that match it, with no mixing:

`myfinalmaybeit\f67507544.txt is inside the longer file myfinalmaybeit\f9151816.txt and neither file has the unwanted mix from myfinalmaybeit\f93596792.txt`

I'm not seeing any matches among the files already rejected (from myfinalfiles).

I'm also not really seeing how to program my way out of looking at these 2k files. I've got overlapping-logic fatigue, but hooray, 2k files.
Re^4: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 10, 2018 at 21:50 UTC
Idea: start with the smallest files (1-3k, half the files) and see which bigger files they match.
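
A rough sketch of that idea, assuming the file contents are already in memory as a hash of name => content (`find_containers` is a made-up name, not anything from the scripts in this thread). It uses `index` for a literal substring search instead of a `\Q...\E` regex.

```perl
use strict;
use warnings;

# Hypothetical helper: takes { name => content } and returns
# { name => [ names of later (equal or longer) files that contain it ] }.
sub find_containers {
    my ($content) = @_;

    # Smallest first; ties broken by name so the order is stable.
    my @names = sort {
        length $content->{$a} <=> length $content->{$b} or $a cmp $b
    } keys %$content;

    my %inside;
    for my $i ( 0 .. $#names ) {
        my $small = $names[$i];
        next if not length $content->{$small};    # empty would match everything
        for my $j ( $i + 1 .. $#names ) {
            my $big = $names[$j];
            push @{ $inside{$small} }, $big
                if index( $content->{$big}, $content->{$small} ) >= 0;
        }
    }
    return \%inside;
}
```

Because each file is only compared against files later in the size order, two byte-identical files end up with the first marked as inside the second, so one copy always survives.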
Re^5: stitch together text file recovered by photorec/testdisk? by Anonymous Monk on Sep 11, 2018 at 02:53 UTC
Hmm, starting with half yielded only ("copies", 74); checking all files against all files yields ("copies", 847), leaving only 1200 files / 6.54MB. That's less than the nulls (8MB), hmmm. Have I eliminated data or actually lost it?

...
use constant NAME => 0;
use constant SIZE => 1;
use constant VALU => 2;
use constant COPY => 3;

## one [ name, size, content ] record per file, smallest first
my @files = sort { $$a[SIZE()] <=> $$b[SIZE()] }
    map {
        [ $_, path($_)->stat->size, path($_)->slurp_raw, ];
    } glob 'myfinalmaybeit/*';

my @half = @files;
my $copies=0;
for my $half ( @half ){
    for my $file ( @files ){
        if( $half != $file ## cause self will match self
            and $file->[VALU] =~ m{\Q$half->[VALU]\E}
        ){
            $copies++;
            push @{ $half->[COPY] }, $file->[NAME];
        }
    }
}
@files = sort { $$a[NAME()] cmp $$b[NAME()] } @files;
$_->[VALU]=undef for @half, @files;    ## drop contents before dumping
dd( @files );
dd( 'copies', $copies );

## keep only the files that were not found inside any other file
path('myfinalmaybeit1k')->mkpath;
for my $file ( @files ){
    my $ref = $file->[ COPY ] ;
    if( not defined $ref or not @{ $ref } ){
        path( $file->[NAME] )->copy('myfinalmaybeit1k/');
    }
}
__END__
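
One thing that might account for part of the drop: in the all-against-all loop, two byte-identical files each match the other, so both get a COPY entry and neither is copied to myfinalmaybeit1k. Whether any recovered files really are identical is an assumption, so this is only a sketch. `keepers` is a made-up helper taking name => content, and it collapses identical files by MD5 before the containment check.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical helper: given { name => raw content }, return the sorted
# names of the files worth keeping.
sub keepers {
    my ($content) = @_;

    # 1. Collapse byte-identical files: the first name (sorted) per digest wins.
    my %by_digest;
    for my $name ( sort keys %$content ) {
        $by_digest{ md5_hex( $content->{$name} ) } //= $name;
    }
    my @unique = sort values %by_digest;

    # 2. Drop any unique file whose content sits inside another unique file.
    #    Contents now differ, so two files can no longer contain each other.
    my @keep;
    for my $small (@unique) {
        my $contained = grep {
            $_ ne $small
                and index( $content->{$_}, $content->{$small} ) >= 0
        } @unique;
        push @keep, $small if not $contained;
    }
    return @keep;
}

# Example: a.txt and b.txt are identical, c.txt is a fragment of both.
# keepers({ 'a.txt' => "x\ny\n", 'b.txt' => "x\ny\n",
#           'c.txt' => "y\n",    'd.txt' => "z\n" })
# returns ('a.txt', 'd.txt'); the original loop would keep only d.txt.
```

Comparing the file count from this against the 1200 left by the original loop would show whether identical files were being thrown out in pairs.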