Most likely what you have are parts of "older" versions of that file. This isn't bad, as there is likely still a difference to you between an old version and nothing at all.

Luckily I included timestamps in the files, and they're mostly sequential. Grepping for the last timestamp, it appears I've lost at most 4 hours.
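
The hunt for the newest timestamp across thousands of recovered files can be scripted instead of grepped by hand. Below is a minimal sketch. It assumes, hypothetically, that the timestamps look like `2018-09-10 21:45:00`. That zero-padded format sorts lexically in time order, so a plain string comparison (`gt`) finds the latest one. `$TS_RE` and `latest_timestamp` are illustrative names, and the pattern has to be adjusted to whatever format the files really use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical timestamp format: YYYY-MM-DD HH:MM:SS (adjust to your files).
# Zero-padded fields mean string order equals time order.
my $TS_RE = qr/\b(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\b/;

# Return ( newest timestamp, file it was found in ) across all given files,
# or ( '', undef ) if no timestamp is found.
sub latest_timestamp {
    my ( $best, $where ) = ( '', undef );
    for my $file ( @_ ){
        open my $fh, '<:raw', $file or die "open $file: $!";
        while( my $line = <$fh> ){
            while( $line =~ /$TS_RE/g ){
                ( $best, $where ) = ( $1, $file ) if $1 gt $best;
            }
        }
        close $fh;
    }
    return ( $best, $where );
}

if( @ARGV ){
    # expand wildcards here too, since cmd.exe leaves them to the program
    my ( $ts, $file ) = latest_timestamp( map { glob $_ } @ARGV );
    if( $ts ne '' ){ print "latest: $ts in $file\n" }
    else           { print "no timestamps found\n" }
}
```

For example, `perl latest.pl "myfinalfiles/*"` would report the newest stamp and the file it came from. The script name is hypothetical.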

So this script eliminates more of the stuff I know for sure I already have:

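```perl
#!/usr/bin/perl --
use strict;
use warnings;
use Path::Tiny qw/ path /;
use File::Find::Rule qw/ find /;
use autodie;
use Digest::MD5 qw( md5_hex );
use Data::Dump qw/ dd /;    # dd() is called below but was never imported

# Work in the directory photorec wrote its log (and output) to
my $qphotoreclog = 'qphotorec.log';
$qphotoreclog = path( $qphotoreclog )->realpath;
chdir path( $qphotoreclog )->parent;

# Concatenate every text file I already have (top level of D:/)
my $notit = '';
for my $unwanted ( find( file => maxdepth => 1, in => 'D:/' ) ){
    next if not -T $unwanted;
    $notit .= path( $unwanted )->slurp_raw;
}

# A recovered file is "notit" if its entire content already appears
# somewhere in the known text, otherwise it is "maybeit"
my @files = sort glob 'myfinalfiles/*';
my @maybeit;
my @notit;
for my $file ( @files ){
    my $isit = path( $file )->slurp_raw;
    # index() does the same literal search as /\Q$isit\E/, but an empty
    # $isit can't become an empty pattern (which reuses the last match)
    if( index( $notit, $isit ) >= 0 ){
        push @notit, $file;
    } else {
        push @maybeit, $file;
    }
}
dd( 'notit',   @notit );
dd( 'maybeit', @maybeit );
dd( 'files',   int @files );
dd( 'notit',   int @notit );
dd( 'maybeit', int @maybeit );

# Copy the undecided files aside for manual review
path( 'myfinalmaybeit' )->mkpath;
for my $file ( @maybeit ){
    path( $file )->copy( 'myfinalmaybeit/' );
}
__END__
...
("files", 6341)
("notit", 4294)
("maybeit", 2047)
18M myfinalmaybeit
```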
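
The script imports Digest::MD5 but never uses it. Here is one way it could earn its keep, offered as a suggestion and not as part of the approach above. If some recovered files are byte-for-byte identical, hashing each one lets you keep a single copy per distinct content before running the slower substring tests. `unique_by_md5` is a hypothetical helper that uses only core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Return one file name per distinct content (the first in sorted order),
# so byte-identical recoveries only need to be examined once.
sub unique_by_md5 {
    my %seen;
    my @unique;
    for my $file ( sort @_ ){
        open my $fh, '<:raw', $file or die "open $file: $!";
        my $digest = Digest::MD5->new->addfile( $fh )->hexdigest;
        close $fh;
        push @unique, $file unless $seen{$digest}++;
    }
    return @unique;
}

if( @ARGV ){
    # expand wildcards here too, since cmd.exe leaves them to the program
    print "$_\n" for unique_by_md5( map { glob $_ } @ARGV );
}
```

For example, `perl unique.pl "myfinalfiles/*"` would list one name per distinct file. The script name is hypothetical.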

Randomly viewing a few files, I found one that seems to be a mix of wanted and unwanted text, ugh.
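
Whole-file containment can't classify a file that mixes both kinds of text, but a line-level comparison can separate them. The rough sketch below assumes the text is line-oriented and that the known-good files are the same ones the script above reads from D:/. It collects every line you already have into a hash and prints only those lines of the mixed file that aren't in it. `read_lines` and `unknown_lines` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a file as raw bytes and return its lines without line endings.
sub read_lines {
    my ( $file ) = @_;
    open my $fh, '<:raw', $file or die "open $file: $!";
    my @lines = <$fh>;
    close $fh;
    s/\r?\n\z// for @lines;
    return @lines;
}

# Return the non-blank lines of $mixed that occur in none of the @known files.
sub unknown_lines {
    my ( $mixed, @known ) = @_;
    my %have;
    $have{$_} = 1 for map { read_lines( $_ ) } @known;
    return grep { length( $_ ) && !$have{$_} } read_lines( $mixed );
}

if( @ARGV >= 2 ){
    # usage: MIXEDFILE KNOWNFILE...
    print "$_\n" for unknown_lines( @ARGV );
}
```

Blank lines are skipped on purpose, since they would otherwise show up as "new" in almost every file.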

Re^3: stitch together text file recovered by photorec/testdisk?
by Anonymous Monk on Sep 10, 2018 at 21:45 UTC

    Randomly viewing a few files, I found one that seems to be a mix of wanted and unwanted text, ugh.

    Grepping for text from the mixed file, I found two more files that match it, with no mixing:

    myfinalmaybeit\f67507544.txt is contained in the longer file myfinalmaybeit\f9151816.txt, and neither file has the unwanted mix found in myfinalmaybeit\f93596792.txt.

    I'm not seeing any matches among the files already rejected (from myfinalfiles).

    I'm also not really seeing how to program my way out of looking through these 2k files by hand. I've got overlapping-logic fatigue, but hooray, 2k files instead of 6k.

      Idea: start with the smallest files (1-3k, about half of them) and see which bigger files they are contained in.
        Hmm, starting with half yielded only ("copies", 74); checking all files against all files yields ("copies", 847), leaving only 1200 files / 6.54MB. That's less than the 8MB of nulls, hmmm. Have I eliminated data or actually lost it?
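```perl
#!/usr/bin/perl --
use strict;
use warnings;
use Path::Tiny qw/ path /;
use Data::Dump qw/ dd /;
# (rest of the preamble elided in the original; like the previous
#  script, this runs from the directory that holds myfinalmaybeit/)

use constant NAME => 0;
use constant SIZE => 1;
use constant VALU => 2;
use constant COPY => 3;

# [ name, size, contents ], smallest first
my @files = sort { $$a[SIZE()] <=> $$b[SIZE()] }
            map  { [ $_, path($_)->stat->size, path($_)->slurp_raw, ] }
            glob 'myfinalmaybeit/*';

# Check every file against every other file (the first attempt only
# used the smallest half as needles)
my $copies = 0;
for my $i ( 0 .. $#files ){
    my $half = $files[$i];
    for my $j ( 0 .. $#files ){
        next if $i == $j;    # cause self will match self
        my $file = $files[$j];
        next if $file->[SIZE] < $half->[SIZE];    # too small to contain it
        # Byte-identical files contain each other; without this tie-break
        # both get flagged and neither survives. Keep the lower index.
        next if $file->[SIZE] == $half->[SIZE] and $j > $i;
        if( index( $file->[VALU], $half->[VALU] ) >= 0 ){
            $copies++;
            push @{ $half->[COPY] }, $file->[NAME];
        }
    }
}

@files = sort { $$a[NAME()] cmp $$b[NAME()] } @files;
$_->[VALU] = undef for @files;    # drop contents before dumping
dd( @files );
dd( 'copies', $copies );

# Keep only files that were not found inside some other file
path( 'myfinalmaybeit1k' )->mkpath;
for my $file ( @files ){
    my $ref = $file->[COPY];
    if( not defined $ref or not @{ $ref } ){
        path( $file->[NAME] )->copy( 'myfinalmaybeit1k/' );
    }
}
__END__
```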
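
One way to settle "eliminated or actually lost?" is to check that every file in myfinalmaybeit still has its bytes somewhere inside a file that made it into myfinalmaybeit1k. The hedged sketch below uses only core Perl, and `lost_files` is an illustrative name. Anything it reports was dropped without a kept file that contains it. One case is worth knowing about. Under a plain all-vs-all containment test, two byte-identical files each "contain" the other, so both can be flagged and neither kept. This check would list such files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Slurp a file's raw bytes ('' for an empty file).
sub slurp_raw {
    my ( $file ) = @_;
    open my $fh, '<:raw', $file or die "open $file: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Return the candidate files whose contents occur inside none of the kept files.
# An empty list means every candidate is covered by something that was kept.
sub lost_files {
    my ( $candidates, $kept ) = @_;
    my @kept_data = map { slurp_raw( $_ ) } @$kept;
    my @lost;
    for my $file ( @$candidates ){
        my $data = slurp_raw( $file );
        push @lost, $file
            unless grep { index( $_, $data ) >= 0 } @kept_data;
    }
    return @lost;
}

if( @ARGV == 2 ){
    # usage: CANDIDATE_DIR KEPT_DIR, e.g. myfinalmaybeit myfinalmaybeit1k
    my ( $all_dir, $kept_dir ) = @ARGV;
    my @lost = lost_files( [ grep { -f } glob "$all_dir/*" ],
                           [ grep { -f } glob "$kept_dir/*" ] );
    print scalar( @lost ), " file(s) not covered by a kept file\n";
    print "$_\n" for @lost;
}
```

This is again an all-vs-all scan. With about 2k candidates and 1200 kept files, it does a few million `index` calls, which should still be tolerable for a one-off check.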