Randomly viewing a few files, I found one that seems to be a mix of wanted and unwanted content, ugh.

Grepping for text from the mixed file, I found two more files that match it, with no mixing:

myfinalmaybeit\f67507544.txt is inside the longer file myfinalmaybeit\f9151816.txt, and neither file has the unwanted mix from myfinalmaybeit\f93596792.txt.

Not seeing any matches among the already rejected files (from myfinalfiles).

Also not really seeing how to program my way out of looking at these 2k files. I've got overlapping-logic fatigue, but hooray, 2k files.

Re^4: stitch together text file recovered by photorec/testdisk?
by Anonymous Monk on Sep 10, 2018 at 21:50 UTC
    Idea: start with the smallest files (1-3k, half the files) and see which bigger files contain them.
      Hmm, starting with half yielded only ("copies", 74). Checking all files against all files yields ("copies", 847), leaving only 1200 files / 6.54MB. That's less than the 8MB of nulls. Hmmm, have I eliminated data or actually lost it?
      ...
      use constant NAME => 0;
      use constant SIZE => 1;
      use constant VALU => 2;
      use constant COPY => 3;

      my @files = sort { $$a[SIZE()] <=> $$b[SIZE()] }
        map { [ $_, path($_)->stat->size, path($_)->slurp_raw, ]; }
        glob 'myfinalmaybeit/*';
      my @half = @files;
      my $copies = 0;
      for my $half ( @half ){
        for my $file ( @files ){
          if( $half != $file ## cause self will match self
            and $file->[VALU] =~ m{\Q$half->[2]\E} ){
            $copies++;
            push @{ $half->[COPY] }, $file->[NAME];
          }
        }
      }
      @files = sort { $$a[NAME()] cmp $$b[NAME()] } @files;
      $_->[VALU] = undef for @half, @files;
      dd( @files );
      dd( 'copies', $copies );

      path('myfinalmaybeit1k')->mkpath;
      for my $file ( @files ){
        my $ref = $file->[ 3 ] ;
        if( not defined $ref or not @{ $ref } ){
          path( $file->[0] )->copy('myfinalmaybeit1k/');
        }
      }
      __END__
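      One possible answer to the "eliminated or lost" question: with the all-against-all loop as posted, two byte-identical files each match the other, so both get a COPY entry and neither is copied to myfinalmaybeit1k. A minimal sketch of a variant that keeps one file from each identical group follows. The helper name files_to_keep and the demo file names are hypothetical. It uses index() instead of a regex, which also avoids Perl's special case where an empty pattern reuses the last successful pattern (relevant if a zero-length file is present).

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref of name => raw content and returns
# the sorted names of files whose content is not contained in a kept file.
sub files_to_keep {
    my ($content) = @_;

    # Longest first, ties broken by name, so each file is only compared
    # against files already kept. Containment is transitive, so checking
    # the kept files is enough.
    my @names = sort {
        length( $content->{$b} ) <=> length( $content->{$a} )
            or $a cmp $b
    } keys %$content;

    my @keep;
    NAME: for my $name (@names) {
        for my $kept (@keep) {
            # index() returns -1 when the substring is absent
            next NAME if index( $content->{$kept}, $content->{$name} ) >= 0;
        }
        push @keep, $name;
    }
    my @sorted = sort @keep;
    return @sorted;
}

# Made-up demo data: c.txt is an exact duplicate of a.txt,
# b.txt is a fragment of a.txt, d.txt is unrelated.
my %demo = (
    'a.txt' => "hello world, this is the long file\n",
    'b.txt' => "this is the long",
    'c.txt' => "hello world, this is the long file\n",
    'd.txt' => "unrelated text",
);
print join( ',', files_to_keep( \%demo ) ), "\n";    # prints a.txt,d.txt
```

      Under this assumption about the cause, comparing the count of kept files against the 1200 from the original run would show how many files were dropped only because they had an exact twin.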