This narrows the list to 68.6 MB by eliminating subsequent duplicates (only the first file seen with a given MD5 is kept):

#!/usr/bin/perl --
use strict;
use warnings;
use Path::Tiny qw/ path /;
use File::Find::Rule qw/ find /;
use autodie;
use Digest::MD5 qw( md5_hex );

# work relative to the directory that holds the log
my $qphotoreclog = 'qphotorec.log';
$qphotoreclog = path( $qphotoreclog )->realpath;
chdir path( $qphotoreclog )->parent;

my $log = path( $qphotoreclog )->slurp_raw;
my @files;
my %seen;
while( $log =~ m{^(.*?)[\r\n]*$}mg ){
    my $line = $1;
    next if not $line =~ /recup_dir/;
    # each interesting line is "filename blocks"
    my( $filename, $blocks ) = split ' ', $line, 2;
    my $md5 = md5_hex( path( $filename )->slurp_raw ) ;
    push @{$seen{$md5}}, $filename;
    # last field: how many files with this md5 have been seen so far
    push @files, [ $filename, $blocks , $md5 , int@{$seen{$md5}} ];
}
undef $log;
# dd(\@files );

use constant FILENAME => 0;
use constant SEEN => 3;

print "Files before ", int @files, "\n";
# keep only the first occurrence of each md5
@files = map { $_->[FILENAME] } grep { $_->[SEEN()] == 1 } @files;
print "Files after ", int @files, "\n";
# dd(\@files );

path('myfinalrecup')->mkpath;
for my $filename ( @files ){
    path( $filename )->copy( 'myfinalrecup/' );
}
__END__
Files before 40330
Files after 6341

I can't really see a relationship between the blocks and the filename; probably there isn't one.

[
  "recup_dir.3/f8580464.txt",
  "70013087-70013102",
  "2615e08f437222995c7aab0569f015f3",
  1,
],
[
  "C:/undelet/testdisk-7.0.win/recup_dir.3/f8580480.txt",
  "70013103-70013110",
  "6fb0dd36db299c9b713d5c622bf5b499",
  1,
],
...
[
  "recup_dir.3/f8583480.txt",
  "70016103-70016118",
  "2615e08f437222995c7aab0569f015f3",
  2,
],
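
One way to test the blocks/filename question rather than eyeball it is to pull the number out of each filename and compare it with the start of the block range. The following is only a rough sketch: `block_info` is a made-up helper, the three records are copied by hand from the dump above, and it assumes names of the form `f<digits>.<ext>` and ranges of the form `first-last`.

```perl
#!/usr/bin/perl --
use strict;
use warnings;

# Three records copied from the dump above: [ filename, block range ].
my @sample = (
    [ 'recup_dir.3/f8580464.txt', '70013087-70013102' ],
    [ 'C:/undelet/testdisk-7.0.win/recup_dir.3/f8580480.txt', '70013103-70013110' ],
    [ 'recup_dir.3/f8583480.txt', '70016103-70016118' ],
);

# Hypothetical helper: returns ( number in filename, first block, last block ),
# or nothing if the record does not have the assumed shape.
sub block_info {
    my( $filename, $blocks ) = @_;
    my( $number ) = $filename =~ m{/f(\d+)\.[^./]+$} or return;
    my( $first, $last ) = $blocks =~ m{^(\d+)-(\d+)$} or return;
    return( $number, $first, $last );
}

for my $record ( @sample ){
    my( $number, $first, $last ) = block_info( @$record ) or next;
    # number of blocks in the range, and first block minus filename number
    printf "f%s blocks=%d offset=%d\n",
        $number, $last - $first + 1, $first - $number;
}
```

For these three records the offset (first block minus the number in the filename) comes out as 61432623 every time, so there may be a fixed offset after all. Three records prove nothing, though; pushing the whole `@files` list through the same check would show whether it holds.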

In reply to Re^2: stitch together text file recovered by photorec/testdisk? by Anonymous Monk
in thread stitch together text file recovered by photorec/testdisk? by Anonymous Monk
