This "MSWord text recover" script was written one night when a friend gave me an HDD whose FAT tables had been destroyed by a virus and asked me to recover the text from the MS Word documents on it.
I work on Linux, so I didn't mount the HDD at all (mount can't work without FAT tables :-) ) and simply ran my script with the /dev/hdb device as its parameter.
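The variables $START and $BLOCK must be adjusted to fit your needs: $START is the byte offset where scanning begins and $BLOCK is roughly how many bytes to scan from there. ;-)
The script prints the text runs it finds (English and Russian) to STDOUT; the Russian text comes out in the Windows-1251 encoding.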
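#!/usr/bin/perl

# Fork a child that filters the recovered text; the parent writes to it via FILTER.
$pid = open FILTER, "|-";
die unless defined $pid;

if ($pid) {
    # Parent: scan the raw device or file given as the first argument.
    open WORD, $ARGV[0] or die "Can't read $ARGV[0]\n";

    $START = 0;            # byte offset to start at (no leading zeros: Perl reads them as octal)
    $BLOCK = 100000000;    # roughly how many bytes to scan

    sysseek WORD, $START, 0;
    while (sysread(WORD, $_, 10240)) {
        # $2: Cyrillic letters stored as UTF-16LE (low byte \020-\117, high byte \04)
        # $3: backspace, newline and printable ASCII stored as UTF-16LE (high byte \00)
        while ( /\G (.*?)
                    (?: ((?:[\020-\117]\04)+)
                      | ((?:[\10\12\040-\177]\00)+)
                    ) /xgs ) {
            ($junk, $russian, $english) = ($1, $2, $3);
            $russian =~ s/\04//g;                   # drop the high bytes
            $russian =~ tr/\020-\117/\300-\377/;    # map to Windows-1251
            $english =~ s/\00//g;                   # drop the high bytes
            print FILTER $russian, $english;
            print FILTER "\n" if length($junk);     # junk between runs ends the line
        }
        last if sysseek(WORD, 0, 1) > $START + $BLOCK;
    }
    close FILTER;
} else {
    # Child: drop lines of three characters or fewer and lines
    # in which the same character occurs three times in a row.
    while (<STDIN>) {
        print unless /^.{0,3}$/ || /(.)\1\1/;
    }
}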
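To see what the two byte patterns actually pick up, here is a small sketch that runs them on an in-memory buffer instead of a disk. The helper name extract_runs and the sample data are made up for illustration; they are not part of the script above. The sketch assumes, as the script does, that Word stored the text as UTF-16LE, and it uses the core Encode module only to build the sample and to turn the Windows-1251 result into UTF-8 for printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# extract_runs() is a hypothetical helper, not part of the original script.
# It applies the same two byte patterns to a buffer and returns the runs
# it finds as single-byte strings (Windows-1251 for the Cyrillic ones).
sub extract_runs {
    my ($buf) = @_;
    my @runs;
    while ($buf =~ / ((?:[\x10-\x4F]\x04)+) | ((?:[\x08\x0A\x20-\x7F]\x00)+) /xg) {
        my ($russian, $english) = ($1, $2);
        if (defined $russian) {
            $russian =~ s/\x04//g;                 # drop the UTF-16LE high bytes
            $russian =~ tr/\x10-\x4F/\xC0-\xFF/;   # U+0410..U+044F -> Windows-1251
            push @runs, $russian;
        } else {
            $english =~ s/\x00//g;                 # drop the zero high bytes
            push @runs, $english;
        }
    }
    return @runs;
}

# Made-up sample: junk bytes around "Hello" and a six-letter Russian word,
# both encoded as UTF-16LE.
my $sample = "\xFF\xFE"
           . encode('UTF-16LE', "Hello")
           . "\xFF\xFF"
           . encode('UTF-16LE', "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}");

# Print each run as UTF-8 so it is readable on a modern terminal.
binmode STDOUT, ':encoding(UTF-8)';
print decode('cp1251', $_), "\n" for extract_runs($sample);
```

The Cyrillic mapping works because the letters U+0410..U+044F have the low bytes 0x10..0x4F in UTF-16LE, and Windows-1251 keeps the same letters in the same order at 0xC0..0xFF, so a single tr/// is enough.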
WBR, Alex.