My cousin has the habit of naming files in the most peculiar way. He uses Unicode Arabic characters in the filenames, and his laptop runs an Arabic version of Windows 98.

The other day he came to me for a backup because his computer wouldn't boot. So I booted it with a Virtual Linux CD and transferred his files over the network.

The Arabic filenames were a problem, so I tarred the files together. But when the time came to extract them, they were all named things like ????? ??? ????.doc, which contain characters that are not allowed in filenames, and the tar couldn't be extracted.

After digging around for a while, I hacked this script together to recover files that have illegal filenames.

The script takes the tar archive filename as an argument and recreates the directory structure and files in the current directory, replacing each illegal name with a generated one (_recovered_folderN for folders, _recovered_fileN.ext for files).

This is still in hackish form; I just added a few comments before posting it.

#!perl
use Archive::Tar;

my $t = Archive::Tar->new();
$t->read($ARGV[0]) or die "Must specify valid input file - $!\n";

# this regex can be changed to fit the platform specific
# illegal characters to watch for.
my $BAD_CHARACTERS = '[?]';

my %sig;
my $i;

foreach my $h ($t->data()) {
    # typeflag 5 is a folder.
    if ($h->{typeflag} == 5) {
        my @folders = split '/', $h->{name};
        my @newf = ();
        foreach (@folders) {
            my $new;
            if (/$BAD_CHARACTERS/) {
                if ($sig{$_}) {
                    # this happens if we step on a previously
                    # recovered folder.
                    $new = $sig{$_};
                } else {
                    $new = "_recovered_folder" . ++$i;
                    $sig{$_} = $new;
                }
            } else {
                $new = $_;
            }
            push @newf, $new;
        }
        $sig{$h->{name}} = join '/', @newf;
        foreach (@newf) {
            mkdir $_; # actually there would be a lot of errors here
                      # had it been possible,
                      # an mkdir -p $_ would have been better.
            chdir $_;
        }
        $out = '../' x @newf;
        chdir $out or die "I must have overdone myself. $!\n";
    } else {
        my $name = $h->{name};
        my ($nm) = (split '/', $name)[-1];
        $new = $nm;
        if ($nm =~ /$BAD_CHARACTERS/) {
            my ($ext) = $nm =~ /\.(.*)$/; # pickup extension.
            $new = "_recovered_file" . ++$i . ".$ext";
        }
        my $folder = $name;
        $folder =~ s/\Q$nm\E//;
        $folder = $sig{$folder};
        my @times = split '/', $folder;
        chdir $folder or die "where did that '$folder' go? $!\n";
        open OUT, ">$new" or die "$!\n";
        binmode (OUT);
        print OUT $h->{data};
        close OUT;
        $myout = '../' x @times;
        chdir $myout or die "again? $!\n";
    }
}
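The mkdir comment in the script wishes for something like mkdir -p. A minimal sketch of how that could look, assuming Perl 5.10 or later and the core modules File::Path and File::Basename: the helper names safe_path and write_member are hypothetical and not part of the script above. The sketch also differs from the script in two ways: it keys recovered folders by their full original path, and it keeps only the last extension of a file name.

```perl
use strict;
use warnings;
use File::Path     qw(make_path);
use File::Basename qw(dirname);

# Same idea as $BAD_CHARACTERS in the script; adjust per platform.
my $BAD_CHARACTERS = qr/[?]/;

my %seen;          # original folder path => generated folder name
my $counter = 0;   # shared counter for generated names

# Hypothetical helper: turn one archive member name into a safe
# relative path. $is_dir is true when the member is a folder.
sub safe_path {
    my ($name, $is_dir) = @_;
    my @parts = grep { length } split m{/}, $name;
    my @safe;
    for my $i (0 .. $#parts) {
        my $part    = $parts[$i];
        my $is_file = !$is_dir && $i == $#parts;
        if ($part !~ $BAD_CHARACTERS) {
            push @safe, $part;
        }
        elsif ($is_file) {
            # Keep the last extension unless it is itself unusable.
            my ($ext) = $part =~ /(\.[^.]+)\z/;
            $ext = '' if !defined $ext || $ext =~ $BAD_CHARACTERS;
            push @safe, '_recovered_file' . ++$counter . $ext;
        }
        else {
            # Reuse the generated name if this folder was seen before.
            my $key = join '/', @parts[0 .. $i];
            $seen{$key} //= '_recovered_folder' . ++$counter;
            push @safe, $seen{$key};
        }
    }
    return join '/', @safe;
}

# Hypothetical helper: write $data under $root, creating any missing
# parent folders in one call instead of a mkdir/chdir loop.
sub write_member {
    my ($root, $name, $data) = @_;
    my $target = "$root/" . safe_path($name, 0);
    make_path(dirname($target));   # no error if it already exists
    open my $out, '>', $target or die "Cannot write '$target': $!\n";
    binmode $out;
    print {$out} $data;
    close $out or die "Cannot close '$target': $!\n";
    return $target;
}
```

With this, the loop body for a regular file would shrink to something like write_member('.', $h->{name}, $h->{data}), and no chdir back and forth is needed.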

He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

Chady | http://chady.net/

In reply to Recover Tared archive with bad filenames by Chady
