in reply to Problem with Archive::Tar created archives and Winzip

Jos broke Archive::Tar when he took it over and POSIXized it. And since he doesn't use Windows and considers Solaris a higher-priority platform, he refuses to change it back. Over a year ago I posted a patch to A::T that would restore its previous GNU-ishness, yet it was rejected because his friends told him it would break in their environments. Despite the absence of any evidence actually demonstrating the problem, he refused to apply the patch on the basis of his friends' opinion. (This is all publicly available for review if you don't believe me.)

The core problem is that there are several different flavours of Tar all of which are slightly incompatible with each other in the area of handling long file names.

The original tar spec supports names (path included) of up to 100 characters, the size of the header's name field. When storing a path longer than this, the original spec stipulated that you use the rightmost 100 chars. Then GNU invented a better way: it writes a special entry, named ././@LongLink, whose data holds the full name of the NEXT file in the archive. This format has no arbitrary upper limit on the size of the filename and path.
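A minimal sketch of the GNU long-name scheme just described, in core Perl. The helpers tar_header, pad512 and gnu_entry are hypothetical names made up for this illustration. The header is simplified (mode, owner and mtime are fixed values), and the sketch assumes the GNU conventions: a ././@LongLink record with typeflag 'L' whose NUL-terminated data is the full name, followed by the normal record carrying the first 100 characters of the name.

```perl
use strict;
use warnings;

# Pack one 512-byte tar header. Only the fields this sketch needs are set;
# numeric fields are NUL-terminated octal strings, as in real tar headers.
sub tar_header {
    my (%f) = @_;
    my $block = pack 'a100 a8 a8 a8 a12 a12 a8 a1 a100 a6 a2 a32 a32 a8 a8 a155 a12',
        $f{name},
        sprintf('%07o', 0644),       # mode (fixed for this sketch)
        sprintf('%07o', 0),          # uid
        sprintf('%07o', 0),          # gid
        sprintf('%011o', $f{size}),  # size of the data that follows
        sprintf('%011o', 0),         # mtime (fixed for this sketch)
        ' ' x 8,                     # checksum is summed with this field as spaces
        $f{type},                    # typeflag: '0' regular file, 'L' long name
        '',                          # linkname
        'ustar ', " \0",             # GNU-style magic and version
        ('') x 6;                    # uname, gname, devmajor, devminor, prefix, pad
    my $sum = unpack '%32C*', $block;                 # sum of all 512 bytes
    substr($block, 148, 8) = sprintf("%06o\0 ", $sum);
    return $block;
}

# Pad a data section with NULs up to the next 512-byte boundary.
sub pad512 {
    my ($data) = @_;
    return $data . ("\0" x ((512 - length($data) % 512) % 512));
}

# Emit one file entry. If the path overflows the 100-byte name field, put a
# ././@LongLink record (typeflag 'L') in front whose data is the full name.
sub gnu_entry {
    my ($path, $content) = @_;
    my $out = '';
    if (length($path) > 100) {
        my $long = "$path\0";        # the long name is stored NUL-terminated
        $out .= tar_header(name => '././@LongLink', type => 'L', size => length $long)
              . pad512($long);
    }
    $out .= tar_header(name => substr($path, 0, 100), type => '0', size => length $content)
          . pad512($content);
    return $out;
}

my $path  = join('/', ('subdir') x 20) . '/file.txt';   # 148 characters
my $entry = gnu_entry($path, "hello\n");
# prints "2048 bytes, first record type L"
printf "%d bytes, first record type %s\n", length $entry, substr($entry, 156, 1);
```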

Unfortunately, however, POSIX decided to adopt a different and much stupider scheme: adding a new 155-character prefix field to store the path. So in this scheme the original field holds just the filename, and the new field holds the path. Of course this means that both are size-restricted: the path to 155 characters and the filename to 100, for a combined limit of 256 characters counting the implied slash. It is this scheme that Jos adopted for Archive::Tar. Except he didn't do it right. POSIX actually recommends using the extra path field ONLY when the combined filename and path is longer than 100 characters. This promotes maximum portability with non-POSIX-compliant implementations (such as pretty much EVERY older tar out there). However, Jos didn't do this. He just sticks the filename in the filename slot and the path into the path slot and is done.
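To make the difference concrete, here is a rough sketch of splitting a path the way the POSIX recommendation above describes, next to a simplified model of always splitting at the last slash. The helper names ustar_split and always_split are hypothetical, and the sketch assumes the standard ustar field sizes (100 bytes for name, 155 for prefix).

```perl
use strict;
use warnings;

# Split a path into (prefix, name) for a ustar header, using the prefix
# field only when the whole path does not fit into the 100-byte name field.
sub ustar_split {
    my ($path) = @_;
    return ('', $path) if length($path) <= 100;   # plain old-style header

    # Split at a '/' so the name part fits in 100 bytes and the directory
    # part fits in the 155-byte prefix. The leftmost usable '/' gives the
    # shortest prefix, so if it fails no later '/' can succeed.
    my $i = -1;
    while (($i = index($path, '/', $i + 1)) >= 0) {
        my $name = substr($path, $i + 1);
        next if length($name) > 100;              # name part still too long
        my $prefix = substr($path, 0, $i);
        return if length($prefix) > 155 || $name eq '';
        return ($prefix, $name);
    }
    return;    # no valid split: ustar cannot store this path at all
}

# For contrast: always split at the last '/', even for short paths.
sub always_split {
    my ($path) = @_;
    my $i = rindex($path, '/');
    return $i < 0 ? ('', $path) : (substr($path, 0, $i), substr($path, $i + 1));
}

my @short = ustar_split('lib/Archive/Tar.pm');    # ('', 'lib/Archive/Tar.pm')
my @naive = always_split('lib/Archive/Tar.pm');   # ('lib/Archive', 'Tar.pm')
my @long  = ustar_split(('dir/' x 30) . 'file.txt');
# prints "prefix=27 chars, name=100 chars"
printf "prefix=%d chars, name=%d chars\n", map { length($_) } @long;
```

With the recommended split, a short path lands entirely in the name field, so a pre-POSIX reader still sees the whole thing; with always_split it only ever sees "Tar.pm".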

Now Winzip doesn't know about the POSIX format (or didn't last time I checked); older versions certainly don't. So it just reads the filename, ignores the newfangled path field, and flattens the entire archive down to a single directory.
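A small sketch of why the archive flattens. fake_header and header_path are hypothetical helpers, the header is deliberately incomplete (no checksum or size), and the "old reader" branch only models the behaviour described above; it is not Winzip's actual code.

```perl
use strict;
use warnings;

# Build a bare-bones ustar header carrying only name, magic and prefix.
# Layout: name(100) | fields up to magic(157) | magic(6) | version(2)
#         | uname, gname, devmajor, devminor(80) | prefix(155) | pad(12)
sub fake_header {
    my ($prefix, $name) = @_;
    return pack 'a100 a157 a6 a2 a80 a155 a12',
        $name, '', "ustar\0", '00', '', $prefix, '';
}

# Recover an entry's path. A ustar-aware reader joins prefix and name;
# a reader that predates POSIX looks only at the name field.
sub header_path {
    my ($block, $knows_ustar) = @_;
    my $name = unpack 'Z100', $block;
    return $name unless $knows_ustar && substr($block, 257, 6) eq "ustar\0";
    my $prefix = unpack 'Z155', substr($block, 345, 155);
    return length $prefix ? "$prefix/$name" : $name;
}

# Two different files collapse to the same name for the old reader.
for my $h (fake_header('docs/en', 'README'), fake_header('docs/fr', 'README')) {
    printf "old reader: %-8s ustar reader: %s\n",
        header_path($h, 0), header_path($h, 1);
}
```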

My position is that Archive::Tar should be restored to its previous non-POSIX default, and that those who really need POSIX-formatted archives should have to ask for them explicitly. Apparently, because this would inconvenience the very small minority of Perl Solaris users, Jos refuses to do so, preferring to inconvenience the thousands of Win32 users along with everyone stuck on older machines whose tar implementations are not POSIX-compatible.

If you check the bug reports for Archive::Tar you will see that many of the open bugs are related to this, and that Jos just doesn't care. You can also find my patch to Archive::Tar there; if you apply it, your Archive::Tar will work properly again.

Jos has in the meantime told me he is willing to accept a patch that fixes some of the problem, but that he is unwilling to de-POSIX it by default. Since I no longer have any need to manufacture Winzip-compatible tar files, and have lots of projects on my plate, I haven't had the time or inclination to do so. I'd welcome it if somebody did. Alternatively, somebody could use my patch to produce an Archive::Tar::Functional or something like that, which would install into the Archive::Tar namespace and silently fix the matter.

Regardless, I have to say this particular subject annoys me no end. Someone who takes over a core module and breaks it on a major platform should do everything they can to get it working again, whatever their personal views on what the defaults should be. Jos hasn't done this, which IMO is a dereliction of the duty he undertook when he accepted maintenance of the module.

---
$world=~s/war/peace/g

Re^2: Problem with Archive::Tar created archives and Winzip
by syphilis (Archbishop) on Jul 23, 2007 at 06:12 UTC
    Regardless I have to say this particular subject annoys me no end

    I must say that I was pretty amazed (and still am) to find that I can:
        $tar->read('orig.tar', 0);   # read orig.tar into memory
        $tar->write('copy.tar', 0);  # write copy.tar from memory
    and end up with a copy.tar that's not identical to orig.tar. (Of course, that doesn't happen if $Archive::Tar::DO_NOT_USE_PREFIX is set.)
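    For anyone wanting to try that round trip with the prefix field turned off, here is a sketch. It assumes Archive::Tar and File::Temp are installed (both ship with modern Perls) and uses a made-up 134-character path so the setting actually matters. It only checks that the long name survives the round trip, not that the two archives are byte-identical.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Spec;
use File::Temp qw(tempdir);

# Ask Archive::Tar not to put long paths into the POSIX prefix field.
$Archive::Tar::DO_NOT_USE_PREFIX = 1;

my $dir  = tempdir(CLEANUP => 1);
my $orig = File::Spec->catfile($dir, 'orig.tar');
my $copy = File::Spec->catfile($dir, 'copy.tar');
my $long = 'deep/' . ('d' x 120) . '/file.txt';   # 134 characters

# Create orig.tar, then read it back into memory and write copy.tar.
my $tar = Archive::Tar->new;
$tar->add_data($long, "hello\n") or die $tar->error;
$tar->write($orig)               or die $tar->error;

my $again = Archive::Tar->new;
$again->read($orig)  or die $again->error;
$again->write($copy) or die $again->error;

# List what ended up in copy.tar.
print "$_\n" for Archive::Tar->new($copy)->list_files;
```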

    You can also find my patch to Archive::Tar

    Does applying that patch achieve something that setting $Archive::Tar::DO_NOT_USE_PREFIX fails to achieve?

    Cheers,
    Rob

      AFAIR, the patch does the following:

      1. Uses the traditional single field for the name when a file's name and path fit into the original 100-char name field. This is the safest option when it's available, as it means the tar file can be read even by very old tars that know neither the GNU nor the POSIX format. In other words, it bypasses the whole POSIX/GNU debate entirely for the vast majority of uses.
      2. Adds support for the GNU long filename format, which is currently unsupported by A::T. There, a file with a long name is represented by two records in the tar. The first has a funky special name that tells tar the real name is embedded in its (variable-size) data portion; the second is an ordinary file record whose data portion holds the file contents. This format allows filespecs of arbitrary size, unlike the braindamaged "we'll give you another 155 characters, that should be enough" POSIX format.
      3. Changes the default long filename support format to GNU so that it will not produce POSIX file formats without being explicitly asked to. IME tar utilities that grok the GNU format are more common than ones that grok POSIX, although any new version of GNU tar will handle both correctly.

      In fact it is probably the first change that is the most important and useful. IMO it's not that common to pack files whose packed path is longer than 100 chars, so it's preferable to produce a tar which can be read by anything. As Larry has said: "be liberal with what you accept and conservative with what you produce". A::T should follow suit.
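      The three points above boil down to a small decision rule. This is a hypothetical sketch of that rule (choose_name_format is an invented name), not code from the patch itself, which isn't reproduced here; it also ignores the case where a POSIX split is impossible.

```perl
use strict;
use warnings;

# Decide how to store an entry's path: the old-style name field when it
# fits, GNU long names by default, and the POSIX prefix/name split only
# when the caller explicitly asks for it.
sub choose_name_format {
    my ($path, %opt) = @_;
    return 'plain' if length($path) <= 100;   # readable by any tar
    return 'posix' if $opt{posix};            # opt-in only
    return 'gnu';                             # ././@LongLink record
}

for my $p ('lib/Foo.pm', join('/', ('x' x 20) x 6)) {
    printf "%-4d default=%-5s posix=%s\n", length $p,
        choose_name_format($p), choose_name_format($p, posix => 1);
}
```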
