I have compared several tar implementations with the standard.
(IEEE/Posix1003/IEC-9945-1 Standard Data Interchange format)
Although the POSIX.1-1988 standard now also defines cpio as an exchange format,
I cannot recommand the cpio archive format for data exchange. There are at
least 6 totally incompatible archive formats - all covered by the name "cpio".
It is most likely, that you are using an archive format that other cpio
implementations will not understand at all.
Note that POSIX.1-2001 will drop the cpio format from the standard as it
is not extendible (e.g. for large files > 8 GB and UID's > 2097151).
Tar in general will at least extract most of the files if you are using a
different implementation to extract the archive.
I've had a look at the following implementations:
Index: Program description: Source of program:
===== ==================== ==================
1) bsd 4.3 tar (Regents of UCB)
2) pax / ustar on SunOS 4.1 (USENIX)
3) tar on Solaris 2.3/2.4/2.5 (Sun/AT&T ??)
4) gnutar 1.11.8 (gnu)
5) gnucpio 2.3 (gnu)
Summary:
1) bsd 4.3 tar
Pre Posix 1003.1
- Miscomputes the checksum. Therefore it is not able to extract
standard conforming tar archives if they contain 8bit chars
in the filename. This is a common bug found in many other
implementations too.
No additional problems on portability except with gnutar
archives. But this is not a problem of BSD tar.
2) pax / ustar (found on SunOS-4.x)
- Dumps core on every even/odd use.
- Computes checksums only on the first 500 bytes of the
tar header: not conforming to Posix 1003.1 standard.
Note: This claims to be a reference implementation for
the Posix 1003.1 standard!
3) tar distributed with Solaris 2.3/2.4/2.5
- Transfers more than 12 Bit from stat.st_mode (violating Posix)
- Complains about "impossible file type" when reading
tar archives which do not contain these illegal upper bits.
This problem is still present in Solaris 7 & Solaris 8
- Does not handle non null terminated filenames correctly.
The standard allows filenames that are exactly 100 chars
and therefore are not null terminated. (Fixed in Solaris 2.5)
For the above reasons, Sun's tar is not conforming to
Posix 1003.1.
- Loops infinitely when trying to dump /dev/fd.
Caused by incorrect handling of nested directories (assumes
all directories seekable).
This makes it impossible to use Solaris tar on the root file
system.
4) gnutar
Claims not to be conforming to Posix 1003.1. (gnu is not tar)
- Many bugs in implementation and design.
(e.g. when handling/creating multi volume archives)
- The second logical EOF block in GNU-tar archives is missing
with a 50% chance.
This will cause correctly working tar implementations to
complain about an illegal/missing EOF in the tar archive.
This bug seems to be fixed with newer 1.13 releases
- Deeply nested directory trees will not be dumped:
Error message is: Too many open files
(This is a similar implementation bug as found in Solaris tar
with the /dev/fd loop) caused by the fact that GNU-tar
assumes infinite resources in file descriptors.
- Hard links with long names to files with long names do not
work. This bug seems to be fixed with newer 1.13 releases.
- GNU-tar cannot read Posix compliant tar archives with
long file names if the filename prefix it at least
138 characters. GNU-tar will think that it found an extended
sparse GNU tar archive and gets out of sync for the rest of
the archive.
See --sparse design bug description below.
This bug seems to be partially fixed with newer 1.13 releases
Even GNU-tar-1.13.19 does not seem to evaluate USTAR magic
and version to distinguish between a POSIX tar archive and a
non-standard GNU-tar archive.
- GNU-tar even has a not yet identified bug which causes GNUtar
not to be able to partially read star archives if these
archives are not created with star -Hustar
May be this is caused by aspects of the topic above.
- Option --sparse produces archives which cannot be read by any
other tar implementation known to me (except star), because
they will get "out of sync".
Posix 1003.1 conforming tar archives let gnutar get
"out of sync" even if the --portability option is used (see
above). This is a severe design bug in GNU-tar.
Description:
The size field in a tar archive cannot reflect the
real size of a sparse file to have compatibility to
other implementations (this is also true for "star"
archives but star archives use a value in the size
field that is understood by other tar implementations).
If the "sparse" file contains more than 4 holes,
the "size" field in the GNU-tar control block does not
reflect the total size of the (shrunk) sparse file in
the archive because it does not count the 'sparse'
extension headers. Posix conformant archives that use
the name prefix field with more than 137 characters
will have a value != 0 on a field that that makes
gnutar believe that such an extension header is
present - GNU-tar will get out of sync.
Note: The general rule for any tar is that it should
be able to read any "tar" compliant data stream with
the exception that enhancements to the standard
only will fail on the files that contain the extension.
Those files should be extracted as if they were
regular files.
- When GNU-tar writes archives it is not able to write long
filenames correctly according to POSIX.1-1988 or to
POSIX.1-2001. As GNU-tar uses a non-standard extension to
handle filenames > 100 chars, GNU-tar is a frequent problem
of the portability of archives. Is is not uncommon that the
length of filenames exceeds 100 chars, while > 99% of the
long filenames do not exceed ~ 230 chars. So most of the
long filenames may be handled by the POSIX.1-1988 method
which has been first documented in the 1987 draft of the
POSIX.1 standard. I strongly recommend not to use GNU-tar
to create archives for source exchange for this reason.
It is bad to see that now (in 2001), 11 years after the
POSIX.1-1988 standard has become accepted, GNU-tar still does
not conform to this POSIX standard. Even worse: the first
draft of the POSIX.1-1988 standard that did not deviate from
the final in important things, appeared in autumn 1987. This
is about the first time when PD-tar which was the base for
GNU-tar appeared. PD-tar (in 1987) _did_ follow the POSIX.1
standard with one single exception: it did not implement long
filenames (filenames > 100 chars) at all. The non-standard GNU
method of handling long filenames has been introduced in 1989
by people from FSF. At this time, GNU-tar did not yet use the
POSIX.1 filename prefix for other non-POSIX purposes, so there
is no excuse for the non-standard way that FSF went. Don't
believe the false GNU-tar history from FSF. I send a correct
GNU-tar history to FSF in 1994, FSF still has to correct their
false claims about GNU-tar history.
See also http://www.geocrawler.com/archives/3/92/1997/2/0/2217471/
as a proof that a previous GNU tar maintainer did admid the
wrong design done by FSF members in the past.
Summary: The main problem with GNU-tar, when it is reading TAR
archives, is that assumes all tar archives to be non-standard
GNU-tar archives. It does not implement a TAR format detection
based on the actual header format (as found in star) in total.
Instead, it seems to have peep-hole based decisions on how to
interpret parts of the TAR haeder. This can never work
correctly.
Note: I do not recommend GNU tar as an exchange format.
Use star -Hustar for maximum portability instead.
If you like to write archives compliant to POSIX-1.2001
use star -Hexustar to create archives with extended POSIX
headers.
5) gnucpio
- Splits long filenames at the leftmost '/' instead of the
rightmost position of '/' required by my copy of the
Posix standard.
- The docs claim compatibility with gnutar.
But extraction of gnutar archives containing 'atime' gives
funny filenames! (try this ...)
- Octal numbers are left padded with ' ' instead of '0'.
The mode field contains more than the lower 12 bits from
stat.st_mode.