The most commonly used archive types on Linux and Unix systems are tar, tar.gz and tar.bz2. The latter two are simply tar archives compressed with gzip and bzip2, respectively. Working with these files is straightforward with the GNU tar utility, which is part of the base packages in modern distros. In this article, I’ll show you how to create and extract compressed archives using tar.
So, how is tar used? Let's start by looking at some of the common tar operations and options.
man tar # ©2007 linux.dsplabs.com.au
The following tar usage information is taken directly from the man pages.
Usage: tar [options]

Common Operations:
  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create                    create a new archive
  -r, --append                    append files to the end of an archive
  -x, --extract, --get            extract files from an archive

Common Options:
  -f, --file [HOSTNAME:]F         use archive file or device F (default "-", meaning stdin/stdout)
  -j, --bzip2                     filter archive through bzip2, use to decompress .bz2 files
  -p, --preserve-permissions      extract all protection information
  -v, --verbose                   verbosely list files processed
  -z, --gzip, --ungzip            filter the archive through gzip, use to decompress .gz files
Creating archives
To create a tar archive, the c switch is used. To additionally compress it with gzip, the z option is added; for bzip2 compression, the j switch is used instead. Note that tar pipes its output through the gzip and bzip2 tools to create the tar.gz and tar.bz2 archives, respectively. To pack a directory called dir into dir.tar, dir.tar.gz and dir.tar.bz2 archives, the following commands are used, respectively.
tar -cf dir.tar dir/
tar -czf dir.tar.gz dir/
tar -cjf dir.tar.bz2 dir/
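As a rough illustration of the piping mentioned earlier, the gzip step can also be run by hand. The sketch below is only an assumption-laden example: the scratch directory and the file names (manual.tar.gz, builtin.tar.gz) are made up, and it uses the -t (list) operation, which is a real tar operation but is not covered in this article.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all file names here are hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir dir
printf 'hello tar\n' > dir/readme.txt

# "-f -" writes the archive to stdout, which is piped into gzip by hand.
tar -cf - dir | gzip > manual.tar.gz

# For comparison, let tar apply the gzip filter itself.
tar -czf builtin.tar.gz dir

# -t lists the members of an archive without extracting them.
manual_list=$(tar -tzf manual.tar.gz)
builtin_list=$(tar -tzf builtin.tar.gz)

printf 'manual pipe:\n%s\n' "$manual_list"
printf 'tar -z:\n%s\n' "$builtin_list"
```

Both listings should name the same members, which suggests that the two ways of producing a tar.gz archive are interchangeable for everyday use.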
In the above examples, the f option specifies that the archive of the dir directory is to be written to the named archive file. If the f option were not given (and thus the archive name were also omitted), the archive would go to the default destination, which according to the usage text above is stdout. This is useful if you want to pipe the output of your tar command into another Linux tool. Anyhow, let's have a look at the resulting archives as well as the original directory using ls -la.
total 424
drwx------ 3 kamil kamil   4096 Nov 27 22:39 .
drwx------ 3 kamil kamil   4096 Nov 27 22:34 ..
drwx------ 2 kamil kamil   4096 Nov 27 22:36 dir
-rw------- 1 kamil kamil 276480 Nov 27 22:39 dir.tar
-rw------- 1 kamil kamil  83330 Nov 27 22:39 dir.tar.gz
-rw------- 1 kamil kamil  45927 Nov 27 22:39 dir.tar.bz2
Let's also have a look at the size of the original directory using du -sh dir.
280K dir
From the above shell output, you can see that by default tar only archives files without compressing them, while the gzip and bzip2 filters reduce the size considerably. Note that bzip2 typically achieves better compression than gzip, although it may take longer to do so. Also note that the compression ratios are this high here because the dir directory contains text files. If the directory contained already compressed files, e.g. JPEG images, then neither gzip nor bzip2 could shrink them much further.
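The effect of the input data on compression can be tried out with a small experiment. This is only a sketch under stated assumptions: the directory names text and random are made up, random bytes from /dev/urandom stand in for already-compressed files such as JPEG images, and the bzip2 step is skipped if bzip2 is not installed.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all names here are hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir text random

# Repetitive text compresses well.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > text/sample.txt

# Random bytes stand in for already-compressed data such as JPEG images.
head -c 102400 /dev/urandom > random/sample.bin

# Print the size of a file in bytes.
size() { wc -c < "$1" | tr -d ' '; }

for d in text random; do
    tar -cf "$d.tar" "$d"
    tar -czf "$d.tar.gz" "$d"
    echo "$d: tar=$(size "$d.tar") gz=$(size "$d.tar.gz")"
    # bzip2 may not be installed everywhere, so treat it as optional.
    if command -v bzip2 >/dev/null 2>&1; then
        tar -cjf "$d.tar.bz2" "$d"
        echo "$d: bz2=$(size "$d.tar.bz2")"
    fi
done
```

The text archive should shrink to a fraction of its tar size, while the compressed random archive should stay roughly as large as the 100 KiB of data it holds.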
Extracting archives
Extracting archives is also very simple. Instead of the c switch, the x switch is used, and the archive name is given as the only other parameter. The commands for archive extraction shown below correspond to the archive creation commands given earlier.
tar -xf dir.tar
tar -xzf dir.tar.gz
tar -xjf dir.tar.bz2
In fact, in most cases tar will figure out which archive type you are trying to extract (presumably from the magic bytes at the start of the file), so the filter options are not really needed. Hence, the following still works fine.
tar -xf dir.tar
tar -xf dir.tar.gz
tar -xf dir.tar.bz2
However, if you explicitly specify the decoder, then tar will assume that that is the encoding of the given archive. If for whatever reason that is not the case, an error will occur.
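The difference between automatic detection and an explicit filter can be checked with a short script. Treat it as a sketch: the auto and wrong directories are hypothetical names, and -C (extract into another directory) is a real tar option that this article does not otherwise cover. The outcome of the second extraction depends on the tar implementation, so the script only reports it.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all names here are hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir dir auto wrong
printf 'hello tar\n' > dir/readme.txt
tar -cf dir.tar dir
tar -czf dir.tar.gz dir

# No filter option: tar is left to detect the gzip compression itself.
tar -xf dir.tar.gz -C auto
ls auto/dir

# Explicit -z on an uncompressed archive: GNU tar takes the option at its word.
if tar -xzf dir.tar -C wrong 2>/dev/null; then
    wrong_status="succeeded"
else
    wrong_status="failed"
fi
echo "tar -xzf on a plain tar archive $wrong_status"
```

With GNU tar the second extraction is expected to fail, as described above. Other tar implementations may ignore the filter option when reading and succeed anyway.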
The verbose mode
The v switch enables verbose mode. This is useful if you would like to see a list of the files being archived or extracted. For example, let's extract the dir.tar.gz archive with verbose mode enabled, using the following command.
tar -xvzf dir.tar.gz
The above command produces a list of the extracted files, as shown in the following output.
dir/
dir/NVIDIA_DRIVER_README.txt
dir/NVIDIA_LICENSE.txt
dir/readme.txt
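The verbose list can also be saved for later inspection instead of scrolling past on the terminal. The following is a minimal sketch: the out directory and the extracted.log file are hypothetical names, and -C is a real tar option not covered in this article. Since some tar implementations print the verbose list on stderr rather than stdout, both streams are captured.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all names here are hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir dir out
printf 'hello tar\n' > dir/readme.txt
tar -czf dir.tar.gz dir

# Extract verbosely into out/ and keep the file list in a log.
# Both stdout and stderr are captured, as the stream used for -v varies.
tar -xvzf dir.tar.gz -C out > extracted.log 2>&1

cat extracted.log
```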
Some common errors
Let's take a look at some common errors. As described previously, specifying an incorrect filter type, e.g. by using the following commands,
tar xjf dir.tar
tar xzf dir.tar.bz2
results in the respective error messages shown below.
bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error exit delayed from previous errors
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
Also, if a file becomes damaged, say truncated, then obviously an error will occur. Let's simulate such a corruption by keeping only the initial 100 bytes of the dir.tar.bz2 archive, using the following command.
head -c100 dir.tar.bz2 > corrupt.tar.bz2
If we now try to extract this archive,
tar -xjf corrupt.tar.bz2
then the following error message will be produced.
bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Child returned status 2
tar: Error exit delayed from previous errors
Interestingly, we are told to check the archive's integrity, so let's do that.
bzip2 -tvv corrupt.tar.bz2
The bzip2 utility tells us what we already know… we are missing part of the file.
corrupt.tar.bz2: [1: huff+mtf file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
Let's try the second suggestion, the bzip2recover utility.
bzip2recover corrupt.tar.bz2
Unfortunately, with only 100 bytes it is not possible to recover anything from the corrupted archive.
bzip2recover 1.0.3: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
   block 1 runs from 80 to 800 (incomplete)
bzip2recover: sorry, I couldn't find any block boundaries.
Similarly, if we truncate the gzip archive, then gzip will inform us of an unexpected end of file.
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors
On the other hand, truncating a plain tar archive does not necessarily cause error messages during extraction. Only the files still fully contained (and not truncated or corrupted) in such an archive will be extracted. Such truncation and corruption errors often occur when extracting archives downloaded from the Internet.
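Since truncated downloads are a common cause of these errors, it can help to test a compressed archive before extracting it. The sketch below makes a few assumptions: the check function and all file names are hypothetical, and it uses gzip -t, which tests only the gzip layer and says nothing about the tar data inside. For tar.bz2 archives, bzip2 -t plays the same role.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (all names here are hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Build a small archive with enough content to exceed 100 compressed bytes.
mkdir dir
i=0
while [ "$i" -lt 500 ]; do
    echo "line $i of some sample text"
    i=$((i + 1))
done > dir/readme.txt
tar -czf dir.tar.gz dir

# Simulate an interrupted download by keeping only the first 100 bytes.
head -c 100 dir.tar.gz > corrupt.tar.gz

# Report whether the gzip stream of an archive is intact.
check() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: damaged"
    fi
}

good_result=$(check dir.tar.gz)
bad_result=$(check corrupt.tar.gz)
echo "$good_result"
echo "$bad_result"
```

The intact archive should be reported as ok and the truncated copy as damaged, so a script could refuse to extract anything that fails this check.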