Chapter 8
Working with Archive Files
In this chapter, you will learn
• What archive files are
• What data compression is and how to compress and decompress data
• How to compute a checksum for data using different algorithms
• How to create files in ZIP, GZIP, and JAR file formats and read data from them
• How to use the jar command-line tool to work with JAR files
What Is an Archive File?
An archive file consists of one or more files. It also contains metadata that may include the directory structure of the
files, comments, error detection and recovery information, etc. An archive file may also be encrypted. Typically, but
not necessarily, an archive file is stored in a compressed format. An archive file is created using file archiver software.
For example, utilities such as WinZip and 7-Zip are used to create archives in ZIP format on Microsoft Windows; the
tar utility is used to create archive files on UNIX-based operating systems. An archive file makes it easier to store
and transmit multiple files as one file. This chapter discusses in detail how to work with archive files using the
Java I/O API and the jar command-line utility that is included in the JDK.
Data Compression
Data compression is a process of applying an encoding algorithm to the given data to represent it in a smaller size.
Suppose you have a string, 777778888. One way to encode it is 5748, which can be interpreted as “five sevens and
four eights.” By this encoding, you have reduced the length of the string from nine to four characters. The algorithm
you have applied to compress 777778888 as 5748 is called Run Length Encoding (RLE). RLE encodes data by replacing
each run of a repeated value with a count followed by one copy of the value. RLE is easy to implement, but it is
effective only when the data contains long runs of repeated values; data with few repeats can grow when encoded.
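The encoding described above can be sketched in a few lines of Java. This is only an illustration, not a production codec: the class name RunLengthEncoder and its encode method are hypothetical, and the sketch assumes each run is written as a single-digit count followed by the symbol, so runs longer than nine are split into several pairs.

```java
class RunLengthEncoder {
    // Hypothetical helper: writes each run as <count><symbol>.
    // Assumption: the count is capped at 9 so every pair is exactly two characters.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char symbol = input.charAt(i);
            int run = 1;
            // Extend the run while the next character matches and the cap is not reached
            while (i + run < input.length()
                    && input.charAt(i + run) == symbol
                    && run < 9) {
                run++;
            }
            out.append(run).append(symbol);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("777778888")); // prints 5748
        System.out.println(encode("1234"));      // prints 11121314
    }
}
```

The second call shows the weakness of RLE: an input with no repeated runs doubles in length.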
The reverse of data compression is called data decompression. Here, you apply an algorithm to the compressed
data to get back the original data.
There are two types of data compression: lossless and lossy. In lossless data compression, you get your original
data back when you decompress the compressed data. For example, if you decompress 5748, you get your original
data (777778888) back without losing any information. You can get the information back in this example because RLE
is a lossless data compression algorithm. Other lossless data compression algorithms are LZ77, LZ78, LZW, Huffman
coding, Dynamic Markov Compression (DMC), etc.
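To make the lossless property concrete, the following sketch decompresses a string in the count-then-symbol layout of the 5748 example. The class name RunLengthDecoder and its decode method are hypothetical, and the sketch assumes every count is a single digit from 1 to 9, so the input is read two characters at a time.

```java
class RunLengthDecoder {
    // Hypothetical helper: expands <count><symbol> pairs back into the original data.
    // Assumption: each count is one decimal digit between 1 and 9.
    static String decode(String encoded) {
        if (encoded.length() % 2 != 0) {
            throw new IllegalArgumentException("Expected count/symbol pairs");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < encoded.length(); i += 2) {
            int count = Character.digit(encoded.charAt(i), 10); // -1 if not a digit
            if (count < 1) {
                throw new IllegalArgumentException("Invalid count at index " + i);
            }
            char symbol = encoded.charAt(i + 1);
            for (int k = 0; k < count; k++) {
                out.append(symbol);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("5748")); // prints 777778888
    }
}
```

Decoding 5748 yields exactly the nine characters that were compressed, which is what distinguishes a lossless algorithm from a lossy one.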