Huffman compression
Also known as Huffman encoding, an algorithm for the lossless compression of files based on the frequency with which each symbol occurs in the file being compressed. The Huffman algorithm is based on statistical coding, which means that the probability of a symbol has a direct bearing on the length of its representation: the more probable a symbol is, the shorter its bit representation. In any file, certain characters are used more than others. With a fixed-length binary code, the number of bits needed for each character depends on how many distinct characters must be represented. Using one bit we can represent two characters, i.e., 0 represents the first character and 1 represents the second character. Using two bits we can represent four characters, and so on, with each additional bit doubling the number of characters that can be represented.
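The relationship between the number of distinct characters and the width of a fixed-length code can be sketched in Python as follows. The function name fixed_code_width is hypothetical, and treating a file with a single distinct character as needing one bit is an assumed convention.

```python
def fixed_code_width(num_symbols: int) -> int:
    """Smallest number of bits that gives every symbol a distinct fixed-length code."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2;
    # by assumption, a lone symbol is still given 1 bit.
    return max(1, (num_symbols - 1).bit_length())

for n in (2, 3, 4, 5, 256):
    print(n, "symbols ->", fixed_code_width(n), "bit(s)")
```

Two or three characters need 1 and 2 bits respectively, four still fit in 2 bits, and five require 3; 256 characters need 8 bits.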
Unlike ASCII code, which is a fixed-length code using seven bits per character, Huffman compression is a variable-length coding system that assigns shorter codes to more frequently used characters and longer codes to less frequently used characters, reducing the size of files that are compressed and transferred.
For example, consider a file of 12 characters in which the frequency of "X" is 6, the frequency of "Y" is 4, and the frequency of "Z" is 2. If each character is represented using a fixed-length code of two bits, then the number of bits required to store this file would be 24, i.e., (2 x 6) + (2 x 4) + (2 x 2) = 24.
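The fixed-length arithmetic above can be checked with a short Python sketch. The article gives only the character frequencies, so the string data below is an assumed ordering of the 12 characters; any ordering with the same counts gives the same result.

```python
from collections import Counter

# Hypothetical file contents: only the frequencies come from the example,
# so this particular ordering of the characters is an assumption.
data = "XXXXXXYYYYZZ"

freqs = Counter(data)   # counts: X -> 6, Y -> 4, Z -> 2
FIXED_WIDTH = 2         # two bits per character, as in the example

# Every character costs the same number of bits in a fixed-length code.
fixed_size = sum(FIXED_WIDTH * count for count in freqs.values())
print(fixed_size)  # 24
```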
If the above data were compressed using Huffman compression, the more frequently occurring characters would be represented by fewer bits, such as:
X by the code 0 (1 bit)
Y by the code 10 (2 bits)
Z by the code 11 (2 bits)
Therefore the size of the file becomes 18 bits, i.e., (1 x 6) + (2 x 4) + (2 x 2) = 18.
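A minimal Python sketch of encoding and decoding with the example's code table follows. It assumes the same hypothetical ordering of the 12 characters as before and, for readability, represents the compressed output as a string of '0' and '1' characters rather than packed bytes. The names CODES, encode, and decode are illustrative. Decoding works without separators because no code in the table is a prefix of another.

```python
# Code table from the example; it is prefix-free (no code starts another),
# so the concatenated bit string can be split back into symbols unambiguously.
CODES = {"X": "0", "Y": "10", "Z": "11"}

def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the variable-length code of each character."""
    return "".join(codes[ch] for ch in text)

def decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits left to right, emitting a symbol as soon as a full code matches."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # a match is final because the code is prefix-free
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

data = "XXXXXXYYYYZZ"  # hypothetical ordering with the example's frequencies
bits = encode(data, CODES)
print(len(bits))                    # 18
print(decode(bits, CODES) == data)  # True
```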
In the above example, more frequently occurring characters are assigned shorter codes, resulting in fewer bits in the final compressed file.
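The article does not describe how the codes themselves are chosen. The standard Huffman construction repeatedly merges the two least frequent subtrees until one tree remains, prefixing "0" to the codes on one side and "1" to the other. The sketch below implements that greedy merge with Python's heapq module; the function name huffman_codes and the one-bit convention for a single-symbol input are assumptions. Tie-breaking and the 0/1 convention can produce different bit patterns than the example (here Y and Z are swapped), but the code lengths, and therefore the 18-bit total, are the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code table by repeatedly merging the two least frequent subtrees."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Degenerate case (assumed convention): a single symbol still gets one bit.
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # unique counter keeps heap entries comparable on equal weights
    # Each heap entry: (total weight, tiebreak, {symbol: code built so far})
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent subtree
        w2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"X": 6, "Y": 4, "Z": 2}
codes = huffman_codes(freqs)
print(codes)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 18
```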
Huffman compression was named after its discoverer, David Huffman.