# Huffman compression

Huffman compression, also known as Huffman encoding, is an algorithm for the lossless compression of files based on how often each symbol occurs in the file being compressed. It is a form of statistical coding: the probability of a symbol directly determines the length of its representation, so the more probable a symbol is, the fewer bits are used to represent it. In any file, certain characters are used more than others. With a fixed-length binary representation, the number of bits required per character depends on the number of distinct characters that have to be represented. One bit can represent two characters (0 for the first and 1 for the second), two bits can represent four characters, and so on.
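The relationship between the number of distinct characters and the width of a fixed-length code can be sketched in a few lines of Python. The helper name `fixed_length_bits` is hypothetical, chosen only for this illustration.

```python
def fixed_length_bits(num_symbols: int) -> int:
    """Smallest number of bits per symbol that gives every one of
    `num_symbols` distinct symbols its own fixed-length code."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 2; a one-symbol
    # alphabet is assumed here to still use one bit per symbol.
    return max(1, (num_symbols - 1).bit_length())


for n in (2, 3, 4, 5, 128):
    print(n, "symbols ->", fixed_length_bits(n), "bits each")
```

With this helper, two symbols need one bit each, three or four symbols need two bits, and 128 symbols need seven bits.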

Unlike ASCII, a fixed-length code that uses seven bits per character, Huffman compression is a variable-length coding system. It assigns shorter codes to more frequently used characters and longer codes to less frequently used ones, reducing the size of files being compressed and transferred.

For example, in a file with the following data:

XXXXXXYYYYZZ

the frequency of "X" is 6, the frequency of "Y" is 4, and the frequency of "Z" is 2. If each character is represented using a fixed-length code of two bits, then the number of bits required to store this file would be 24, i.e., (2 x 6) + (2 x 4) + (2 x 2) = 24.

If the above data were compressed using Huffman compression, the more frequently occurring characters would be represented by shorter codes, such as:

X by the code 0 (1 bit)
Y by the code 10 (2 bits)
Z by the code 11 (2 bits)

The size of the file therefore becomes 18 bits, i.e., (1 x 6) + (2 x 4) + (2 x 2) = 18.
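The following Python sketch applies the code table from this example to the sample data and decodes the result again. The function names `encode` and `decode` are assumptions made for illustration. The bits are kept as a text string of "0" and "1" characters for readability rather than packed into bytes. Decoding works without separators between codes because no code in the table is the beginning of another code.

```python
# Code table taken from the example: X -> 0, Y -> 10, Z -> 11.
CODES = {"X": "0", "Y": "10", "Z": "11"}


def encode(text, codes):
    """Concatenate the code of every character in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right, emitting a character whenever the
    bits collected so far match a complete code."""
    inverse = {code: ch for ch, code in codes.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


data = "XXXXXXYYYYZZ"
bits = encode(data, CODES)
print(bits)                 # 000000101010101111
print(len(bits))            # 18
print(decode(bits, CODES))  # XXXXXXYYYYZZ
```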

In the above example, more frequently occurring characters are assigned shorter codes, so the compressed file needs 18 bits instead of 24, a reduction of 25 percent.
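The entry does not describe how the codes are chosen. One common construction, sketched below under that assumption, repeatedly merges the two least frequent entries into a small tree until a single tree remains, then reads each character's code from its path through the tree. The names `huffman_codes` and `compressed_bits` are hypothetical, and the exact 0/1 assignment depends on how ties are broken, so it may differ from the table in the example while the code lengths and the total stay the same.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a variable-length code table from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree). A tree is either a
    # single character or a (left, right) tuple. The unique tie_breaker
    # keeps Python from ever comparing two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption: a file with one distinct character gets a 1-bit code.
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # least frequent
        w2, _, t2 = heapq.heappop(heap)  # second least frequent
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


def compressed_bits(text, codes):
    """Total number of bits needed to store `text` with `codes`."""
    return sum(len(codes[ch]) for ch in text)


sample = "XXXXXXYYYYZZ"
table = huffman_codes(sample)
print(table)
print(compressed_bits(sample, table))  # 18
```

For the sample data, this sketch gives "X" a 1-bit code and "Y" and "Z" 2-bit codes, which matches the 18-bit total computed by hand.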

Huffman compression is named after its inventor, David Huffman.
