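Superword level parallelism (SLP) is a vectorization technique that packs independent, isomorphic scalar operations (the same operation applied to different data) into single SIMD instructions. Unlike traditional loop vectorization, it works on basic blocks. This lets it find parallelism within straight-line code, and also across loop iterations once a loop has been unrolled. Vectorization in this context means transforming code so that one instruction performs the same operation on several data elements at once. This reduces the number of instructions executed and saves processing time.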
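As a sketch of the idea, the Python toy model below treats a 4-tuple as one hypothetical SIMD register (a "superword"). This is an assumption for illustration: real SLP runs inside a compiler on machine-level code. The four statements in `scalar_block` sit in straight-line code with no loop. They are isomorphic and independent, and their `z` operands are adjacent in memory, so they can be replaced by one packed multiply and one packed add. The names `vadd`, `vmul`, `scalar_block`, and `packed_block` are hypothetical.

```python
# Toy model: a 4-tuple stands in for one SIMD register (a "superword").
# All helper names here are hypothetical, chosen for illustration only.

def vadd(x, y):
    """One packed add: the same operation applied to every lane."""
    return tuple(p + q for p, q in zip(x, y))


def vmul(x, y):
    """One packed multiply."""
    return tuple(p * q for p, q in zip(x, y))


def scalar_block(b, c, e, f, s, t, x, y, z, i):
    # Straight-line code with no loop: four isomorphic statements,
    # eight scalar arithmetic operations in total.
    a = b + c * z[i + 0]
    d = e + f * z[i + 1]
    r = s + t * z[i + 2]
    w = x + y * z[i + 3]
    return (a, d, r, w)


def packed_block(b, c, e, f, s, t, x, y, z, i):
    # The same work after SLP-style packing: z[i..i+3] are adjacent, so they
    # form one superword, and the arithmetic becomes two packed operations.
    zs = tuple(z[i:i + 4])
    return vadd((b, e, s, x), vmul((c, f, t, y), zs))
```

For example, `scalar_block(1, 2, 3, 4, 5, 6, 7, 8, [1, 2, 3, 4], 0)` and the packed version both return `(3, 11, 23, 39)`. In real code, gathering scattered scalars such as `(b, e, s, x)` into one register can itself cost extra instructions. An SLP compiler therefore has to weigh that packing overhead against the savings.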
Instruction-level parallelism (ILP), by comparison, measures how many instructions in a program can be executed simultaneously. In other words, it counts how many operations are independent enough to be issued in parallel rather than one after another.
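One common way to quantify ILP for a block of code is to divide its instruction count by the length of its critical path. This assumes an idealized machine with unlimited functional units and single-cycle latency. The sketch below uses that model and a hypothetical `(destination, sources)` instruction format. It also assumes that register renaming removes all dependences except true (read-after-write) ones.

```python
def ilp(block):
    """Estimate ILP as instruction count / critical-path length.

    `block` is a list of (dest, sources) pairs in program order. Assumes
    unit latency, unlimited functional units, and that only true
    (read-after-write) dependences constrain scheduling.
    """
    ready = {}   # value name -> cycle in which its result is available
    depth = 0    # length of the longest dependence chain seen so far
    for dest, sources in block:
        # An instruction issues one cycle after its latest input is ready;
        # inputs not produced inside the block are ready at cycle 0.
        cycle = 1 + max((ready.get(src, 0) for src in sources), default=0)
        ready[dest] = cycle
        depth = max(depth, cycle)
    return len(block) / depth if depth else 0.0


# t1, t2 and t4 are independent; t3 must wait for t1 and t2.
example = [
    ("t1", ["a", "b"]),
    ("t2", ["c", "d"]),
    ("t3", ["t1", "t2"]),
    ("t4", ["e", "f"]),
]
print(ilp(example))  # 4 instructions over a 2-cycle critical path -> 2.0
```

A fully serial chain such as `x = f(a); y = g(x); z = h(y)` scores 1.0 under this model, because no two of its instructions can run in the same cycle.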
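SLP can be viewed as a restricted form of ILP. It exploits only those parallel operations that are isomorphic and whose operands can be packed into a single SIMD register. Traditional loop vectorization, by contrast, cannot locate this kind of SIMD-style parallelism within a basic block. SLP is a compiler technique rather than a hardware feature, but nearly all modern processors provide the SIMD instructions it targets. Its benefits are most visible in multimedia workloads such as image, audio, video, and 3D processing.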
Like loop vectorization, SLP is a way of achieving data-parallel processing. The proliferation of multimedia applications has led manufacturers to add multimedia extensions, chiefly short SIMD instructions, to most general-purpose microprocessors. Because these instructions are now commonplace, exploiting ILP with ordinary scalar instructions is not enough. Compilers also need a reliable way to generate SIMD instructions from ordinary code.
A compiler can detect SLP by examining basic blocks rather than loop nests and grouping isomorphic, independent statements into SIMD instructions. In 2000, Samuel Larsen and Saman Amarasinghe presented such a compiler at PLDI, a conference sponsored by the ACM Special Interest Group on Programming Languages (SIGPLAN). In their experiments, it reduced dynamic instruction counts by 46%, with speedups ranging from 1.24 to 6.70.
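The greedy grouping below is a much-simplified sketch of how a compiler might find packable statements in one basic block. It is not the published algorithm, which seeds packs from adjacent memory references and extends them along def-use and use-def chains. The sketch makes three assumptions. It uses a hypothetical toy statement form `dst[j] = src1[k] op src2[m]` and a 4-lane SIMD width (`LANES`). It also packs only statements that are already consecutive in the block, which avoids checking reordering against intervening statements.

```python
from dataclasses import dataclass

LANES = 4  # assumed SIMD width (e.g. four 32-bit lanes in a 128-bit register)


@dataclass(frozen=True)
class Stmt:
    """Hypothetical toy statement: dst = src1 <op> src2.

    Each operand is an (array_name, index) pair.
    """
    op: str
    dst: tuple
    src1: tuple
    src2: tuple


def isomorphic(s, t):
    # Same operation applied to the same arrays; only the indices differ.
    return (s.op == t.op and s.dst[0] == t.dst[0]
            and s.src1[0] == t.src1[0] and s.src2[0] == t.src2[0])


def adjacent(s, t):
    # t touches the element right after each element s touches, so the
    # operands of a pack occupy contiguous memory.
    return all(b[1] == a[1] + 1
               for a, b in ((s.dst, t.dst), (s.src1, t.src1), (s.src2, t.src2)))


def independent(pack, t):
    # Conservative: reject t if it reads a value the pack writes,
    # or writes a value the pack reads.
    writes = {s.dst for s in pack}
    reads = {r for s in pack for r in (s.src1, s.src2)}
    return t.src1 not in writes and t.src2 not in writes and t.dst not in reads


def pack_block(block):
    """Group consecutive isomorphic, adjacent, independent statements.

    Returns a list of packs; a pack of length 1 stays a scalar statement.
    """
    packs = []
    for stmt in block:
        current = packs[-1] if packs else None
        if (current and len(current) < LANES
                and isomorphic(current[-1], stmt)
                and adjacent(current[-1], stmt)
                and independent(current, stmt)):
            current.append(stmt)
        else:
            packs.append([stmt])
    return packs


# a[i] = b[i] + c[i] for i = 0..5, written out as straight-line code.
block = [Stmt("+", ("a", i), ("b", i), ("c", i)) for i in range(6)]
print([len(p) for p in pack_block(block)])  # [4, 2]
```

A recurrence such as `a[i+1] = a[i] + c[i]` stays entirely scalar under this sketch. Each statement reads the result of the one before it, so the independence check rejects every attempt to pack them.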
To exploit parallelism across loop iterations, SLP is usually combined with loop unrolling. Unrolling replicates the loop body so that operations from several iterations sit side by side in one basic block, where SLP can pack them. Unrolling is a space-time tradeoff. It can substantially reduce execution time, because fewer loop tests and branches are executed and more operations are available to pack, but it increases the size of the binary. The larger code gives SLP more parallelism to exploit, but it can also cause more instruction cache misses and reduce performance.
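The Python sketch below shows the shape of the unrolling transformation for `out[i] = a[i] + b[i]`. It unrolls by a factor of 4 (an arbitrary choice for illustration) and adds a remainder loop for leftover elements. It only illustrates structure. The unrolled body is four times larger and runs roughly a quarter as many loop tests, and its four statements are exactly the isomorphic, adjacent operations an SLP pass could pack. An interpreter such as CPython will not show the speedup a compiler would.

```python
def add_rolled(a, b):
    """Reference loop: one addition per iteration."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def add_unrolled4(a, b):
    """The same loop unrolled by 4, with a remainder loop."""
    n = len(a)
    out = [0] * n
    i = 0
    # Main loop: four copies of the body per iteration, so the loop test runs
    # about a quarter as often, at the cost of a larger body.
    while i + 4 <= n:
        out[i] = a[i] + b[i]
        out[i + 1] = a[i + 1] + b[i + 1]
        out[i + 2] = a[i + 2] + b[i + 2]
        out[i + 3] = a[i + 3] + b[i + 3]
        i += 4
    # Remainder loop: handles the n % 4 elements left over.
    while i < n:
        out[i] = a[i] + b[i]
        i += 1
    return out
```

Both functions compute the same result for any pair of equal-length lists. Only the unrolled version carries the extra code size that the space-time tradeoff refers to.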