Short for redundant array of independent disks (originally redundant array of inexpensive disks), RAID is a storage technology that combines multiple disks to provide fault tolerance, improve performance, or increase storage capacity in a computer system. Unlike a single-drive setup, RAID lets users duplicate or distribute data across multiple disks, which can reduce costs and improve overall performance.
History of RAID
In 1988, three University of California, Berkeley researchers, David Patterson, Garth A. Gibson, and Randy Katz, published a paper titled “A Case for Redundant Arrays of Inexpensive Disks (RAID)” in which they first defined the term RAID. They theorized that spreading data across multiple drives could improve system performance, lower costs, and reduce power consumption, while redundancy would offset the reliability problems inherent in using inexpensive and less reliable disks. The paper also outlined the original five RAID levels, which had been used prior to the paper’s publication but never formally identified in relation to one another.
Before RAID, most systems stored data on a single drive, an arrangement called single large expensive disk (SLED). These configurations were prone to input/output (I/O) bottlenecks because a single disk could not deliver data as quickly as the rest of the server demanded. The single disk was also a single point of failure, so users needed to regularly back up the data to a secondary storage device or risk losing everything if the SLED failed.
At one time, RAID technology was nearly ubiquitous in enterprise storage devices and in much high-capacity consumer-grade equipment. Some emerging (or re-emerging) technologies, such as solid state drives (SSDs), concatenation, erasure coding, Just a Bunch of Disks (JBOD), and hyperscale computing, address the shortcomings of many RAID arrays, so some analysts have predicted the decline of RAID technology despite its widespread implementation today.
The Storage Networking Industry Association (SNIA) has established the Common RAID Disk Data Format (DDF) specification to define how data should be distributed across disks in a RAID device. The specification is intended to promote interoperability among RAID vendors.
How RAID Works
A typical RAID array uses multiple disks that the server operating system views as a single device so it can provide more storage capacity than a single disk. Sequential data is either duplicated or distributed across the disk array so that the data (either in part or whole) is preserved if one disk fails. The specific RAID scheme depends on the priority at hand, be it reliability, availability, performance, or scalability. RAID schemes are also called levels.
RAID can read from or write to more than one disk at the same time, using either mirroring or striping, in order to improve performance. With disk striping, the data is distributed across disks and not duplicated. If one disk fails, the portions stored on it are lost; the unaffected disks keep their allocated data, but without redundancy any file that spanned the failed disk generally cannot be recovered. Conversely, data mirroring duplicates data across multiple disks, so an exact copy can be retrieved if one disk fails.
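The difference between striping and mirroring can be sketched in a few lines of Python. This is a minimal illustration, not how any real controller is implemented: the function names (`stripe`, `unstripe`, `mirror`), the in-memory lists standing in for disks, and the tiny `chunk_size` are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Distribute fixed-size chunks round-robin across disks (no duplication)."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading chunks back in round-robin order."""
    depth = max(len(disk) for disk in disks)
    chunks = []
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


def mirror(data: bytes, num_disks: int) -> list[bytes]:
    """Store a full copy of the data on every disk."""
    return [data for _ in range(num_disks)]


payload = b"ABCDEFGHIJKL"

striped = stripe(payload, num_disks=3, chunk_size=2)
# Each hypothetical disk holds only a third of the payload.
restored = unstripe(striped)

mirrored = mirror(payload, num_disks=2)
# Simulate losing the first mirrored disk: the second still holds everything.
surviving_copy = mirrored[1]
```

In this sketch, removing any one list from `striped` leaves gaps that `unstripe` cannot fill, whereas any single element of `mirrored` is enough to recover `payload`.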
A RAID array can be pre-built into hardware by a storage vendor, set up using a RAID controller, or configured with software RAID and server resources. Pre-built arrays are generally the most expensive option but also the most powerful; they typically include two RAID controllers and a group of disks in their own housing, so they’re well suited to enterprise storage solutions. Software RAID, on the other hand, is usually more cost-effective but relies on the system’s CPU and can slow down other applications.
Standard RAID Levels
RAID devices use many different architectures, called levels, depending on the desired balance between performance and fault tolerance. RAID levels describe how data is distributed across the drives. Standard RAID levels include the following:
- Level 0 (striped disk array without fault tolerance): Level 0 provides data striping but no redundancy. This improves performance but does not deliver fault tolerance, meaning all data is lost if one drive fails.
- Level 1 (mirroring and duplexing): Level 1 provides disk mirroring. It offers up to twice the read transaction rate of a single disk and roughly the same write transaction rate.
- Level 2 (error-correcting coding): Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.
- Level 3 (bit-interleaved parity): Level 3 provides byte-level striping with a dedicated parity disk. It cannot service multiple simultaneous requests, so it is also rarely used.
- Level 4 (dedicated parity drive): Level 4 provides block-level striping (like Level 0) with a dedicated parity disk. If a data disk fails, the parity data is used to rebuild its contents onto a replacement disk. A disadvantage of Level 4 is that every write must also update the single parity disk, which can create write bottlenecks; for this reason it is less widely used than Level 5.
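- Level 5 (block interleaved distributed parity): Level 5 provides data striping at the block level and also distributes the parity (error correction) information across all disks rather than keeping it on one dedicated disk. This results in excellent read performance and good fault tolerance, since the array survives the failure of any single disk. Level 5 is one of the most popular implementations of RAID, and the technique is covered by an IBM patent.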
- Level 6 (independent data disks with double parity): Level 6 provides block-level striping with two independent sets of parity data distributed across all disks, so the array can survive the failure of any two disks.
- Level 10 (a stripe of mirrors): Level 10 creates multiple RAID 1 mirrors and an umbrella RAID 0 stripe.
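The parity used by Levels 3 through 5 is commonly computed as a bytewise XOR of the data blocks in a stripe, which is what allows a lost block to be rebuilt from the survivors. The Python sketch below illustrates that idea under simplifying assumptions: the helper names `xor_blocks` and `parity_disk_for_stripe` are hypothetical, blocks are tiny byte strings, and the rotation rule shown for distributed parity is only one possible layout (real implementations differ).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk_for_stripe(stripe_index: int, num_disks: int, distributed: bool) -> int:
    """Pick the disk that stores parity for a given stripe.

    Dedicated parity (Level 4 style): always the last disk.
    Distributed parity (Level 5 style): rotate the parity disk per stripe.
    The rotation order here is an illustrative assumption.
    """
    if not distributed:
        return num_disks - 1
    return (num_disks - 1 - stripe_index) % num_disks


# One stripe with three data blocks and one parity block.
data_blocks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data_blocks)

# Simulate losing the second data block, then rebuild it from the rest.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

# Parity placement for the first five stripes on a four-disk array.
dedicated_layout = [parity_disk_for_stripe(s, 4, distributed=False) for s in range(5)]
rotating_layout = [parity_disk_for_stripe(s, 4, distributed=True) for s in range(5)]
```

With dedicated parity every stripe's parity lands on the same disk, which is the write bottleneck described for Level 4; rotating the parity location spreads those writes across the array. Level 6's second parity set requires a different calculation and is not covered by this sketch.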
Non-Standard RAID Levels
Some devices use more than one level in a hybrid or nested arrangement, and some vendors also offer non-standard proprietary RAID levels. Examples of non-standard RAID levels include the following:
- Level 0+1 (a mirror of stripes): In this level, two RAID 0 stripes and an umbrella RAID 1 mirror are created. Level 0+1 is used for both replicating and sharing data among disks.
- Level 7: A trademark of Storage Computer Corporation, Level 7 is a proprietary design that adds caching to Level 3 or Level 4.
- RAID 1E: RAID 1E is a RAID 1 implementation with more than two disks. Data striping is combined with mirroring each written stripe to one of the remaining disks in the array.
- RAID S: Also called Parity RAID, RAID S is EMC Corporation’s proprietary striped parity RAID system used in its Symmetrix storage systems.
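The nested levels described above, Level 10 (a stripe of mirrors) and Level 0+1 (a mirror of stripes), use the same number of disks but tolerate different combinations of failures. The Python sketch below enumerates every two-disk failure on a hypothetical four-disk array to show the difference. The function names and the disk grouping `(0, 1)` and `(2, 3)` are assumptions for illustration, and the model treats a stripe set as unusable once any of its disks fails.

```python
from itertools import combinations


def raid10_survives(failed: set[int], mirrors: list[tuple[int, ...]]) -> bool:
    """A stripe of mirrors survives while every mirror keeps at least one working disk."""
    return all(any(disk not in failed for disk in mirror) for mirror in mirrors)


def raid01_survives(failed: set[int], stripes: list[tuple[int, ...]]) -> bool:
    """A mirror of stripes survives while at least one stripe set has no failed disk."""
    return any(all(disk not in failed for disk in stripe) for stripe in stripes)


# Four disks, grouped the same way for both layouts:
# Level 10 treats each group as a mirror; Level 0+1 treats each group as a stripe set.
groups = [(0, 1), (2, 3)]
two_disk_failures = list(combinations(range(4), 2))

raid10_ok = [pair for pair in two_disk_failures if raid10_survives(set(pair), groups)]
raid01_ok = [pair for pair in two_disk_failures if raid01_survives(set(pair), groups)]

print(f"Level 10 survives {len(raid10_ok)} of {len(two_disk_failures)} two-disk failures")
print(f"Level 0+1 survives {len(raid01_ok)} of {len(two_disk_failures)} two-disk failures")
```

Under these assumptions, Level 10 survives four of the six possible two-disk failures while Level 0+1 survives two. Both layouts survive any single-disk failure.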