Hadoop, formally called Apache Hadoop, is an Apache Software Foundation project and open source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured data and unstructured data. Given its capabilities to handle large data sets, it’s often associated with the phrase big data.
Recommended Reading: What is Open Source software?
The Apache Hadoop software library is essentially a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage.
The Apache Hadoop software library can detect and handle failures at the application layer, so it can deliver a highly-available service on top of a cluster of computers, each of which may be prone to failures.
As listed on the The Apache Hadoop project Web site, the Apache Hadoop project includes a number of subprojects, including: