Cloud 2 min read

Apache Hadoop

Last Updated May 24, 2021 1:44 pm

Hadoop, formally called Apache Hadoop, is an Apache Software Foundation project and open source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured data and unstructured data. Given its capabilities to handle large data sets, it’s often associated with the phrase big data.

Recommended Reading: What is Open Source software?

The Apache Hadoop software library is essentially a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage.

The Apache Hadoop software library can detect and handle failures at the application layer, so it can deliver a highly-available service on top of a cluster of computers, each of which may be prone to failures.

As listed on the The Apache Hadoop project Web site, the Apache Hadoop project includes a number of subprojects, including:

Hadoop Common: The common utilities that support the other Hadoop subprojects.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Hadoop YARN: A framework for job scheduling and cluster resource management.

1. What is Apache Hadoop?
2. What is Hadoop MapReduce?
3. What is HortonWorks?
4. What is Hadoop Distributed File System?
5. What is unstructured data?

Was this Article helpful? Yes No

Thank you for your feedback. 0% 0%

Apache Hadoop

Top 5 Hadoop Related Questions