Apache Spark

Apache Spark is an open-source engine developed specifically for handling large-scale data processing and analytics. Spark offers the ability to access data in a variety of sources, including Hadoop Distributed File System (HDFS), OpenStack Swift, Amazon S3 and Cassandra.

Apache Spark is designed to accelerate analytics on Hadoop while providing a complete suite of complementary tools that include a fully-featured machine learning library (MLlib), a graph processing engine (GraphX) and stream processing.

Apache Spark originated at UC Berkeley s AMPLab in 2009 and was donated in 2013 to the Apache Software Foundation, where it has become the most active project in terms of contributions.

One of the key reasons behind Apache Spark s popularity, both with developers and in enterprises, is its speed and efficiency. Spark runs programs in memory up to 100 times faster than Hadoop MapReduce and up to 10 times faster on disk. Spark is natively designed to run in-memory, enabling it to support iterative analysis and more rapid, less expensive data crunching.

Forrest Stroud
Forrest Stroud
Forrest is a writer for Webopedia. Experienced, entrepreneurial, and well-rounded, he has 15+ years covering technology, business software, website design, programming, and more.
Get the Free Newsletter
Subscribe to Daily Tech Insider for top news, trends & analysis
This email address is invalid.
Get the Free Newsletter
Subscribe to Daily Tech Insider for top news, trends & analysis
This email address is invalid.

Related Articles

Embedded Analytics

Embedded analytics brings self-service business intelligence to everyday application users.

HRIS

Human resources information system (HRIS) solutions help businesses manage multiple facets of their workforce operations. They provide a central platform for human resources professionals...

Complete List of Cybersecurity Acronyms

Cybersecurity news and best practices are full of acronyms and abbreviations. Without understanding what each one means, it's difficult to comprehend the significance of...

Human Resources Management System

A Human Resources Management System (HRMS) is a software application that supports many functions of a company's Human Resources department, including benefits administration, payroll,...

ScalaHosting

ScalaHosting is a leading managed hosting provider that offers secure, scalable, and affordable...

HRIS

Human resources information system (HRIS) solutions help businesses manage multiple facets of their...

Best Managed Service Providers...

In today's business world, managed services are more critical than ever. They can...