An open-source MapReduce framework for distributed computing of large data sets on clusters of commodity hardware, featuring fault tolerance, automatic parallelization, and job monitoring.
Disco is an open-source MapReduce framework, originally developed by Nokia, for distributing the processing of extremely large data sets across clusters of commodity hardware. It is designed to be scalable, fault-tolerant, and easy to use.
Key features of Disco MapReduce include fault tolerance, automatic parallelization, and job monitoring.
Disco can handle very large data sets, on the order of petabytes, and scale to thousands of nodes. It has been used at Nokia for data-intensive workloads such as clickstream analysis, data mining, and machine learning.
Disco MapReduce is an open-source alternative to commercial services such as Amazon EMR, with the added flexibility of running on private cloud infrastructure.
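To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Disco's API, and it does no distribution or fault handling; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. A framework like Disco would run the map and reduce steps in parallel across cluster nodes and handle the grouping step between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Combine all values collected for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["disco runs map and reduce", "map then reduce"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on one key's values, both steps can be split across machines without coordination. That independence is what allows a MapReduce framework to parallelize jobs automatically and to rerun failed tasks.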