Azkaban is an open-source workflow scheduler created at LinkedIn to run Hadoop jobs. It lets users create, schedule and monitor workflows made up of multiple jobs, and its web interface and scheduler manage the dependencies between those jobs.
What is Azkaban?
Azkaban is an open-source batch workflow job scheduler created at LinkedIn in 2012. It schedules and runs Hadoop jobs, resolves the dependencies between them so that each job starts only after the jobs it depends on have finished, can retry failed jobs, and can be configured to skip a new run of a workflow while a previous run is still in progress. Azkaban provides an easy-to-use web interface for creating and scheduling workflows and for monitoring the ones that are running.
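The dependency handling described above amounts to ordering the jobs of a flow as a directed acyclic graph. The Python sketch below illustrates that idea with the standard library's graphlib module; it is not Azkaban's implementation (Azkaban is written in Java), and the job names are hypothetical. Jobs that land in the same stage have no dependencies on each other, so a scheduler could run them in parallel.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow: each job maps to the set of jobs it depends on.
FLOW = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "build_report": {"join_tables"},
}


def execution_stages(flow):
    """Group jobs into stages in dependency order.

    Every job in a stage depends only on jobs from earlier stages, so jobs
    within one stage could run in parallel. Raises CycleError if the
    dependencies form a cycle (a flow must be acyclic).
    """
    sorter = TopologicalSorter(flow)
    sorter.prepare()  # validates the graph; raises CycleError on a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(FLOW), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

With the sample flow, the two extract jobs form the first stage, followed by join_tables and then build_report.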
Key features of Azkaban include:
Web-based user interface to upload jobs, build workflows and set schedules
Simple job definition files (.job property files, or YAML .flow files in Flow 2.0) for declaring dependencies between jobs
Schedule workflows to run at particular times or dates
Alerts and notifications when workflows fail or complete
Monitor running workflows on a visual graph
Role-based access control to manage users
Track history and stats of previously run workflows
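In Azkaban's original Flow 1.0 format, each job is a .job properties file whose name is the job name, and a job lists its upstream jobs in a comma-separated dependencies property. The Python sketch below reads a few such files into a dependency map. The keys type, command and dependencies follow Azkaban's job-file format, but the parser is a simplified stand-in written for illustration, not Azkaban code, and the job names and commands are hypothetical.

```python
# Hypothetical contents of two Azkaban Flow 1.0 ".job" files.
JOB_FILES = {
    "foo.job": "type=command\ncommand=echo foo\n",
    "bar.job": "# bar runs after foo\ntype=command\ncommand=echo bar\ndependencies=foo\n",
}


def parse_job_file(text):
    """Parse key=value lines, skipping blank lines and comments.

    Simplified on purpose: ignores Java-properties features such as
    ':' separators, escapes and line continuations.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without a separator
            props[key.strip()] = value.strip()
    return props


def dependency_map(job_files):
    """Map each job name (file name minus '.job') to its declared dependencies."""
    deps = {}
    for filename, text in job_files.items():
        name = filename.removesuffix(".job")
        raw = parse_job_file(text).get("dependencies", "")
        deps[name] = [d.strip() for d in raw.split(",") if d.strip()]
    return deps


if __name__ == "__main__":
    print(dependency_map(JOB_FILES))  # {'foo': [], 'bar': ['foo']}
```

The resulting map is the kind of input a scheduler needs to decide which job may start first.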
Azkaban is written in Java. It can run in solo-server mode on a single machine, or as a separate web server and executor servers for larger deployments. It is commonly used to schedule recurring ETL, analytics and machine learning jobs. Automating these recurring runs saves operator time and reduces the errors that come with triggering jobs by hand.
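At the heart of any recurring schedule, such as a nightly ETL flow, is working out when the next run is due. The Python sketch below shows that calculation for a once-a-day schedule; the function names are hypothetical, and it uses naive datetimes, so it ignores the time zones and daylight-saving changes that a real scheduler such as Azkaban has to handle.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """Return the first datetime at or after `now` whose time of day is `run_at`.

    Hypothetical helper using naive datetimes (no time-zone handling).
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate < now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def upcoming_runs(now: datetime, run_at: time, count: int) -> list[datetime]:
    """List the next `count` daily run times starting from `now`."""
    first = next_daily_run(now, run_at)
    return [first + timedelta(days=i) for i in range(count)]
```

For example, asking at 03:00 for a 02:00 daily schedule returns 02:00 on the following day, because that day's slot has already passed.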
Azkaban Features
Web-based workflow scheduler
Allows creating, managing and monitoring workflows
Rundeck is an open-source automation server used to run jobs, processes and workflows across multiple machines. It lets you schedule many kinds of tasks, including:
Ad hoc scripts
System administration
Big data workflows
Key features include:
Job scheduling and dispatch
Resource modeling (create an inventory of nodes)
Role-based access control
Integrations (SSH, LDAP, Active Directory)
Remote execution...
Apache Airflow is an open-source workflow management platform started at Airbnb in 2014 and publicly announced in 2015. It is used to programmatically author, schedule and monitor workflows. Airflow provides a web interface for visualizing pipelines and the dependencies between tasks, and for monitoring workflow runs.
Some key features and benefits of Apache Airflow include:
Directed Acyclic Graphs (DAGs) -...
Zenaton is an open-source workflow orchestration platform that lets developers implement complex business processes in code. It handles asynchronous tasks, priorities, scheduling, errors and more out of the box, so developers can focus on business logic rather than building custom workflow engines.
Key features of Zenaton include:
Model workflows in code...
Luigi is an open-source Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration and much more.
Some key features of Luigi:
Built on top of Python, so it is easy to integrate into your existing Python workflows...
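Luigi's dependency resolution rests on two ideas: each task declares the tasks it requires, and a task counts as complete once its output exists, so re-running a pipeline skips work that is already done. The Python sketch below imitates that model with the standard library only. The Task class, the build function and the file names are hypothetical and are not Luigi's API; real Luigi tasks subclass luigi.Task and implement requires(), output() and run().

```python
import os
import tempfile

# Scratch directory for the example's output files (illustrative only).
WORKDIR = tempfile.mkdtemp()


class Task:
    """Hypothetical, minimal Luigi-style task; not the luigi API."""

    def requires(self):
        return []  # upstream tasks that must complete first

    def output_path(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # As with Luigi's default, "done" means the output already exists.
        return os.path.exists(self.output_path())


def build(task, log):
    """Build dependencies depth-first, then run `task` only if incomplete."""
    for dep in task.requires():
        build(dep, log)
    if not task.complete():
        task.run()
        log.append(type(task).__name__)  # record which tasks actually ran


class ExtractNumbers(Task):
    def output_path(self):
        return os.path.join(WORKDIR, "numbers.txt")

    def run(self):
        with open(self.output_path(), "w") as f:
            f.write("3\n4\n")


class SumNumbers(Task):
    def requires(self):
        return [ExtractNumbers()]

    def output_path(self):
        return os.path.join(WORKDIR, "total.txt")

    def run(self):
        source = self.requires()[0].output_path()
        with open(source) as f:
            total = sum(int(line) for line in f)
        with open(self.output_path(), "w") as f:
            f.write(str(total))
```

Calling build(SumNumbers(), log) on a fresh directory runs ExtractNumbers and then SumNumbers; calling it again runs nothing, because both outputs already exist.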
Metaflow is an open-source Python library that helps data scientists build and manage real-life data science projects. It provides an easy-to-use abstraction layer for data scientists to develop robust and reproducible pipelines, track experiments, visualize results and deploy machine learning models to production.
Some key features of Metaflow include:
Simplified pipeline construction...
Ctfreak is an open-source CTF (Capture The Flag) platform designed specifically for hosting cybersecurity competitions and challenges. It provides all the features and tools needed to run an engaging CTF event.
With Ctfreak, users can create various categories and types of challenges, including reverse engineering, web exploitation, cryptography, forensics, binary exploitation,...
Apache Oozie is an open-source workflow scheduler system for managing Hadoop jobs. It runs workflow jobs, each of which represents a directed acyclic graph (DAG) of actions. Oozie workflows are written in hPDL (an XML Process Definition Language), and Oozie runs job instances based on those workflow definitions.
Key capabilities...
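In an Oozie workflow definition, control nodes (start, end, kill) and action nodes are linked by named transitions: start points at the first node, and each action names where to go on success (ok) and on failure (error). The Python sketch below parses a small workflow in that shape with xml.etree.ElementTree and follows the success path. The workflow name, node names and the fs action body are illustrative, and the walker is a hypothetical helper that handles only action nodes (no fork, join or decision nodes).

```python
import xml.etree.ElementTree as ET

# Minimal workflow in the shape of an Oozie hPDL definition
# (start -> action -> end, with an error path to a kill node).
WORKFLOW_XML = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="make-dir"/>
  <action name="make-dir">
    <fs><mkdir path="${nameNode}/tmp/demo"/></fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.5"}


def success_path(xml_text):
    """Follow <start> and each action's <ok> transition until <end> is reached."""
    root = ET.fromstring(xml_text.strip())
    actions = {a.get("name"): a for a in root.findall("wf:action", NS)}
    end_name = root.find("wf:end", NS).get("name")
    node = root.find("wf:start", NS).get("to")
    path = []
    while node != end_name:
        if node in path:
            raise ValueError(f"cycle detected at {node!r}")
        path.append(node)
        node = actions[node].find("wf:ok", NS).get("to")
    path.append(node)
    return path
```

For the sample definition, the success path is the make-dir action followed by the end node; a failure in make-dir would instead transition to the fail kill node.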
StackStorm is an open-source event-driven automation platform for auto-remediation, security responses, troubleshooting and more. It integrates with common infrastructure components and makes it easy to trigger automated workflows based on system events.
Key features include:
Flexible workflow engine based on automation actions to trigger responses and remediations
Integration with monitoring tools, infrastructure,...
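StackStorm links events to automation through rules: a trigger fires, the rule's criteria are checked against the trigger payload, and a matching rule invokes an action. The Python sketch below shows only that matching step. The Rule class, trigger names and action names are hypothetical, and StackStorm itself defines rules declaratively in YAML rather than with code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical event-to-action rule (not StackStorm's rule schema)."""

    name: str
    trigger_type: str
    criteria: dict = field(default_factory=dict)  # dotted payload key -> required value
    action: str = ""


def lookup(payload, dotted_key):
    """Fetch a nested value such as 'host.status'; return None if missing."""
    value = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matching_actions(rules, trigger_type, payload):
    """Return the actions of every rule whose trigger and criteria all match."""
    return [
        rule.action
        for rule in rules
        if rule.trigger_type == trigger_type
        and all(lookup(payload, key) == want for key, want in rule.criteria.items())
    ]


RULES = [
    Rule("restart-on-down", "monitoring.host_alert", {"host.status": "down"}, "restart_service"),
    Rule("page-on-critical", "monitoring.host_alert", {"severity": "critical"}, "page_oncall"),
]
```

An alert whose payload reports the host as down but with warning severity would trigger only restart_service; the same alert at critical severity would trigger both actions.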
Shipyard is an open-source data orchestration and workflow automation platform designed to help teams build, schedule, orchestrate and monitor pipelines. It provides an intuitive graphical interface for visualizing data pipelines and comes with over 300 pre-built components and templates.
Key capabilities and benefits:
Graphical pipeline designer to visually create...