Hadoop Automation – Scheduling, Securing and Connecting

When exploring automation of Hadoop processes, it’s important to consider best practices for securing them and for tracking who runs them, when they may be scheduled and what resources they should be allowed to consume. Whether you use Hadoop for big data analytics, social network architecture or marketing data mining, the questions are the same:

Is it practical to train business users to SSH into one of your Hadoop nodes?

What are the risks of granting users the necessary rights to run processes on Hadoop nodes?

Centrally scheduling Hadoop processes and providing business users secure methods for ad-hoc submission enables organizations to leverage the technology while preventing new security risks and eliminating the need for new training.

Automation Through CRON and SSH

Hadoop processes are commonly run in one of two ways: on a schedule through CRON, or manually in an SSH session on the fly. What’s the best way to secure this? What if you could simply automate a process to run at the end of each week? Or even allow a user to submit a job within your scheduler and be prompted for only the inputs required to get the data they need?

In our latest release of JAMS, we’ve made Hadoop automation clear and efficient. By deploying a JAMS Agent onto any of your Hadoop nodes, you can immediately begin executing and automating any of your Hadoop processes. And, if you need to run the same process against a different dataset each day, you can leverage JAMS Parameters to prompt your users for discrete values in a simple fill-in-the-blank form, or through multiple dropdowns with a variety of options. There is no need for the user to worry about the required structure and syntax of the underlying Hadoop process. And, as an administrator, you don’t need to be concerned about users modifying the source of a job.
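To make the fill-in-the-blank idea concrete, here is a minimal Python sketch of what a parameterized submission might do behind the form. It is not JAMS code and does not use any JAMS API; the dataset names, the jar path, the class name and the HDFS paths are all hypothetical. The point is only that the user supplies two values, the values are checked against an allowed set, and the command itself stays fixed.

```python
import re
import shlex
from typing import List

# Hypothetical whitelist of datasets a business user may pick from a dropdown.
ALLOWED_DATASETS = {"sales", "clickstream", "marketing"}
# Dates must look like YYYY-MM-DD; anything else is rejected.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_hadoop_command(dataset: str, run_date: str) -> List[str]:
    """Validate the two user inputs and return an argument list for `hadoop jar`.

    The jar path, main class and HDFS paths are placeholders for illustration.
    """
    if dataset not in ALLOWED_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if not DATE_PATTERN.fullmatch(run_date):
        raise ValueError(f"run_date must be YYYY-MM-DD, got {run_date!r}")
    return [
        "hadoop", "jar", "/opt/jobs/report.jar", "com.example.Report",
        f"/data/{dataset}/{run_date}",      # input path (assumed layout)
        f"/reports/{dataset}/{run_date}",   # output path (assumed layout)
    ]


if __name__ == "__main__":
    # Show the command that would be run, without executing it.
    print(shlex.join(build_hadoop_command("sales", "2024-01-31")))
```

Because the command is built as an argument list rather than a shell string, a value such as "sales; rm -rf /" is rejected by the whitelist and would never reach a shell in any case.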

Automating Non-CLI Processes

What about Hadoop processes that are run through PHP, Perl or Python scripts, rather than by calling Java applications from the command line? JAMS can natively execute any of these scripts and add robust scheduling logic, such as custom calendars, triggers and dependencies. Whether you need to run a MapReduce process alone, or chain it together with other complex processes and send notifications based on the parsed output, JAMS can handle your Hadoop needs from start to finish.
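For readers who want a feel for what chaining means in practice, the following Python sketch runs a list of named steps in order and stops at the first one that fails. It is a simplified stand-in for the dependency logic a scheduler provides, not a description of how JAMS implements it, and the step names, jar paths and script name in the example are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]  # (step name, command as an argument list)


def run_step(command: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the program is missing)."""
    try:
        return subprocess.run(list(command)).returncode
    except FileNotFoundError:
        return 127


def run_chain(steps: Sequence[Step],
              runner: Callable[[Sequence[str]], int] = run_step) -> List[str]:
    """Run steps in order; stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, command in steps:
        if runner(command) != 0:
            break  # later steps depend on this one, so do not run them
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical chain: a MapReduce job, then a script that parses its output.
    chain = [
        ("mapreduce", ["hadoop", "jar", "/opt/jobs/report.jar", "com.example.Report"]),
        ("parse", ["python", "parse_output.py"]),
    ]
    print("completed:", run_chain(chain))
```

The `runner` argument exists so the chain logic can be exercised without a Hadoop installation; a real scheduler would also add calendars, triggers and notifications on top of this ordering.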