Automating Hadoop Cluster Setup Using Ansible

What is Ansible?

Ansible is an open-source automation tool used for IT tasks such as configuration management, application deployment, intraservice orchestration, and provisioning. Automation matters because modern IT environments are often too complex, and need to scale too quickly, for system administrators and developers to manage by hand. It simplifies complex tasks, making developers’ jobs more manageable and freeing them to focus on other work that adds value to the organization.

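What is Hadoop?

Apache Hadoop is an open-source framework for storing and processing large datasets across a cluster of machines. Its storage layer, the Hadoop Distributed File System (HDFS), is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. An HDFS cluster consists of a NameNode, which manages the file system metadata, and one or more DataNodes, which store the actual data blocks.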

Ansible Architecture:

Ansible Playbooks:

Ordered lists of tasks, saved so you can run those tasks in that order repeatedly. Playbooks are written in YAML format.

Inventory:

A list of managed nodes. An inventory file is also sometimes called a hostfile. Your inventory can specify information such as the IP address of each managed node.
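As a minimal sketch, an INI-style inventory for a cluster like this one might look as follows. The group names (namenode, datanode, client), the 192.0.2.x addresses, and the root login are placeholders assumed for illustration, not values from this setup; replace them with the details of your own nodes.

```shell
# Write a sample inventory file into the current directory.
# All group names, IPs, and credentials below are placeholders (assumptions).
cat > inventory <<'EOF'
[namenode]
192.0.2.10 ansible_user=root ansible_password=CHANGE_ME

[datanode]
192.0.2.11 ansible_user=root ansible_password=CHANGE_ME

[client]
192.0.2.12 ansible_user=root ansible_password=CHANGE_ME
EOF
```

Password-based SSH logins like these are what the sshpass package is needed for. Outside a lab setup, SSH keys or Ansible Vault are usually a safer choice than plain-text passwords in the inventory.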

Control Node:

Any machine with Ansible installed is known as a control node. You can run Ansible commands and playbooks by invoking the ansible or ansible-playbook command from any control node.

Managed Node:

The devices you manage with Ansible. Managed nodes are also sometimes called hosts.

Now let's set up a Hadoop cluster using Ansible.

Steps:

Install Ansible on the control node

pip3 install ansible
yum install sshpass
# To see the version of Ansible installed
ansible --version

Ansible Configuration File

vim /etc/ansible/ansible.cfg
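For illustration, a small configuration could contain just the two settings below. This sketch writes a project-local ansible.cfg, which Ansible picks up from the current directory before falling back to ~/.ansible.cfg and /etc/ansible/ansible.cfg. The inventory path is an assumption; point it at wherever your inventory file actually lives.

```shell
# Write a project-local ansible.cfg into the current directory.
# The inventory path is an assumption for this sketch.
cat > ansible.cfg <<'EOF'
[defaults]
# Path to the inventory file (assumed to sit next to this config)
inventory = ./inventory
# Skip the interactive SSH host-key prompt; convenient in a lab, unsafe in production
host_key_checking = False
EOF
```

The same [defaults] section works unchanged in /etc/ansible/ansible.cfg if you prefer a system-wide configuration.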

Check connectivity with all managed nodes

ansible all -m ping

Playbooks:

Playbook for Configuration of NameNode
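What such a playbook might contain can be sketched as below. This is a hypothetical namenode.yml, not the exact playbook used here. It assumes Java and Hadoop 3.x are already installed on the target, that the hdfs command is on the PATH Ansible sees, that the Hadoop configuration directory is /etc/hadoop, and that the inventory has a group called namenode. The directory /nn and port 9001 are arbitrary choices.

```shell
# Write a hypothetical namenode.yml; a sketch, not the author's playbook.
cat > namenode.yml <<'EOF'
- hosts: namenode
  vars:
    # Assumed values; adjust to your environment
    nn_dir: /nn
    nn_port: 9001
    hadoop_conf_dir: /etc/hadoop
  tasks:
    - name: Create the NameNode metadata directory
      ansible.builtin.file:
        path: "{{ nn_dir }}"
        state: directory

    - name: Configure hdfs-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.namenode.name.dir</name>
              <value>{{ nn_dir }}</value>
            </property>
          </configuration>

    - name: Configure core-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        # inventory_hostname is the name or IP this host has in the inventory
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ inventory_hostname }}:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Format the NameNode (skipped once the metadata directory is populated)
      ansible.builtin.command:
        cmd: hdfs namenode -format -nonInteractive
        creates: "{{ nn_dir }}/current"

    - name: Start the NameNode daemon (fails if it is already running)
      ansible.builtin.command: hdfs --daemon start namenode
EOF
```

Older Hadoop releases use different property names and start scripts (for example hadoop-daemon.sh), so check the names against the version you have installed.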

  • Running NameNode Playbook
ansible-playbook namenode.yml
  • Checking that the NameNode service has started on the target node
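One way to run this check from the control node is an ad-hoc command that lists the Java processes on the target with jps, a tool that ships with the JDK. The helper name check_daemon and the inventory group name namenode are hypothetical, and the sketch assumes jps is on the PATH that Ansible sees on the target.

```shell
# Hypothetical helper: succeed if a Hadoop daemon shows up in the jps output
# of the hosts in the given inventory group.
check_daemon() {
    group="$1"    # inventory group to query, e.g. namenode
    daemon="$2"   # process name expected in the jps output, e.g. NameNode
    # jps prints one "<pid> <name>" line per running Java process
    ansible "$group" -m command -a "jps" | grep -w "$daemon"
}
```

Calling check_daemon namenode NameNode prints the matching jps line and returns success when the daemon is running; it returns a non-zero status otherwise.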

Playbook for Configuration of DataNode
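A DataNode playbook could be sketched along the following lines. Again, this is a hypothetical datanode.yml under stated assumptions: Hadoop 3.x and Java are already installed, the configuration directory is /etc/hadoop, and the inventory has a datanode group. The NameNode address 192.0.2.10, port 9001, and directory /dn are placeholders.

```shell
# Write a hypothetical datanode.yml; a sketch, not the author's playbook.
cat > datanode.yml <<'EOF'
- hosts: datanode
  vars:
    dn_dir: /dn                    # assumed block storage directory
    namenode_ip: 192.0.2.10        # placeholder; use your NameNode's address
    nn_port: 9001                  # must match the port the NameNode listens on
    hadoop_conf_dir: /etc/hadoop   # assumed configuration directory
  tasks:
    - name: Create the DataNode storage directory
      ansible.builtin.file:
        path: "{{ dn_dir }}"
        state: directory

    - name: Configure hdfs-site.xml
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.datanode.data.dir</name>
              <value>{{ dn_dir }}</value>
            </property>
          </configuration>

    - name: Point core-site.xml at the NameNode
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_ip }}:{{ nn_port }}</value>
            </property>
          </configuration>

    - name: Start the DataNode daemon (fails if it is already running)
      ansible.builtin.command: hdfs --daemon start datanode
EOF
```

The main differences from a NameNode playbook are the storage property (dfs.datanode.data.dir), the absence of a format step, and the daemon being started.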

  • Running DataNode Playbook

(Because of limited RAM and CPU resources, I am reusing the VM that served as the NameNode as the DataNode. To do the same, stop the NameNode services and delete the NameNode folder.)

ansible-playbook datanode.yml

Playbook for Configuration of Client
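A client only needs to know where the NameNode is, so a client playbook can be much shorter. The sketch below is a hypothetical client.yml that assumes Hadoop 3.x is already installed on the client, that the inventory has a client group, and that the configuration directory is /etc/hadoop. The NameNode address and port are placeholders.

```shell
# Write a hypothetical client.yml; a sketch, not the author's playbook.
cat > client.yml <<'EOF'
- hosts: client
  vars:
    namenode_ip: 192.0.2.10        # placeholder; use your NameNode's address
    nn_port: 9001                  # must match the port the NameNode listens on
    hadoop_conf_dir: /etc/hadoop   # assumed configuration directory
  tasks:
    - name: Point core-site.xml at the NameNode
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_ip }}:{{ nn_port }}</value>
            </property>
          </configuration>
EOF
```

After running ansible-playbook client.yml, a command such as hdfs dfsadmin -report on the client should list the DataNodes connected to the cluster.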

Link to Code:

I’m an undergraduate student at IIIT Ranchi, pursuing my B-Tech in Electronics and Communication Engineering.
