Background

Chat clients are now an integral part of most organizations, and there is scope to integrate most existing systems with them. Today we are striving towards more efficient infrastructure provisioning and maintenance. Always-available chatbots that perform repetitive tasks and surface relevant information are one way to share the burden of infrastructure monitoring.

With the help of chatbots, we can integrate tools into a ChatOps platform for monitoring, alerting, and incident remediation of infrastructure components. This can be extended further so that the bot detects points of failure automatically and takes the necessary action to remediate them.

Our Infrastructure

Basic Architecture

Our basic architecture has two hosts, Alpha and Beta. Beta hosts the monitoring server (Nagios), which monitors the Alpha host. Alpha runs our chatbot and the Apache service.

Nagios monitors the health of the Alpha host and the Apache service running on it. Nagios host and service notifications are sent to a Slack channel.

Our chatbot is in the same channel and acts on those notifications.

Nagios

A prominent open source tool for IT infrastructure monitoring. Nagios monitors hosts and services and generates alerts.

Slack

A popular chat application used by businesses worldwide. We will integrate Slack with Hubot and Nagios.

Hubot

A chatbot from GitHub. When integrated with Slack and added to a channel, it listens to conversations in that channel. Hubot can be extended with scripts that perform actions when a message matches specific terms.

Configuring Nagios Monitoring Server

The Nagios monitoring server generates alerts on service and host failures. The integration between Nagios and Slack builds on the Nagios alerting (notification) feature.

Nagios Contact

Nagios Service

After the integration, Nagios generates alerts and sends notifications to Slack.
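
As a rough illustration of what such a notification could look like on the wire, the sketch below builds the JSON body for a Slack incoming webhook from the values Nagios exposes through macros such as $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$ and $SERVICEOUTPUT$. The function names, the message wording, and the use of an incoming webhook are assumptions made for this example; the setup described here may well use a different Slack integration.

```javascript
// Sketch only: names and message wording are assumptions, not the setup used in this post.

// Build the JSON body for a Slack incoming webhook from Nagios notification values.
function buildSlackPayload({ type, host, service, state, output }) {
  // Host notifications carry no service description.
  const subject = service ? `${service} on ${host}` : `host ${host}`;
  return { text: `${type}: ${subject} is ${state} (${output})` };
}

// POST the payload to a webhook URL issued by your Slack workspace.
// Relies on the global fetch available in Node.js 18 and later.
async function postToSlack(webhookUrl, payload) {
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`Slack webhook returned HTTP ${response.status}`);
  }
}
```

A Nagios notification command could invoke a small wrapper around these two functions, passing the macro values as arguments.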

Nagios Notifications in slack

Hubot Script

A simple script that performs an action when a message matches a specific string pattern.

Example

In our test run, as soon as Nagios posts an alert to the channel, Hubot acts to bring the service back up.

Further Scope

Going further, we can add information retrieval to proactively identify patterns of failure in infrastructure components.

Another application of such a chatbot is automating the build and deployment process.