OS Linux Pacemaker Cluster Error Management
Operating System (OS) clusters provide high availability and fault tolerance to critical applications and services in a distributed system. Among the various cluster management software available, Pacemaker stands out as a reliable and versatile tool for Linux-based systems.
IT-Conductor allows users to automate error handling in a Pacemaker cluster environment. It can be applied to one or more systems and can be run manually or on a schedule. This feature is highly adaptable to any customer environment and can become an essential component of IT maintenance operations.
The system(s) should be registered in IT-Conductor for monitoring.
A Robot User should be created and associated with the application, database, or OS users that hold the roles and privileges required to execute the local action on the system to be stopped or started.
The ownership of the process definition should be assigned to the Robot User.
The Robot User should be able to view the following recovery definitions when navigating to Management → Automation → Recover Definitions.
Pacemaker: pcs resource failcount reset
Note: If the recovery definitions are not available for use, contact IT-Conductor Support.
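For orientation, the recovery definition's name refers to the pcs command `pcs resource failcount reset`. The following is a minimal sketch of how that command line could be assembled and run on a cluster node. It is not IT-Conductor's implementation. The helper names (`failcount_reset_command`, `reset_failcount`), the resource name `vip`, and the node name `node1` are hypothetical, and the positional `<resource> [node]` argument form is an assumption that differs between pcs versions, so check the pcs man page on the target system.

```python
import shlex
import subprocess
from typing import Callable, List, Optional


def failcount_reset_command(resource: str, node: Optional[str] = None) -> List[str]:
    """Build the argv for `pcs resource failcount reset <resource> [node]`.

    Assumption: the resource and optional node are passed positionally.
    """
    if not resource:
        raise ValueError("a resource id is required")
    argv = ["pcs", "resource", "failcount", "reset", resource]
    if node:
        argv.append(node)
    return argv


def reset_failcount(resource: str,
                    node: Optional[str] = None,
                    dry_run: bool = True,
                    runner: Callable = subprocess.run) -> str:
    """Return the command line as a string; execute it only when dry_run is False."""
    argv = failcount_reset_command(resource, node)
    if not dry_run:
        # check=True raises CalledProcessError if pcs exits with a non-zero status.
        runner(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # Dry run: print the command without touching the cluster.
    print(reset_failcount("vip", "node1"))
```

The dry-run default makes it possible to review the exact command before it is executed with the Robot User's privileges.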
Select the Linux system to implement the automation and click Pacemaker Log.
Click the Threshold Overrides icon.
Select the targeted override.
Important: Choose the override that does not have maintenance mode enabled.
Configure the threshold and define the desired schedule for running the automation.
Select the desired recovery action in the Recovery dropdown menu.
Note: If the recovery definitions are not available for use, contact IT-Conductor Support.
(Optional) Configure a notification to be sent for this event.
Click Save to complete the automation.
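Conceptually, the configuration in the steps above ties a threshold to a recovery action and an optional notification. The sketch below illustrates that relationship only. The `Override` class, the `evaluate` function, and the rule that recovery fires when the error count is at or above the threshold are all assumptions made for illustration; they do not describe IT-Conductor's internal data model or evaluation logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Override:
    """Hypothetical stand-in for a threshold override with its recovery settings."""
    threshold: int                                  # assumed: recovery fires at or above this count
    recovery: Optional[Callable[[], None]] = None   # e.g. a failcount reset action
    notify: Optional[Callable[[str], None]] = None  # optional notification hook


def evaluate(override: Override, error_count: int) -> bool:
    """Notify and run the recovery action on a breach; return True if recovery ran."""
    if error_count < override.threshold:
        return False
    if override.notify is not None:
        override.notify(
            f"Pacemaker errors: {error_count} (threshold {override.threshold})"
        )
    if override.recovery is None:
        return False
    override.recovery()
    return True


if __name__ == "__main__":
    actions = []
    override = Override(threshold=1,
                        recovery=lambda: actions.append("failcount reset"),
                        notify=print)
    evaluate(override, error_count=0)   # below threshold: nothing happens
    evaluate(override, error_count=3)   # breach: notification, then recovery
    print(actions)
```

On a schedule, each run would repeat this evaluation with the latest error count from the Pacemaker Log monitor.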
You can see the Automatic Cleaning Process of the OS Linux Pacemaker Cluster Error from the following view: