OS Linux Pacemaker Cluster Error Management

Operating System (OS) clusters provide high availability and fault tolerance to critical applications and services in a distributed system. Among the various cluster management software available, Pacemaker stands out as a reliable and versatile tool for Linux-based systems.

IT-Conductor allows users to automate error handling in a Pacemaker cluster environment. It can be applied to one or more systems and can be run manually or on a schedule. This feature is highly adaptable to any customer environment and can become an essential component of IT maintenance operations.

Prerequisites

  • The system(s) should be registered in IT-Conductor for monitoring.

Figure 1: Linux System in IT-Conductor Service Grid
  • A Robot User should be created and associated with the application, DB, and OS users that have the roles and privileges required to execute the local action on the system to be stopped or started.

    Figure 2: Start/Stop Process Definitions
  • The ownership of the process definition should be assigned to the Robot User.

  • The Robot User should be able to view the following recovery definition when navigating to Management → Automation → Recovery Definitions.

    • Pacemaker: pcs resource failcount reset

Figure 3: Navigating to Recovery Definitions
Figure 4: List of Recovery Definitions

Note: If the recovery definitions are not available for use, contact IT-Conductor Support.
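
For orientation, the recovery definition name above matches the pcs command-line operation that resets a resource's fail count. The following is a minimal sketch of how such a command line could be assembled, not IT-Conductor's actual implementation. The function name `build_failcount_reset` and the placeholder identifiers `rsc_example` and `node1` are assumptions made for illustration. Availability of `pcs resource failcount reset` depends on the installed pcs version; some releases expect `pcs resource cleanup` instead.

```python
from typing import List, Optional


def build_failcount_reset(resource_id: str, node: Optional[str] = None) -> List[str]:
    """Return the argument list for resetting a resource's fail count with pcs.

    Hypothetical helper: it only builds the command and does not run it.
    """
    for value in (resource_id, node):
        # Reject empty values and anything that could be parsed as an option.
        if value is not None and (not value or value.startswith("-")):
            raise ValueError(f"invalid identifier: {value!r}")
    command = ["pcs", "resource", "failcount", "reset", resource_id]
    if node is not None:
        # Optionally limit the reset to a single cluster node.
        command.append(node)
    return command


if __name__ == "__main__":
    # "rsc_example" and "node1" are placeholder names, not values from a real cluster.
    print(" ".join(build_failcount_reset("rsc_example", "node1")))
```

Building the command as a list rather than a single string keeps the resource and node names as separate arguments, which avoids shell quoting problems if the list is later passed to a process launcher.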

Automate OS Linux Pacemaker Cluster Error Management

  1. Select the Linux system on which to implement the automation and click Pacemaker Log.

Figure 5: Pacemaker Log in IT-Conductor Service Grid
  2. Click the Threshold Overrides icon.

Figure 6: Pacemaker Log Chart in IT-Conductor
  3. Select the targeted override.

Figure 7: Pacemaker Log Overrides

Important: Choose the override that does not have maintenance mode enabled.

  4. Configure the threshold and define the desired schedule for running the automation.

Figure 8: Pacemaker Log Monitoring Configuration Settings
  5. Select the desired recovery action in the Recovery dropdown menu.

Figure 9: Selecting Recovery Action

Note: If the recovery definitions are not available for use, contact IT-Conductor Support.

  6. (Optional) Configure a notification for this event.

Figure 10: Configuring Notification
  7. Click Save to complete the automation.

Figure 11: Saving Configurations
  8. The automatic cleanup of the OS Linux Pacemaker cluster error can be seen in the following view:

Figure 12: Automatic Cleaning of the OS Linux Pacemaker Cluster Error
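
The configuration in the steps above (an override, a threshold, a recovery action, and an optional notification) can be pictured as a simple decision rule. The sketch below is a conceptual model only, not IT-Conductor's internal logic. The `Override` class, its field names, the "greater than or equal" comparison, and the assumption that an override in maintenance mode triggers no recovery are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Override:
    """Hypothetical model of a Pacemaker Log threshold override."""
    threshold: int          # error count at which recovery is triggered (assumed semantics)
    maintenance_mode: bool  # assumed: no recovery while maintenance mode is enabled
    recovery_action: str    # name of the recovery definition chosen in the Recovery dropdown
    notify: bool = False    # optional notification for the event


def decide_recovery(override: Override, error_count: int) -> Optional[str]:
    """Return the recovery action to run, or None if nothing should happen."""
    if override.maintenance_mode:
        # Assumption: overrides in maintenance mode do not trigger automation.
        return None
    if error_count >= override.threshold:
        return override.recovery_action
    return None


if __name__ == "__main__":
    override = Override(
        threshold=1,
        maintenance_mode=False,
        recovery_action="Pacemaker: pcs resource failcount reset",
        notify=True,
    )
    # The error count would come from the monitored Pacemaker log; 2 is a made-up value.
    action = decide_recovery(override, error_count=2)
    print(action)
```

Under these assumptions, an override with maintenance mode enabled never returns an action, which is one way to read the instruction to choose an override without maintenance mode.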
