
OS Linux Pacemaker Cluster Error Management

Operating system (OS) clusters provide high availability and fault tolerance for critical applications and services in distributed systems. Among the available cluster resource managers, Pacemaker stands out as a reliable and versatile choice for Linux-based systems.
IT-Conductor lets users automate error handling in a Pacemaker cluster environment. The automation can be applied to one or more systems and run either manually or on a schedule, so it can be adapted to different customer environments and built into routine IT maintenance operations.

Prerequisites

  • The system(s) should be registered in IT-Conductor for monitoring.
Figure 1: Linux System in IT-Conductor Service Grid
  • A Robot User should be created and associated with the application, database, and OS users that have the roles and privileges required to execute local actions (such as stop and start) on the target system.
    Figure 2: Start/Stop Process Definitions
  • The ownership of the process definition should be assigned to the Robot User.
  • The Robot User should be able to view the following recovery definitions when navigating to Management > Automation > Recovery Definitions.
    • Pacemaker: pcs resource failcount reset
Figure 3: Navigating to Recovery Definitions
Figure 4: List of Recovery Definitions
Note: If the recovery definitions are not available for use, contact IT-Conductor Support.
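The Pacemaker recovery definition listed above corresponds to the `pcs resource failcount reset` command, which tells the cluster to forget past failures of a resource. As a rough illustration of what the Robot User must be permitted to run on the node, the following Python sketch builds that command line, optionally scoped to one resource and node. The function names `build_failcount_reset` and `reset_failcount` and the `dry_run` flag are hypothetical and not part of IT-Conductor. The `node=<node>` form assumes pcs 0.10 or later (older pcs releases take the node differently), so check `man pcs` on your cluster.

```python
import shlex
import subprocess
from typing import List, Optional


def build_failcount_reset(resource: Optional[str] = None,
                          node: Optional[str] = None) -> List[str]:
    """Build the argv for a Pacemaker fail-count reset through pcs.

    Assumes pcs 0.10+ syntax: pcs resource failcount reset [<resource> [node=<node>]]
    """
    if node is not None and resource is None:
        # pcs only accepts a node together with a resource id.
        raise ValueError("a node can only be given together with a resource")
    argv = ["pcs", "resource", "failcount", "reset"]
    if resource is not None:
        argv.append(resource)
        if node is not None:
            argv.append(f"node={node}")
    return argv


def reset_failcount(resource: Optional[str] = None,
                    node: Optional[str] = None,
                    dry_run: bool = True) -> str:
    """Return the command (dry run) or run it and return its stdout."""
    argv = build_failcount_reset(resource, node)
    if dry_run:
        # Only show what would run; nothing is executed on the cluster.
        return shlex.join(argv)
    # Requires pcs on PATH and sufficient privileges (hence the Robot User).
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `reset_failcount("my_vip", "node1")` (hypothetical resource and node names) returns `pcs resource failcount reset my_vip node=node1` without touching the cluster; passing `dry_run=False` actually runs it.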

How to Automate OS Linux Pacemaker Cluster Error Management

  1. Select the Linux system where you want to implement the automation, and click Pacemaker Log.
Figure 5: Pacemaker Log in IT-Conductor Service Grid
  2. Click the "Threshold Overrides" icon.
Figure 6: Pacemaker Log Chart in IT-Conductor
  3. Select the target override.
Figure 7: Pacemaker Log Overrides
Important: Choose an override that does not have maintenance mode enabled.
  4. Configure the threshold and define the desired schedule for running the automation.
Figure 8: Pacemaker Log Monitoring Configuration Settings
  5. Select the desired recovery action from the "Recovery" dropdown menu.
Figure 9: Selecting Recovery Action
Note: If the recovery definitions are not available for use, contact IT-Conductor Support.
  6. (Optional) Configure a notification to be sent for this event.
Figure 10: Configuring Notification
  7. Click Save to complete the automation.
Figure 11: Saving Configurations
  8. You can monitor the automatic cleanup of OS Linux Pacemaker cluster errors in the following view:
Figure 12: Automatic Cleaning of the OS Linux Pacemaker Cluster Error
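Steps 4 and 5 pair a threshold on the Pacemaker log with a recovery action. IT-Conductor performs this evaluation internally; the Python sketch below only illustrates the idea under stated assumptions. An "error" is assumed to be any log line containing an `error:` tag (real Pacemaker log formats vary between versions and distributions), the threshold is assumed to be a simple count compared with `>=` during one scheduled run, and the recovery is assumed to be the fail-count reset for all resources. The names `ERROR_PATTERN`, `RECOVERY_ARGV`, `count_errors`, and `evaluate` are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical error pattern: any line carrying an "error:" severity tag.
ERROR_PATTERN = re.compile(r"\berror:", re.IGNORECASE)

# Assumed recovery action: reset fail counts for all resources.
RECOVERY_ARGV = ["pcs", "resource", "failcount", "reset"]


def count_errors(lines: Iterable[str]) -> int:
    """Count log lines that match the assumed error pattern."""
    return sum(1 for line in lines if ERROR_PATTERN.search(line))


def evaluate(lines: Iterable[str], threshold: int) -> Optional[List[str]]:
    """Return the recovery argv if the error count reaches the threshold.

    One call stands for one scheduled evaluation over the log lines
    collected since the previous run; None means no action is needed.
    The command is only returned, never executed, by this sketch.
    """
    if count_errors(lines) >= threshold:
        return list(RECOVERY_ARGV)
    return None
```

For example, with a batch containing one `error:` line, `evaluate(batch, threshold=1)` returns the reset command, while `evaluate(batch, threshold=2)` returns `None`. Keeping detection and execution separate makes the decision easy to test without access to a live cluster.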