# OS Linux Pacemaker Cluster Error Management

Operating System (OS) clusters provide high availability and fault tolerance for critical applications and services in a distributed system. Among the available cluster management software, Pacemaker stands out as a reliable and versatile tool for Linux-based systems.

IT-Conductor lets users automate error handling in a Pacemaker cluster environment. The automation can be applied to one or more systems and can run manually or on a schedule, so it can be adapted to each customer environment and become part of routine IT maintenance operations.

### Prerequisites

* The system(s) should be registered in IT-Conductor for monitoring.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FUTarptBkwE3dgGM7bN9C%2FLinux%20System%20in%20IT-Conductor%20Service%20Grid.png?alt=media&#x26;token=1c6f0b66-5723-4554-9a63-081907c89e04" alt=""><figcaption><p>Figure 1: Linux System in IT-Conductor Service Grid</p></figcaption></figure>

* A [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users) should be created and associated with the application, database, or OS users that have the roles and privileges required to execute the local action on the target system.

  <figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FEmOPTpaMQJaZN1HUMytC%2Fimage.png?alt=media&#x26;token=c57683e2-c953-45d9-b13c-e8dda4df1d81" alt=""><figcaption><p>Figure 2: Start/Stop Process Definitions</p></figcaption></figure>
* The ownership of the process definition should be assigned to the [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users).
* The [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users) should be able to view the following recovery definition when navigating to **Management → Automation → Recover Definitions**.
  * Pacemaker: pcs resource failcount reset

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FH2E1atpHsQndHqzCSkbn%2FNavigating%20to%20Recovery%20Definitions.png?alt=media&#x26;token=ca49c572-1baa-458e-ad9a-74f870b5d316" alt=""><figcaption><p>Figure 3: Navigating to Recovery Definitions</p></figcaption></figure>

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2Fw3kiR8AVFTNITfy2RsnT%2FList%20of%20Recovery%20Definitions.png?alt=media&#x26;token=f66e865c-4630-4b5b-be95-bc97d88adc9e" alt=""><figcaption><p>Figure 4: List of Recovery Definitions</p></figcaption></figure>

{% hint style="info" %}
**Note:** If the recovery definitions are not available for use, contact [IT-Conductor Support](https://docs.itconductor.com/references/support).
{% endhint %}
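
This guide does not show what the recovery definition executes. Judging only by its name, it clears resource fail counts through `pcs`. To see what that involves on a cluster node, the following is a minimal sketch that builds the equivalent manual commands. It assumes `pcs` is installed, uses the hypothetical resource name `rsc_example`, and uses `pcs resource cleanup` rather than `pcs resource failcount reset`, because the latter may not be available in every `pcs` release. It runs in dry-run mode by default and only returns the command lines.

```python
import shlex
import subprocess
from typing import List


def failcount_commands(resource_id: str) -> List[List[str]]:
    """Return argv lists that show, then clear, the fail count of one resource."""
    if not resource_id or resource_id.startswith("-"):
        raise ValueError("resource_id must be a non-empty resource name")
    return [
        # Inspect the current fail count before clearing it.
        ["pcs", "resource", "failcount", "show", resource_id],
        # Clear the failure history so the cluster re-evaluates the resource.
        ["pcs", "resource", "cleanup", resource_id],
    ]


def run(resource_id: str, dry_run: bool = True) -> List[str]:
    """Return the shell-quoted command lines; execute them only if dry_run is False."""
    lines = []
    for argv in failcount_commands(resource_id):
        lines.append(shlex.join(argv))
        if not dry_run:
            # Requires pcs and sufficient privileges on a cluster node.
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    # "rsc_example" is a hypothetical resource name used for illustration.
    for line in run("rsc_example"):
        print(line)
```

Confirm the exact subcommands with `pcs resource --help` on your own system before running them outside dry-run mode.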

### Automate OS Linux Pacemaker Cluster Error Management

1. Select the Linux system to implement the automation and click **Pacemaker Log**.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FmL2NrbVPLEujgYwEJOld%2FSelecting%20Pacemaker%20Log.png?alt=media&#x26;token=cdf49ce6-b1b1-497b-8583-efc534f92235" alt=""><figcaption><p>Figure 5: Pacemaker Log in IT-Conductor Service Grid</p></figcaption></figure>

2. Click the **Threshold Overrides** icon.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2Fy77PWh2ZZVvGMGyEOLoJ%2FPacemaker%20Log%20Chart%20in%20IT-Conductor.png?alt=media&#x26;token=9d3d04b8-ed2f-4b5c-b254-04a0383d2468" alt=""><figcaption><p>Figure 6: Pacemaker Log Chart in IT-Conductor</p></figcaption></figure>

3. Select the targeted override.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FofZYwmCJ53AYRij1dBda%2FPacemaker%20Log%20Overrides.png?alt=media&#x26;token=1b5c4708-f56d-4620-9cbd-780fe589ffb3" alt=""><figcaption><p>Figure 7: Pacemaker Log Overrides</p></figcaption></figure>

{% hint style="info" %}
**Important:** Choose the override that does not have maintenance mode enabled.
{% endhint %}

4. Configure the threshold and define the desired schedule for running the automation.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FtKvBSOHd49aeVFET3935%2FPacemaker%20Log%20Monitoring%20Configuration%20Settings.png?alt=media&#x26;token=97df0207-3f7b-4789-aeb3-7436940bdffc" alt=""><figcaption><p>Figure 8: Pacemaker Log Monitoring Configuration Settings</p></figcaption></figure>

5. Select the desired recovery action in the **Recovery** dropdown menu.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2F12MxnwiBImW8iGFthGnl%2FSelecting%20Recovery%20Action.png?alt=media&#x26;token=92c0651c-4a45-437c-9c98-fbbb0ba389fa" alt=""><figcaption><p>Figure 9: Selecting Recovery Action</p></figcaption></figure>

{% hint style="info" %}
**Note:** If the recovery definitions are not available for use, contact [IT-Conductor Support](https://docs.itconductor.com/references/support).
{% endhint %}

6. (Optional) Configure a notification to be sent for this event.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FsePGEXk5DTg5mEqznnZb%2FConfiguring%20Notification.png?alt=media&#x26;token=e4534ecb-a311-4908-af80-b27ba756ef72" alt=""><figcaption><p>Figure 10: Configuring Notification</p></figcaption></figure>

7. Click **Save** to complete the automation.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FW0CKQl986AtY29lPB8pN%2FSaving%20the%20Configurations.png?alt=media&#x26;token=feacd065-68fa-4c38-b655-16db9748bd42" alt=""><figcaption><p>Figure 11: Saving Configurations</p></figcaption></figure>

8. Verify the automation. The following view shows the automatic cleaning of the OS Linux Pacemaker cluster error:

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FTULVuN71SvsuihLOaGgQ%2FOS%20Linux%20Pacemaker%20Cluster%20Error%20Automatic%20Cleaning.png?alt=media&#x26;token=fef7b2a8-6b72-4fac-95bf-6c723c66cc15" alt=""><figcaption><p>Figure 12: Automatic Cleaning of the OS Linux Pacemaker Cluster Error</p></figcaption></figure>
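
Conceptually, the steps above combine a threshold, a recovery action, and an optional notification. The following is a minimal sketch of that flow, not IT-Conductor's implementation. The function names, the substring-based matching, and the sample lines are all assumptions made for illustration, and the sample lines are synthetic placeholders rather than real Pacemaker log output.

```python
from typing import Callable, Iterable, Optional


def count_matches(lines: Iterable[str], marker: str) -> int:
    """Count lines that contain the marker, ignoring case."""
    needle = marker.lower()
    return sum(1 for line in lines if needle in line.lower())


def evaluate(
    lines: Iterable[str],
    marker: str,
    threshold: int,
    recover: Callable[[], None],
    notify: Optional[Callable[[str], None]] = None,
) -> bool:
    """Run `recover` when the match count reaches `threshold`; return True if it ran."""
    count = count_matches(lines, marker)
    if count < threshold:
        return False
    recover()
    if notify is not None:
        notify(f"{count} matching line(s) reached threshold {threshold}; recovery triggered")
    return True


# Synthetic placeholder lines, NOT real Pacemaker log output.
sample = ["placeholder: ok", "placeholder: ERROR one", "placeholder: error two"]

events = []  # records what the hypothetical recovery and notification hooks did
triggered = evaluate(
    sample,
    marker="error",
    threshold=2,
    recover=lambda: events.append("recover"),
    notify=events.append,
)
```

Passing the recovery and notification steps as callables keeps the threshold check separate from the action it triggers, which mirrors how the threshold, **Recovery** selection, and notification are configured as separate settings in the steps above.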

### Related Information <a href="#related-articles" id="related-articles"></a>

* [Clear Failed Fencing Actions Messages](https://www.suse.com/es-es/support/kb/doc/?id=000019463)
* [Failed cluster actions in crm\_mon](https://www.suse.com/support/kb/doc/?id=000018057)
* [How can I show and clear the fencing history in a Pacemaker cluster?](https://access.redhat.com/solutions/3761361)
