# OS Linux Pacemaker Cluster Error Management

Operating System (OS) clusters provide high availability and fault tolerance for critical applications and services in a distributed system. Among the cluster resource managers available for Linux-based systems, Pacemaker is a reliable and versatile choice.

IT-Conductor allows users to automate error handling in a Pacemaker cluster environment. The automation can be applied to one or more systems and can run either manually or on a schedule. It can be adapted to each customer environment and used as part of routine IT maintenance operations.

### Prerequisites

* The system(s) should be registered in IT-Conductor for monitoring.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FUTarptBkwE3dgGM7bN9C%2FLinux%20System%20in%20IT-Conductor%20Service%20Grid.png?alt=media&#x26;token=1c6f0b66-5723-4554-9a63-081907c89e04" alt=""><figcaption><p>Figure 1: Linux System in IT-Conductor Service Grid</p></figcaption></figure>

* A [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users) should be created and associated with the application/DB/OS users that have the roles and privileges required to execute the local action on the target system.

  <figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FEmOPTpaMQJaZN1HUMytC%2Fimage.png?alt=media&#x26;token=c57683e2-c953-45d9-b13c-e8dda4df1d81" alt=""><figcaption><p>Figure 2: Start/Stop Process Definitions</p></figcaption></figure>
* The ownership of the process definition should be assigned to the [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users).
* The [Robot User](https://docs.itconductor.com/user-guide/account-administration/create-robot-users) should be able to view the following recovery definition when navigating to **Management → Automation → Recovery Definitions**:
  * Pacemaker: pcs resource failcount reset

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FH2E1atpHsQndHqzCSkbn%2FNavigating%20to%20Recovery%20Definitions.png?alt=media&#x26;token=ca49c572-1baa-458e-ad9a-74f870b5d316" alt=""><figcaption><p>Figure 3: Navigating to Recovery Definitions</p></figcaption></figure>

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2Fw3kiR8AVFTNITfy2RsnT%2FList%20of%20Recovery%20Definitions.png?alt=media&#x26;token=f66e865c-4630-4b5b-be95-bc97d88adc9e" alt=""><figcaption><p>Figure 4: List of Recovery Definitions</p></figcaption></figure>

{% hint style="info" %}
**Note:** If the recovery definitions are not available for use, contact [IT-Conductor Support](https://docs.itconductor.com/references/support).
{% endhint %}
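Judging by its name, the **Pacemaker: pcs resource failcount reset** recovery definition runs the Pacemaker `pcs resource failcount reset` command on the cluster node. The Python sketch below shows how an equivalent local action could be built and invoked. It is a minimal illustration and not IT-Conductor's implementation. It assumes that `pcs` is on the `PATH` and that the calling OS user is allowed to manage cluster resources. The helper names and the resource id `rsc_ip_HA1` are hypothetical.

```python
import shlex
import subprocess
from typing import List


def build_failcount_reset_cmd(resource_id: str) -> List[str]:
    """Build the argv for `pcs resource failcount reset <resource id>`.

    Hypothetical helper. Returning an argument list (not a shell string)
    avoids shell quoting problems with resource names.
    """
    if not resource_id or any(ch.isspace() for ch in resource_id):
        raise ValueError(f"invalid Pacemaker resource id: {resource_id!r}")
    return ["pcs", "resource", "failcount", "reset", resource_id]


def reset_failcount(resource_id: str, dry_run: bool = True) -> str:
    """Reset the fail count of one resource.

    With dry_run=True (the default) nothing is executed; the command line
    that would run is returned instead. Assumes `pcs` is installed and the
    current OS user may manage cluster resources.
    """
    cmd = build_failcount_reset_cmd(resource_id)
    if dry_run:
        return shlex.join(cmd)
    # check=True raises CalledProcessError if pcs exits with a non-zero status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # "rsc_ip_HA1" is a made-up resource id used only for illustration.
    print(reset_failcount("rsc_ip_HA1"))
```

Keeping `dry_run=True` as the default makes it safe to review the exact command before running it against a live cluster.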

### Automate OS Linux Pacemaker Cluster Error Management

1. Select the Linux system on which to implement the automation, then click **Pacemaker Log**.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FmL2NrbVPLEujgYwEJOld%2FSelecting%20Pacemaker%20Log.png?alt=media&#x26;token=cdf49ce6-b1b1-497b-8583-efc534f92235" alt=""><figcaption><p>Figure 5: Pacemaker Log in IT-Conductor Service Grid</p></figcaption></figure>

2. Click the **Threshold Overrides** icon.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2Fy77PWh2ZZVvGMGyEOLoJ%2FPacemaker%20Log%20Chart%20in%20IT-Conductor.png?alt=media&#x26;token=9d3d04b8-ed2f-4b5c-b254-04a0383d2468" alt=""><figcaption><p>Figure 6: Pacemaker Log Chart in IT-Conductor</p></figcaption></figure>

3. Select the target override.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FofZYwmCJ53AYRij1dBda%2FPacemaker%20Log%20Overrides.png?alt=media&#x26;token=1b5c4708-f56d-4620-9cbd-780fe589ffb3" alt=""><figcaption><p>Figure 7: Pacemaker Log Overrides</p></figcaption></figure>

{% hint style="info" %}
**Important:** Choose an override that does not have maintenance mode enabled.
{% endhint %}

4. Configure the threshold and define the desired schedule for running the automation.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FtKvBSOHd49aeVFET3935%2FPacemaker%20Log%20Monitoring%20Configuration%20Settings.png?alt=media&#x26;token=97df0207-3f7b-4789-aeb3-7436940bdffc" alt=""><figcaption><p>Figure 8: Pacemaker Log Monitoring Configuration Settings</p></figcaption></figure>

5. Select the desired recovery action in the **Recovery** dropdown menu.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2F12MxnwiBImW8iGFthGnl%2FSelecting%20Recovery%20Action.png?alt=media&#x26;token=92c0651c-4a45-437c-9c98-fbbb0ba389fa" alt=""><figcaption><p>Figure 9: Selecting Recovery Action</p></figcaption></figure>

{% hint style="info" %}
**Note:** If the recovery definitions are not available for use, contact [IT-Conductor Support](https://docs.itconductor.com/references/support).
{% endhint %}

6. (Optional) Configure a notification to be sent for this event.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FsePGEXk5DTg5mEqznnZb%2FConfiguring%20Notification.png?alt=media&#x26;token=e4534ecb-a311-4908-af80-b27ba756ef72" alt=""><figcaption><p>Figure 10: Configuring Notification</p></figcaption></figure>

7. Click **Save** to complete the automation.

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FW0CKQl986AtY29lPB8pN%2FSaving%20the%20Configurations.png?alt=media&#x26;token=feacd065-68fa-4c38-b655-16db9748bd42" alt=""><figcaption><p>Figure 11: Saving Configurations</p></figcaption></figure>

8. You can follow the automatic cleanup of OS Linux Pacemaker cluster errors in the following view:

<figure><img src="https://377464071-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FXhp08OmU8050PePmMgDt%2Fuploads%2FTULVuN71SvsuihLOaGgQ%2FOS%20Linux%20Pacemaker%20Cluster%20Error%20Automatic%20Cleaning.png?alt=media&#x26;token=fef7b2a8-6b72-4fac-95bf-6c723c66cc15" alt=""><figcaption><p>Figure 12: Automatic Cleaning of the OS Linux Pacemaker Cluster Error</p></figcaption></figure>
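Steps 3 to 5 together define a rule of the following form: on each scheduled check, if the Pacemaker Log metric reaches the threshold, run the selected recovery action. The Python sketch below models that decision for a single check. It is a conceptual illustration only and does not reflect how IT-Conductor evaluates the metric. The following are all assumptions:

* the `OverrideConfig` class;
* counting lines that contain the keyword `error` as the metric;
* treating an override with maintenance mode enabled as "skip recovery". This last one is inferred from the note in step 3.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OverrideConfig:
    """Hypothetical stand-in for a threshold override (not IT-Conductor's data model)."""
    threshold: int          # trigger when the error count reaches this value
    maintenance_mode: bool  # assumption: when True, recovery is skipped
    recovery_action: str    # name of the selected recovery definition


def count_errors(log_lines: Iterable[str], keyword: str = "error") -> int:
    """Count log lines containing the keyword, case-insensitively (assumed metric)."""
    needle = keyword.lower()
    return sum(1 for line in log_lines if needle in line.lower())


def evaluate(log_lines: Iterable[str], cfg: OverrideConfig) -> str:
    """Return the outcome of one scheduled check."""
    errors = count_errors(log_lines)
    if errors < cfg.threshold:
        return "ok"
    if cfg.maintenance_mode:
        return "skipped (maintenance mode)"
    return f"run recovery: {cfg.recovery_action}"


if __name__ == "__main__":
    # Made-up sample lines, not real Pacemaker log output.
    sample = [
        "node1 pacemaker: notice: example informational line",
        "node1 pacemaker: error: example failed resource action",
    ]
    cfg = OverrideConfig(
        threshold=1,
        maintenance_mode=False,
        recovery_action="Pacemaker: pcs resource failcount reset",
    )
    print(evaluate(sample, cfg))
```

With the sample above, one line matches, which meets the threshold of 1. The check therefore reports that the selected recovery definition would run.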

### Related Information <a href="#related-articles" id="related-articles"></a>

* [Clear Failed Fencing Actions Messages](https://www.suse.com/es-es/support/kb/doc/?id=000019463)
* [Failed cluster actions in crm\_mon](https://www.suse.com/support/kb/doc/?id=000018057)
* [How can I show and clear the fencing history in a Pacemaker cluster?](https://access.redhat.com/solutions/3761361)


