Unix/Linux System Monitoring

Unix and Linux are operating systems (OS) well known for their versatility and robustness across computing environments. A Unix/Linux OS has three main components: the kernel, the shell, and programs. The kernel is the core component: it manages system resources and provides the platform through which software applications interact with the hardware.

To monitor the availability and performance of Unix/Linux systems, critical kernel metrics such as CPU, memory, and disk utilization must be monitored. It is also recommended that the programs and applications running inside the system, the processes executing these programs, and user login activities be monitored to ensure peak performance.

Configure Unix/Linux System Monitoring in IT-Conductor

To configure Unix/Linux system monitoring in IT-Conductor, follow the instructions below.

Account Requirements

  1. Create a user that can, at a minimum, execute the df, vmstat, mpstat, lpstat, free, uptime, and ps commands and read the /etc/fstab file.

  2. Add the created user as a member of the system group.

Note:

  • Perform these steps on the server where you installed the gateway. See Gateway Setup for more details.

  • Access to additional commands may be required if a custom configuration is involved.
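
As a quick pre-check, the account requirements above can be verified with a short script run as the monitoring user. This is a minimal sketch and not part of IT-Conductor: the function name find_missing_commands is hypothetical, and the script only confirms that each command can be found on the user's PATH, not that group membership or sudo rules are set up correctly.

```python
import shutil

# Commands named in the account requirements. fstab is left out of this list
# because it is normally a file (/etc/fstab) rather than an executable.
REQUIRED_COMMANDS = ["df", "vmstat", "mpstat", "lpstat", "free", "uptime", "ps"]


def find_missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on the current user's PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required commands are available.")
```

If a custom configuration needs additional commands, they can be passed in as a list, for example find_missing_commands(["df", "iostat"]).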

Add New Unix/Linux System

  1. Visit service.itconductor.com and enter your login credentials.

  2. Navigate to Dashboards → Administrator to access the Administrator's Dashboard.

  3. Locate the Unix/Linux Systems actions panel and click the title to access the complete list.

  4. Click the New Linux System button to start adding a new system for monitoring, then fill in the following fields:

  • Description - refers to any relevant information about the system being added.

  • Organization - refers to an administrative structure that defines objects with a common goal or purpose. If you previously created an organization, please select it.

  • Role - refers to the environment where the system will be used.

  • Site - refers to a logical object that describes a particular area or location, depending on the context in which it is used.

  • Gateway - allows communication between the customer's site network and the IT-Conductor cloud platform. Select the previously configured gateway from the dropdown menu. See Gateway Setup for more details.

  • Host - refers to the host of the system being added.

  • Port Number - refers to the port number (SSH Port 22 by default) that will be used to access the system being added.

  • Shell - refers to the command-line interface of the OS used by the system being added.

  • Sudo Arguments - refer to the additional parameters or options passed to the sudo command.

  • Description - refers to any relevant information about the user account being added.

  • Application - refers to the name given to the system being added.

  • Person - refers to the name of the user who is adding the system.

  • Realm - refers to a domain or administrative boundary within a network environment. If the server supports Kerberos authentication, input the domain name.

  • User Name - refers to the Linux user created according to the Account Requirements.

  • Password - refers to the password of the previously created Linux user.

  • Retype Password - refers to the same password provided in the Password field.

  • Private Key File - refers to the file containing the cryptographic keys used for authentication, encryption, and decryption in secure communication protocols such as SSH and SSL/TLS. If key authentication is required, enter the path to the .ppk file stored locally on the gateway.
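
Before saving the form, it can help to confirm that the Host and Port Number values are reachable from the gateway server. The following is a minimal sketch, assuming it is run on the gateway host; the function name port_is_reachable is hypothetical, and the check only tests that a TCP connection can be opened. It does not validate the user name, password, or private key.

```python
import socket


def port_is_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, port_is_reachable("linux01.example.com", 22) returns False when the port is blocked or the name does not resolve; the host name here is a placeholder.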

  5. Verify that the system was added to the Unix/Linux Systems actions panel and check its status.

  6. Navigate to the service grid and verify that the system was added under the Linux Systems node.

Note: The system will appear in the service grid within 5-15 minutes.

Monitor Unix/Linux System in IT-Conductor

To view the availability and performance metrics of a Unix/Linux system, locate the Linux Systems node in the service grid.

Linux System Key Metrics

  • Availability - refers to the operational state and accessibility of the Unix/Linux system.

  • Connection Failures - refer to the events where attempts to establish a connection are unsuccessful.

  • Heartbeat - refers to the periodic signal sent to the system, enabling real-time detection of system downtime.

  • Missing Account - refers to the absence of a required user account or resource within the Unix/Linux system.

  • Retriever Failures - refer to the errors or issues encountered during the retrieval process of essential data or information within the Unix/Linux system.

  • CPU I/O Wait - refers to the percentage of CPU time spent waiting for Input/Output (I/O) operations to complete.

  • CPU Idle - refers to the percentage of time the CPU is idle, that is, not executing any tasks.

  • CPU System - refers to the percentage of CPU time spent executing kernel-level processes and system calls.

  • CPU User Peak - refers to the highest level of CPU usage by user-space processes within a certain period.

  • CPU User Time - refers to the percentage of CPU time spent executing user-space processes.

  • Load Average - refers to the average number of processes in a runnable or uninterruptible state over a defined period.

  • Load Avg/CPU - refers to the ratio of the load average to the number of CPU cores in a system.

  • Free Memory - refers to the amount of physical RAM currently not used by any active processes or cached by the system.

  • Memory Used - refers to the percentage of memory currently utilized by active processes, cached data, and system buffers.

  • Memory Used Peak - refers to the highest percentage of memory utilization observed within a certain period.

  • Swap Used - refers to the percentage of swap space currently utilized by the system.

Note: No metrics will show data while a system is unavailable.
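
To make two of the definitions above concrete, the sketch below computes Load Avg/CPU and Memory Used on a Linux host. It is an illustration only, not IT-Conductor's implementation: the function names are hypothetical, and the memory formula (everything that is not MemFree counts as used) is an assumption based on the Memory Used description above, which includes cached data and buffers.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Load Avg/CPU: the load average divided by the number of CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def memory_used_percent(meminfo_text):
    """Percentage of RAM that is not free, from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # sizes are reported in kB
    total = values["MemTotal"]
    free = values["MemFree"]
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()
    print("Load Avg/CPU:", load_per_cpu(one_minute, os.cpu_count() or 1))
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("Memory Used %:", round(memory_used_percent(f.read()), 1))
```

A Load Avg/CPU value that stays above 1.0 suggests that, on average, more processes are waiting than there are cores to run them.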

File Systems Utilization Metrics

File systems are also part of the default monitoring setup upon adding a Unix/Linux system in IT-Conductor. The following metrics are automatically tracked for file systems:

  • Allocated Size - refers to the amount of disk space allocated to each mounted file system.

  • Available Space - refers to the amount of available or unused disk space.

  • Mount Status - refers to the current state of a file system. It indicates whether the file system is successfully mounted and accessible for read and write operations.

  • Used % - refers to the percentage of disk space used on each mounted file system.
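
The file system metrics above can be approximated for a single mount point with Python's standard library. This is a minimal sketch under stated assumptions: the function name filesystem_usage and the dictionary keys are hypothetical, and the Used % value is computed as used divided by total, which may differ slightly from the figure df reports on file systems with reserved blocks.

```python
import os
import shutil


def filesystem_usage(path="/"):
    """Return size, free space, used percentage, and mount status for path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "allocated_bytes": usage.total,
        "available_bytes": usage.free,
        "used_percent": used_percent,
        "mounted": os.path.ismount(path),  # True only if path is a mount point
    }


if __name__ == "__main__":
    print(filesystem_usage("/"))
```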

Note: You can also enable Process Usage, a metric that measures the utilization of specific processes. Once enabled, it operates at regular intervals, running in the background to produce reports for each system where it's enabled.
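
For a rough idea of what a per-process utilization figure looks like, the sketch below totals CPU and memory percentages for one process name from ps output. It is an assumption-laden illustration, not how IT-Conductor collects Process Usage: the function names are hypothetical, and the column layout assumes the procps form of the command, ps -eo pid,pcpu,pmem,comm.

```python
import subprocess


def read_ps():
    """Run ps and return its text output (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps_output(text):
    """Parse `ps -eo pid,pcpu,pmem,comm` output into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, pcpu, pmem, command = parts
        rows.append({
            "pid": int(pid),
            "cpu_percent": float(pcpu),
            "mem_percent": float(pmem),
            "command": command.strip(),
        })
    return rows


def usage_for(rows, name):
    """Total the CPU and memory percentages of all processes called name."""
    matching = [r for r in rows if r["command"] == name]
    return {
        "processes": len(matching),
        "cpu_percent": sum(r["cpu_percent"] for r in matching),
        "mem_percent": sum(r["mem_percent"] for r in matching),
    }
```

On a monitored host, usage_for(parse_ps_output(read_ps()), "sshd") would summarize all sshd processes at the moment the command runs.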

Health Explorer

To view a more detailed analysis of metrics and time-synchronized data, click Health in the service grid, and you will be redirected to the Health Explorer page.

Scheduled Maintenance Events

To view the scheduled maintenance events for monitored Unix/Linux systems, click Events in the service grid, and a pop-up list of Scheduled Maintenance Events will be displayed.

Heat Map

To view a heat map for monitored Unix/Linux systems, click Heat Map in the service grid, and a pop-up heat map will be displayed. Click on a tile to show a more detailed view of a system's health.

Performance Overview

To view a high-level comparison of system health for monitored Unix/Linux systems, click Performance Overview in the service grid. You will be redirected to the Performance Overview page.

Select the checkboxes to exclude systems from the view, depending on what you want to investigate.
