Linux System Monitoring
Monitoring physical servers and virtual machines (VMs) running the Linux operating system (OS) involves more than tracking availability and resource metrics such as CPU, memory, and disk utilization. A Linux system is commonly described as having three parts: the kernel, the shell, and the programs. To keep a Linux system performing optimally, it is also important to monitor the programs and applications that run on it, the running instances of those programs (known as processes), and login activity.
After a Linux system has been successfully registered in IT-Conductor for monitoring, you can view its Availability, Available Memory, CPU, and Disk (Filesystem) utilization in the Service Grid.
Figure 1: Linux Systems View in IT-Conductor Service Grid
Availability monitoring for Linux systems measures server or VM uptime over a specified period of time. In IT-Conductor, you can see a system's availability at a glance in the Service Grid (see Figure 1). An availability value shown in green indicates that the system is available. Otherwise, the value changes to 0.00% and a severity icon appears below it.
Note: When a system is not available, the other metrics, such as Heartbeat, CPU I/O Wait, CPU Idle, CPU User, Load Average, Memory Used, and Memory Free, will not show any data.
You can also use the Availability Chart to view historical availability data for each system. This is helpful when you want to investigate availability issues that occurred at a specific point in time.
Figure 2: Linux System Availability Chart
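To make the availability percentage concrete, the Python sketch below computes it from a series of up/down checks. It assumes the monitor records one result per polling interval and that availability is the share of successful checks in the window; `availability_percent` is a hypothetical helper, not IT-Conductor's documented calculation.

```python
# Sketch: availability as the percentage of successful checks in a window.
# Assumption: one up/down result per polling interval; this is a generic
# uptime formula, not IT-Conductor's documented method.
from typing import Sequence


def availability_percent(checks: Sequence[bool]) -> float:
    """Return the share of polling intervals in which the system was up, in %."""
    if not checks:
        raise ValueError("no availability checks in the window")
    up = sum(1 for ok in checks if ok)
    return 100.0 * up / len(checks)


if __name__ == "__main__":
    # Hypothetical window: 12 five-minute checks (one hour), one of them failed.
    window = [True] * 11 + [False]
    print(f"{availability_percent(window):.2f}%")
```

For example, one failed check out of twelve five-minute checks gives 11/12, or about 91.67% availability for that hour.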
Monitoring the performance of a Linux system starts with determining whether the Central Processing Unit (CPU) is operating within healthy limits. CPU utilization indicates how much of the available processing capacity is in use. Higher utilization means the system is doing more work; sustained utilization near 100% leaves no headroom, so processes must wait for CPU time and the system can become slow or unstable. This is why it is important to track CPU usage.
By default, the following metrics are available in the Service Grid once a Linux system is registered in IT-Conductor for monitoring:
- CPU I/O Wait
- CPU Idle
- CPU User
- CPU User Peak
- Load Average
- Load Avg/CPU
You can view the history of each of these metrics in IT-Conductor, as shown in the figures below.
CPU I/O Wait is the time during which the CPU is idle while the system has outstanding disk I/O requests waiting to complete. It is usually expressed as a percentage of total CPU time; a consistently high value often points to a storage bottleneck rather than a shortage of processing power.
Figure 3: CPU I/O Wait
CPU Idle, on the other hand, is the percentage of time during which the CPU has no work to do and is not waiting on I/O.
Figure 4: CPU Idle
CPU User is the percentage of CPU time spent executing code in user space, that is, application and program code rather than the kernel.
Figure 5: CPU User
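On Linux, CPU User, CPU Idle, and CPU I/O Wait are typically derived from the cumulative tick counters on the `cpu` line of `/proc/stat`, by taking two samples and dividing each counter's increase by the total increase. The Python sketch below illustrates that approach. It assumes this is comparable to what IT-Conductor reports, which this article does not state, and the helper names are hypothetical. Here CPU User excludes `nice` time.

```python
# Sketch: CPU User, CPU Idle and CPU I/O Wait from two /proc/stat samples.
# Assumption: comparable to IT-Conductor's metrics; helper names are hypothetical.
import time
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(stat_text: str) -> Dict[str, int]:
    """Parse the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Convert counter increases between two samples into % of CPU time."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    # guest and guest_nice are already included in user and nice, so skip them.
    total = sum(delta[k] for k in FIELDS[:8])
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {
        "cpu_user": 100.0 * delta["user"] / total,
        "cpu_idle": 100.0 * delta["idle"] / total,
        "cpu_io_wait": 100.0 * delta["iowait"] / total,
    }


def read_proc_stat() -> str:
    with open("/proc/stat") as f:
        return f.read()


if __name__ == "__main__":
    first = parse_cpu_line(read_proc_stat())
    time.sleep(1.0)  # sampling interval between the two snapshots
    second = parse_cpu_line(read_proc_stat())
    for name, pct in cpu_percentages(first, second).items():
        print(f"{name}: {pct:.1f}%")
```

Because the counters are cumulative since boot, a single sample only gives averages since boot; the difference between two samples is what reflects current activity.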
CPU User Peak is the highest CPU User utilization recorded at a specific point in time, which helps reveal short spikes that an average can hide.
Figure 6: CPU User Peak
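Load Average, unlike CPU User, is not a percentage of CPU time. It is the average number of processes that are running, waiting to run, or in uninterruptible sleep (typically waiting on disk I/O). Linux reports load averages over 1-, 5-, and 15-minute windows.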
Figure 7: Load Average
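The kernel publishes the 1-, 5-, and 15-minute load averages as the first three fields of `/proc/loadavg`. The Python sketch below parses them. Which of the three windows IT-Conductor charts is an assumption this article does not settle, and `parse_loadavg` is a hypothetical helper.

```python
# Sketch: reading the 1-, 5- and 15-minute load averages on Linux.
# Assumption: IT-Conductor's averaging window is not stated; the kernel
# publishes all three in /proc/loadavg (see proc(5)).
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float
    five: float
    fifteen: float


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the first three fields of /proc/loadavg."""
    fields = text.split()
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]))


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

Python's standard library also exposes the same three values through `os.getloadavg()`.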
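Load Avg/CPU normalizes the Load Average by the number of CPUs, which makes systems with different CPU counts comparable. A value below 1 means there is, on average, less than one runnable or waiting process per CPU; a value of 1 means the CPUs are fully occupied; and a value above 1 means processes are queuing. The value is not capped at 1.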
Figure 8: Load Average per CPU
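As a worked example, assuming Load Avg/CPU is the load average divided by the number of logical CPUs: a 4-CPU system with a load average of 6.0 has a Load Avg/CPU of 6.0 / 4 = 1.5, meaning that on average tasks are queuing for CPU time. The Python sketch below performs that calculation. The helper names and the single threshold at 1.0 are illustrative assumptions, not IT-Conductor settings. Linux load also counts tasks in uninterruptible sleep, so a high ratio can reflect I/O waits rather than CPU demand.

```python
# Sketch: normalizing a load average by the number of logical CPUs.
# Assumption: "Load Avg/CPU" = load average / CPU count; the 1.0 threshold
# is illustrative, not an IT-Conductor setting.
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Return the average number of runnable or waiting tasks per CPU."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    return load / cpus


def describe(ratio: float) -> str:
    # At or below 1.0 every task can, on average, get a CPU; above it, tasks wait.
    return "within capacity" if ratio <= 1.0 else "tasks queuing"


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()
    ratio = load_per_cpu(one_minute, os.cpu_count() or 1)
    print(f"{ratio:.2f} -> {describe(ratio)}")
```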
Memory utilization indicates how much memory is being used by all running services and applications on a system.
Free Memory indicates the amount of memory available for use on a system.
Figure 9: Free Memory
Memory Used indicates the amount of memory currently in use.
Figure 10: Memory Used
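Both memory metrics can be derived from `/proc/meminfo`. The Python sketch below assumes "used" means `MemTotal - MemAvailable`, which is one common definition; IT-Conductor's exact formula is not documented here, and the helper names are hypothetical. Linux distinguishes `MemFree` (pages not used for anything) from `MemAvailable` (an estimate of memory that can be given to applications without swapping, including reclaimable cache), so "free" and "available" figures can differ considerably on a busy system.

```python
# Sketch: free and used memory from /proc/meminfo.
# Assumption: used = MemTotal - MemAvailable (one common definition).
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return /proc/meminfo values (mostly in kB), keyed by field name."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info: Dict[str, int]) -> dict:
    total = info["MemTotal"]
    # MemAvailable exists on kernels 3.14+; fall back to MemFree otherwise.
    available = info.get("MemAvailable", info["MemFree"])
    used = total - available
    return {
        "total_kb": total,
        "free_kb": info["MemFree"],
        "available_kb": available,
        "used_kb": used,
        "used_pct": 100.0 * used / total,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        s = memory_summary(parse_meminfo(f.read()))
    print(f"used {s['used_kb']} kB ({s['used_pct']:.1f}%), "
          f"free {s['free_kb']} kB, available {s['available_kb']} kB")
```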
File systems are also monitored by IT-Conductor. The following metrics are monitored by default:
- Allocated Size - total space allocated to the file system
- Available Space - space left for use
- Mount Status - indicates whether the file system is mounted and ready for use
- Used % - percentage of the file system's space that is in use
Figure 11: File Systems Utilization
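The four file system metrics roughly correspond to values the Python standard library can read for any mount point. The sketch below uses `os.statvfs` and `os.path.ismount`. The mapping to IT-Conductor's metrics is an assumption, `filesystem_metrics` is a hypothetical helper, and `os.path.ismount` only confirms that the path is a mount point, not that the file system is healthy.

```python
# Sketch: approximating the file system metrics for one mount point.
# Assumption: mapping to IT-Conductor's metrics; helper name is hypothetical.
import os


def filesystem_metrics(mount_point: str) -> dict:
    st = os.statvfs(mount_point)
    size = st.f_blocks * st.f_frsize          # total size of the file system
    available = st.f_bavail * st.f_frsize     # space usable by non-root users
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    # Similar to df's Use%: used / (used + available), excluding root-reserved blocks.
    denom = used + available
    used_pct = 100.0 * used / denom if denom else 0.0
    return {
        "mounted": os.path.ismount(mount_point),
        "allocated_size_bytes": size,
        "available_bytes": available,
        "used_pct": used_pct,
    }


if __name__ == "__main__":
    for mp in ("/", "/boot"):
        if os.path.exists(mp):
            print(mp, filesystem_metrics(mp))
```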
Note: You also have the option to enable Process Usage, a metric that measures the resource utilization of specific processes. Once configured, it runs periodically in the background and generates reports for each system on which you have enabled it.
The Hardware (HW) Inventory Report displays hardware information about the monitored Linux system. This information rarely changes, but it can be useful during troubleshooting or when you need to verify the hardware details of a specific system.
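To illustrate the kind of per-process data Linux exposes, the Python sketch below totals resident memory (`VmRSS`) by process name from `/proc/<pid>/status`. What IT-Conductor's Process Usage actually measures is not specified in this article, so treat this as an assumption-based illustration; the helper names are hypothetical.

```python
# Sketch: resident memory per process name, read from /proc/<pid>/status.
# Assumption: illustrates per-process data only; not IT-Conductor's Process Usage.
import os
from collections import defaultdict
from typing import Dict, Tuple


def parse_status(text: str) -> Tuple[str, int]:
    """Return (process name, VmRSS in kB); kernel threads have no VmRSS (0)."""
    name, rss = "", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])
    return name, rss


def rss_by_name() -> Dict[str, int]:
    totals: Dict[str, int] = defaultdict(int)
    for entry in os.listdir("/proc"):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                name, rss = parse_status(f.read())
        except OSError:  # process exited or access was denied
            continue
        totals[name] += rss
    return dict(totals)


if __name__ == "__main__":
    top = sorted(rss_by_name().items(), key=lambda kv: kv[1], reverse=True)[:5]
    for name, kb in top:
        print(f"{name:<20} {kb} kB")
```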
Figure 12: HW Inventory Report
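Some of the hardware details in an inventory report can be read directly from the system. The Python sketch below counts logical CPUs and collects CPU model names from `/proc/cpuinfo`. It does not reproduce the contents of IT-Conductor's HW Inventory Report, the helper name is hypothetical, and on some non-x86 platforms the `model name` field may be absent.

```python
# Sketch: a few CPU inventory fields from /proc/cpuinfo.
# Assumption: illustrative fields only; not the HW Inventory Report's contents.
from typing import Dict


def cpu_inventory(cpuinfo: str) -> Dict[str, object]:
    """Count logical CPUs and collect distinct model names (x86 'model name')."""
    processors = 0
    models = set()
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            processors += 1
        elif key == "model name":
            models.add(value.strip())
    return {"logical_cpus": processors, "models": sorted(models)}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(cpu_inventory(f.read()))
```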
Most retrievers are disabled by default. If you want to enable and configure one for a specific use case, please raise a support request.