Linux System Monitoring
Last updated
Monitoring physical servers and virtual machines running the Linux Operating System (OS) involves more than tracking availability and kernel-level performance metrics such as CPU, memory, and disk utilization. A Linux system is commonly described as having three parts: the kernel, the shell, and the programs. To keep a Linux system performing optimally, it is also crucial to monitor the programs and applications that run on it, the running instances of those programs (known as processes), and login activity.
After a Linux system has been successfully registered in IT-Conductor for monitoring, you can view its Availability, Available Memory, CPU, and Disk or Filesystem utilization from the Service Grid.
Availability monitoring measures server or VM uptime over a specified period of time. In IT-Conductor, you can see the availability of a system in the System Grid (see Figure 1). When the system is available, Availability is shown in GREEN. When it is not, the value changes to 0.00% and a severity icon appears at the bottom.
Note: When a system is not available, all other metrics such as Heartbeat, CPU I/O Wait, CPU Idle, CPU User, Load Average, Memory Used, and Memory Free will not show any data.
You can also utilize the Availability Chart to see the historical availability data per system. This is helpful in scenarios where you want to investigate issues related to system availability at a certain point in time.
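As an illustration of what an availability percentage over a period means, the sketch below computes it from a series of periodic reachability checks. The function name `availability_percent` and the sample data are hypothetical, and this is not a description of how IT-Conductor calculates the value internally.

```python
from typing import Sequence


def availability_percent(checks: Sequence[bool]) -> float:
    """Return the share of successful checks as a percentage (0.0 to 100.0)."""
    if not checks:
        raise ValueError("at least one check result is required")
    # True counts as 1 and False as 0, so sum() gives the number of successes.
    return 100.0 * sum(checks) / len(checks)


# Hypothetical results of six periodic checks (True = system was reachable).
samples = [True, True, False, True, True, True]
print(f"{availability_percent(samples):.2f}%")
```

A single failed check out of six lowers the figure to roughly 83%, which is the kind of dip the historical chart makes visible.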
Monitoring the performance of a Linux system starts with determining whether the Central Processing Unit (CPU) is operating in a healthy range. CPU utilization indicates how much of the available processing power is in use. The higher the CPU utilization, the more work the system is doing and the less headroom it has left, so sustained high utilization increases the risk of slow response times and instability. This is why it is important to track CPU usage.
By default, the following metrics are available in the System Grid upon registering a Linux system in IT-Conductor for monitoring.
CPU I/O Wait
CPU Idle
CPU User
CPU User Peak
Load Average
Load Avg/CPU
You have the option to see the historical view of each of these metrics in IT-Conductor as seen below.
CPU I/O Wait is the time during which the CPU is idle while the system has outstanding I/O requests, such as pending disk reads or writes. It is usually represented as a percentage of total CPU time.
CPU Idle, on the other hand, is the time during which the system is not processing anything.
CPU User measures the share of CPU time spent executing code or programs running in user space.
CPU User Peak is the maximum CPU User utilization recorded within the monitoring interval.
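Load Average, unlike the CPU percentages above, is not a utilization figure. On Linux, it is the average number of processes that are either runnable or in uninterruptible sleep (typically waiting on disk I/O), conventionally reported over the last 1, 5, and 15 minutes.
Load Avg/CPU normalizes the Load Average by the number of CPUs. A value near 0 indicates a mostly idle system, a value around 1 indicates that the CPUs are fully occupied on average, and a value above 1 indicates that processes are queuing for CPU time.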
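To make these CPU metrics more concrete, here is a minimal sketch of how comparable figures can be derived on a Linux host from two samples of the aggregate `cpu` line in `/proc/stat`. The function names and sample values are hypothetical, the per-CPU load is assumed to be the load average divided by the CPU count, and none of this is meant to describe IT-Conductor's own collection logic.

```python
from typing import Dict

# Tick counters in the order they appear on the aggregate "cpu" line.
# The trailing guest fields are skipped because the kernel already
# includes guest time in "user" and "nice".
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Return user, idle, and iowait as percentages of the ticks between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * deltas[name] / total for name in ("user", "idle", "iowait")}


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs (assumed definition)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Hypothetical samples taken a few seconds apart.
first = parse_cpu_line("cpu  1000 0 500 8000 500 0 0 0 0 0")
second = parse_cpu_line("cpu  1300 0 600 8500 600 0 0 0 0 0")
print(cpu_percentages(first, second))
print(load_per_cpu(2.0, 4))
```

On a live system, the two samples would come from reading the first line of `/proc/stat` twice with a short pause in between, and the inputs to `load_per_cpu` could come from `os.getloadavg()` and `os.cpu_count()`.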
Memory utilization indicates how much memory is being utilized by all the running services and applications in a system.
Free Memory indicates the available memory in a system.
Memory Used indicates the committed memory in use.
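For readers who want to cross-check these values on the host itself, the following sketch parses text in the format of `/proc/meminfo`. The function names are hypothetical, and treating used memory as `MemTotal` minus `MemAvailable` is an assumption made for this example; IT-Conductor may define Memory Used differently.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field name to value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            # Most values are reported in kB; the unit suffix is ignored here.
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(meminfo: Dict[str, int]) -> Dict[str, float]:
    """Summarize free, available, and used memory (used = total - available)."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    # Fall back to MemFree if the kernel does not report MemAvailable.
    available = meminfo.get("MemAvailable", free)
    used = total - available
    return {
        "free_kb": float(free),
        "available_kb": float(available),
        "used_kb": float(used),
        "used_percent": 100.0 * used / total,
    }


# Hypothetical excerpt in the format of /proc/meminfo.
sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
"""
print(memory_summary(parse_meminfo(sample)))
```

The distinction between free and available memory matters because Linux uses otherwise idle memory for caches that can be reclaimed on demand, so a low free value alone does not necessarily indicate memory pressure.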
File systems are also monitored by IT-Conductor. The following are the metrics being monitored by default:
Allocated Size - total space allocated to the file system
Available Space - space left for use
Mount Status - indicates whether a file system is mounted and ready for use
Used % - amount of disk space in use, expressed as a percentage (%)
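The file system metrics listed above can be approximated on the host with the Python standard library, as in the sketch below. The function names are hypothetical, and the mapping of `shutil.disk_usage` fields onto Allocated Size, Available Space, and Used % is an assumption for illustration only.

```python
import os
import shutil
from typing import Dict, Union


def used_percent(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        # Some pseudo file systems report a size of zero.
        return 0.0
    return 100.0 * used_bytes / total_bytes


def filesystem_metrics(path: str) -> Dict[str, Union[int, float, bool]]:
    """Collect size, free space, mount status, and used % for one path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "allocated_bytes": usage.total,
        "available_bytes": usage.free,
        "mounted": os.path.ismount(path),
        "used_percent": used_percent(usage.total, usage.used),
    }


if __name__ == "__main__":
    # "/" is used only as an example mount point.
    print(filesystem_metrics("/"))
```

The percentage computed here can differ slightly from the `Use%` column of `df`, which excludes blocks reserved for the root user from its calculation.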
Note: You also have the option to enable Process Usage, a metric that measures the resource utilization of selected processes. Once configured, the collection runs periodically in the background and generates a report for each system on which it is enabled.
Hardware (HW) Inventory Report displays hardware information about the monitored Linux system. This information rarely changes, but it can be useful during troubleshooting or when you want to verify the hardware details of a specific system.
Most retrievers are disabled by default. If you want to enable and configure a specific use case, kindly raise a support request.