Advanced SAP Batch Job Monitoring

Monitoring job failures, such as runtime errors and delays that exceed a certain threshold, is part of SAP Batch Job Monitoring. When a configured threshold is exceeded, an alert message is generated, which usually indicates a performance issue that needs to be investigated.

In IT-Conductor, job failures, runtimes, and delays that exceed pre-defined thresholds can be monitored.

Set Threshold for Job Runtime

Runtime is the period during which a program is running. An alert can be generated when a problem or timeout occurs while the program is running, and the threshold defines the metrics that trigger alerts for long-running jobs.

  1. In the IT-Conductor dashboard, active alerts appear in the Alert Panel. Identify the alert that was caused by a runtime error, then click its cause section. A chart pops up with details of the runtime metric, which currently tracks inflight time for jobs.

  2. Click a data point to list the jobs captured during the selected interval, with details such as user, inflight time, and job name for each job.

  3. Return to the chart and click the Threshold Override icon to see the alerts captured by the threshold.

  4. Click an alert to see settings such as Warning Value, Warning Severity, and Alarm Severity for each job. All of these can be set to match your standard metrics. You can also set when the alert is triggered.
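The way a runtime threshold turns an inflight time into an alert can be illustrated with a minimal sketch. This is not IT-Conductor's implementation or API: the `RuntimeThreshold` class, the `classify_runtime` function, the `alarm_value` field, and the severity labels are all hypothetical names chosen for illustration, loosely modeled on the Warning Value, Warning Severity, and Alarm Severity fields mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeThreshold:
    """Hypothetical threshold settings (illustrative only, not a product API)."""
    warning_value: float               # inflight seconds that raise a warning
    alarm_value: float                 # inflight seconds that raise an alarm (assumed field)
    warning_severity: str = "WARNING"  # assumed label
    alarm_severity: str = "CRITICAL"   # assumed label


def classify_runtime(inflight_seconds: float, threshold: RuntimeThreshold) -> Optional[str]:
    """Return the severity for a job's inflight time, or None if within limits."""
    # Check the stricter limit first so an alarm is never reported as a warning.
    if inflight_seconds >= threshold.alarm_value:
        return threshold.alarm_severity
    if inflight_seconds >= threshold.warning_value:
        return threshold.warning_severity
    return None


threshold = RuntimeThreshold(warning_value=1800, alarm_value=3600)
for seconds in (600, 2400, 4000):
    print(seconds, classify_runtime(seconds, threshold))
```

Checking the alarm limit before the warning limit keeps the two levels mutually exclusive, so each job maps to at most one severity.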

Monitor Specific SAP Batch Job Performance

Retrievers

Retrievers deliver all information about a monitored system. Every component or application has a dedicated retriever, which helps ensure that each application's state is reported accurately.

  1. In the IT-Conductor dashboard, navigate to SAP System ID → Retrievers.

Failed Batch Jobs

The most common alerts are caused by failed jobs. Checking for these alerts is one of the basic requirements of system monitoring.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Failed.

2. Click the graph in the chart to see the list of failed batch jobs. You can also click any of the job names to get more details about the alert.
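As a rough illustration of what a failed-jobs list summarizes, the sketch below counts failed jobs per job name from a list of job records. The record layout (`name`, `status`) and the `failed_jobs_by_name` function are assumptions made for this example, and the status code `"A"` (aborted) follows the convention of the SAP job status field; none of this represents how IT-Conductor retrieves or stores job data.

```python
from collections import Counter

FAILED_STATUS = "A"  # assumed: status code for an aborted (canceled) background job


def failed_jobs_by_name(jobs):
    """Count failed jobs per job name from an iterable of job records (dicts)."""
    return Counter(job["name"] for job in jobs if job["status"] == FAILED_STATUS)


# Hypothetical sample data for illustration only.
jobs = [
    {"name": "Z_NIGHTLY_LOAD", "status": "A"},
    {"name": "Z_NIGHTLY_LOAD", "status": "F"},
    {"name": "Z_REPORT_EXPORT", "status": "A"},
    {"name": "Z_NIGHTLY_LOAD", "status": "A"},
]

for name, count in failed_jobs_by_name(jobs).most_common():
    print(name, count)
```

Sorting by count with `most_common()` puts the jobs that fail most often first, which is usually where investigation starts.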

In-Flight Times

In-flight time is the time during which background jobs are running.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Runtime.

Each data point you click shows the maximum runtime of the jobs during that interval. You can set how many jobs must exceed the limit before the alert is triggered, or set the threshold based on severity.
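The idea of a data point that holds the maximum runtime per interval, combined with a job-count condition for alerting, can be sketched as follows. The function names, the five-minute interval, and the sample values are assumptions for illustration and do not reflect IT-Conductor internals.

```python
from collections import defaultdict


def max_runtime_per_interval(samples, interval_seconds=300):
    """Group (epoch_seconds, runtime_seconds) samples into fixed intervals
    and keep the maximum runtime seen in each interval."""
    buckets = defaultdict(float)
    for timestamp, runtime in samples:
        # Align the timestamp to the start of its interval.
        start = int(timestamp // interval_seconds) * interval_seconds
        buckets[start] = max(buckets[start], runtime)
    return dict(sorted(buckets.items()))


def should_alert(runtimes, limit_seconds, min_jobs):
    """Alert only when at least `min_jobs` jobs exceed the runtime limit."""
    return sum(1 for r in runtimes if r > limit_seconds) >= min_jobs


# Hypothetical samples: (timestamp, runtime in seconds).
samples = [(0, 10), (100, 50), (300, 20), (599, 5)]
print(max_runtime_per_interval(samples))
print(should_alert([10, 700, 900], limit_seconds=600, min_jobs=2))
```

Requiring a minimum number of offending jobs is one way to avoid alerting on a single outlier; a severity-based threshold would instead compare the interval maximum against warning and alarm limits.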

Set Threshold for Delay

A delay means that something is preventing a scheduled background job from starting on time, which affects the job's overall performance. There can be several causes (for example, there are not enough dialog processes for the scheduler). IT-Conductor captures these causes, and a threshold can be created to set metrics for alerts.

1. In the IT-Conductor dashboard, navigate to SAP System ID → Background Jobs → Delay Time.

2. Click the chart title, then click the Threshold Override icon to see the available job names.

3. You can also set the override threshold by clicking the job name. This override lets you track when the system is busy or when jobs are scheduled but unable to run on time.
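A per-job override on top of a default delay limit can be sketched as below. Delay is treated here as the gap between a job's scheduled start and its actual start; the function names, the default limit of 300 seconds, and the job name `Z_NIGHTLY_LOAD` are hypothetical and serve only to illustrate the override logic, not the product's configuration format.

```python
from datetime import datetime

DEFAULT_DELAY_LIMIT = 300  # seconds; assumed default for illustration


def start_delay_seconds(scheduled: datetime, started: datetime) -> float:
    """Seconds between the scheduled start and the actual start (never negative)."""
    return max(0.0, (started - scheduled).total_seconds())


def delay_exceeded(job_name, scheduled, started, overrides, default_limit=DEFAULT_DELAY_LIMIT):
    """Compare a job's start delay with its override limit, or the default if none is set."""
    limit = overrides.get(job_name, default_limit)
    return start_delay_seconds(scheduled, started) > limit


# Hypothetical override: this job is allowed to start up to 15 minutes late.
overrides = {"Z_NIGHTLY_LOAD": 900}
scheduled = datetime(2024, 1, 1, 2, 0, 0)
started = datetime(2024, 1, 1, 2, 10, 0)  # started 10 minutes late

print(delay_exceeded("Z_NIGHTLY_LOAD", scheduled, started, overrides))
print(delay_exceeded("Z_REPORT_EXPORT", scheduled, started, overrides))
```

With the same 10-minute delay, the job with the override stays within its limit while the job on the default limit does not, which is the effect a per-job threshold override is meant to achieve.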

Monitor Job Performance in Application Server

Jobs run on particular servers. Checking the performance metrics on these servers can help explain why a job takes much longer than usual. It also shows which resources are being consumed, since heavy usage can affect other shared resources in the system.

Monitor Overall Health

Service Health Monitoring provides information about the system's overall health. In the service grid, click Health to show the monitoring components of the system.

Expanding the components reveals the alert warning symbols, which indicate whether there are specific alerts in the current state.

In this view, all the graphs and details are shown together, synchronized in time with the other performance criteria.

This helps with analysis to find where the bottlenecks are in terms of workload.
